Language-Independent Type Inference of the Instances from Multilingual Wikipedia

Tianxing Wu (School of Computer Science and Engineering, Southeast University, Nanjing, China), Guilin Qi (School of Computer Science and Engineering, Southeast University, Nanjing, China), Bin Luo (School of Computer Science and Engineering, Southeast University, Nanjing, China), Lei Zhang (Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany) and Haofen Wang (Gowild Inc., Shenzhen, China)
Copyright: © 2019 | Pages: 25
DOI: 10.4018/IJSWIS.2019040102

Abstract

Extracting knowledge from Wikipedia has attracted much attention over the past ten years. One of the most valuable kinds of knowledge is type information, which refers to the axioms stating that an instance is of a certain type. Current approaches for inferring the types of instances from Wikipedia mainly rely on language-specific rules. Since these rules cannot capture the semantic associations between instances and classes (i.e. candidate types), they may lead to mistakes and omissions in the process of type inference. The authors propose a new approach that leverages attributes to perform language-independent type inference of instances from Wikipedia. The proposed approach is applied to the whole English and Chinese Wikipedia, which results in the first version of MulType (Multilingual Type Information), a knowledge base describing the types of instances from multilingual Wikipedia. Experimental results show not only that the proposed approach outperforms the state-of-the-art comparison methods, but also that MulType contains a large amount of new, high-quality type information.

1. Introduction

Extracting knowledge from Wikipedia has attracted much attention over the past ten years. So far, different kinds of knowledge have been extracted and published as linked open datasets. One of the most valuable kinds of knowledge is type information, which refers to the axioms stating that an instance is of a certain type. These axioms can be denoted as triples, each of which is composed of a TypeOf relation linking from a class to an instance, e.g. “President of the United States” TypeOf “Barack Obama” and “Country in Europe” TypeOf “Italy”. Type information can benefit many applications in different research fields, such as entity search (Ding et al., 2015; Tonon et al., 2013), question answering (Kalyanpur et al., 2011) and product recommendation (Hepp, 2008).

In Wikipedia, we treat a category as a class and an article as an instance. Thus, it seems that we can obtain the type information directly by transforming the user-generated subsumption relation from a category to an article into the TypeOf relation from a class to an instance. However, this user-generated subsumption relation essentially represents a TopicOf relation, which only means that the category is seen as a topic of the article, e.g. category “Obama Family” TopicOf article “Barack Obama”. Such TopicOf relations are clearly different from the TypeOf relations defined in knowledge bases (KBs).
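
The distinction above can be made concrete with a minimal sketch. The `Triple` type and the `naive_transform` helper are hypothetical names introduced here for illustration only; they are not part of the paper or of any knowledge base API. The example triples are the ones given in the text.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A (class, relation, instance) triple; the field names are illustrative."""
    cls: str        # Wikipedia category, treated as a class
    relation: str   # "TypeOf" or "TopicOf"
    instance: str   # Wikipedia article, treated as an instance


# Type information as described in the text.
type_info = [
    Triple("President of the United States", "TypeOf", "Barack Obama"),
    Triple("Country in Europe", "TypeOf", "Italy"),
]

# A user-generated category-article link, which is really a TopicOf relation.
user_generated = [Triple("Obama Family", "TopicOf", "Barack Obama")]


def naive_transform(triples: List[Triple]) -> List[Triple]:
    """Relabel every TopicOf link as TypeOf (the shortcut the text warns against)."""
    return [t._replace(relation="TypeOf") for t in triples]


# Yields the incorrect axiom: "Obama Family" TypeOf "Barack Obama".
wrong = naive_transform(user_generated)
```

Relabelling every link in this way asserts that Barack Obama is an instance of “Obama Family”, which is why the TopicOf links have to be filtered rather than copied.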

To overcome this problem, some works (De Melo & Weikum, 2010; Suchanek, Kasneci, & Weikum, 2008) try to discover the correct TypeOf relations among the user-generated TopicOf relations. The heuristic rules applied in these works are similar. The central idea is that “the TopicOf relation from a category to an article can be transformed to the TypeOf relation from a class to an instance with high accuracy, when the headword noun of the given category is plural or countable”. Though these heuristic rules have proved effective, they still suffer from the following problems:

  • These rules are language dependent and thus cannot be used for languages (e.g. Chinese and Japanese) in which nouns have no explicit singular or plural forms.

  • These rules cannot capture the semantic associations between instances and classes (i.e. candidate types), which may lead to mistakes and omissions in the process of type inference. For example, some classes with plural headword nouns may be incorrect types of instances (see Figure 1). In addition, all classes with singular headword nouns are ignored, even though many of them are correct types (see Figure 2).
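
The plural-headword heuristic and its two limitations can be sketched as follows. This is a deliberately naive illustration, not the implementation used in the cited works: the preposition list, the headword guess (last token before the first preposition) and the suffix-based plural test are all assumptions made here, and the category “Countries in Europe” is a hypothetical variant of the example in the text.

```python
# Assumed, incomplete list of English prepositions used to locate the headword.
PREPOSITIONS = {"of", "in", "by", "from", "for", "at", "on", "with"}


def headword(category: str) -> str:
    """Crude headword guess: the last token before the first preposition."""
    tokens = category.split()
    if not tokens:
        return ""
    for i, tok in enumerate(tokens):
        if i > 0 and tok.lower() in PREPOSITIONS:
            return tokens[i - 1]
    return tokens[-1]


def looks_plural(noun: str) -> bool:
    """Rough English-only plural test based on the word ending."""
    w = noun.lower()
    return w.endswith("s") and not w.endswith(("ss", "us", "is"))


def accept_as_type(category: str) -> bool:
    """Apply the heuristic: keep the category as a type if its headword looks plural."""
    return looks_plural(headword(category))


plural_en = accept_as_type("Countries in Europe")   # plural headword "Countries"
singular_en = accept_as_type("Country in Europe")   # singular headword "Country"
chinese = accept_as_type("欧洲国家")                 # "European countries" in Chinese
```

Under these assumptions, “Country in Europe” is rejected although the text gives it as a correct type of “Italy”, and the Chinese category is rejected simply because it carries no plural marker. A purely syntactic test of this kind also has no means of checking whether an accepted category is semantically a type of the article.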

Figure 1. Some categories (i.e. classes) with plural headword nouns of the article (i.e. instance) “Australian cricket team in England in 1948”

Figure 2. Some categories (i.e. classes) with singular headword nouns of the article (i.e. instance) “MySQL”
