1. Introduction
Extracting knowledge from Wikipedia has attracted much attention over the past ten years. So far, different kinds of knowledge have been extracted and published as linked open datasets. One of the most valuable kinds is type information, i.e. axioms stating that an instance is of a certain type. These axioms can be written as triples, each consisting of a TypeOf relation linking a class to an instance, e.g. “President of the United States” TypeOf “Barack Obama” and “Country in Europe” TypeOf “Italy”. Type information benefits applications in several research fields, such as entity search (Ding et al., 2015; Tonon et al., 2013), question answering (Kalyanpur et al., 2011) and product recommendation (Hepp, 2008).
In Wikipedia, we treat a category as a class and an article as an instance. It might therefore seem that type information can be obtained directly, by transforming the user-generated subsumption relation from a category to an article into a TypeOf relation from a class to an instance. However, this user-generated subsumption relation essentially represents a TopicOf relation, which only means that the category is a topic of the article, e.g. category “Obama Family” TopicOf article “Barack Obama”. Such TopicOf relations are clearly different from the TypeOf relations defined in knowledge bases (KBs).
To overcome this problem, some works (De Melo & Weikum, 2010; Suchanek, Kasneci, & Weikum, 2008) try to discover the correct TypeOf relations among the user-generated TopicOf relations. The heuristic rules applied in these works are similar. The central idea is that “the TopicOf relation from a category to an article can be transformed to the TypeOf relation from a class to an instance with high accuracy, when the headword noun of the given category is plural or countable”. Although these heuristic rules have proven effective, they still suffer from the following problems:
- These rules are language dependent and therefore cannot be used in languages (e.g. Chinese and Japanese) in which nouns have no explicit singular or plural forms.
- These rules cannot capture the semantic associations between instances and classes (i.e. candidate types), which may lead to mistakes and omissions during type inference. For example, some classes with plural headword nouns may be incorrect types of instances (see Figure 1). Conversely, all classes with singular headword nouns are ignored, although many of them are correct types (see Figure 2).
Figure 1. Some categories (i.e. classes) with plural headword nouns of the article (i.e. instance) “Australian cricket team in England in 1948”
Figure 2. Some categories (i.e. classes) with singular headword nouns of the article (i.e. instance) “MySQL”
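The plural-headword heuristic described above can be illustrated with a minimal sketch. It is not the implementation used in the cited works: the function names (`headword`, `looks_plural`, `infer_types`), the preposition list, and the suffix-based plural test are all assumptions made for illustration, and a real system would rely on a proper noun-phrase parser and morphological analysis. The category name “Presidents of the United States” is likewise a hypothetical plural variant of the class used as an example earlier.

```python
# Crude illustration of the plural-headword rule (all names are hypothetical).
# Assumption: the headword is the last token before the first preposition.
PREPOSITIONS = {"in", "of", "from", "by", "at", "on", "for", "with", "to"}


def headword(category: str) -> str:
    """Guess the head noun of an English category name."""
    tokens = category.split()
    if not tokens:
        return ""
    for i, token in enumerate(tokens):
        if i > 0 and token.lower() in PREPOSITIONS:
            return tokens[i - 1]
    return tokens[-1]


def looks_plural(noun: str) -> bool:
    """Very rough English plural test based on suffixes only (an assumption)."""
    word = noun.lower()
    return word.endswith("s") and not word.endswith(("ss", "us", "is"))


def infer_types(article: str, categories: list[str]) -> list[tuple[str, str, str]]:
    """Keep a category as a type only if its guessed headword looks plural."""
    return [
        (category, "TypeOf", article)
        for category in categories
        if looks_plural(headword(category))
    ]


triples = infer_types(
    "Barack Obama",
    ["Presidents of the United States", "Obama Family"],
)
print(triples)
```

Under these assumptions, “Obama Family” is rejected because its guessed headword “Family” is singular, while the plural-headed category is kept. The sketch also makes both limitations visible: the plural test is meaningless for languages without explicit plural forms, and any correct type with a singular headword is silently dropped.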