1 Introduction
Resource classification and retrieval from knowledge bases (KBs) in the Semantic Web (SW) underpin many important knowledge-intensive applications. However, the inherent incompleteness and accidental inconsistency of knowledge bases in the Semantic Web call for new methods that can perform such tasks efficiently and effectively, albeit with some acceptable approximation. Instance-related tasks are generally tackled by means of logical approaches that try to cope with the problems mentioned above. This has given rise to alternative methods for approximate reasoning (Wache, Groot & Stuckenschmidt, 2005), (Hitzler & Vrandecic, 2005), (Haase, van Harmelen, Huang, Stuckenschmidt & Halberstadt, 2005), (Möller, Haarslev & Wessel, 2006), (Huang & van Harmelen, 2008), (Tserendorj, Rudolph, Krötzsch & Hitzler, 2008), (Rudolph, Tserendorj & Hitzler, 2008). Inductive methods for approximate reasoning are known to be often quite efficient, scalable, and noise-tolerant.
Recently, first steps have been taken to apply classic machine learning techniques to build inductive classifiers for the complex representations, and related semantics, adopted in the context of the SW (Fanizzi, d'Amato & Esposito, 2008a), especially through non-parametric statistical methods (d'Amato, Fanizzi & Esposito, 2008), (Fanizzi, d'Amato & Esposito, 2008d). Instance-based inductive methods may help a knowledge engineer populate ontologies (Baader, Ganter, Sertkaya & Sattler, 2007). Some methods are also able to complete ontologies with probabilistic assertions derived by exploiting the missing and sparse data in the ontologies (Rettinger, Nickles & Tresp, 2009). More sophisticated approaches are capable of dealing with uncertainty encoded in probabilistic ontologies through suitable forms of reasoning (Lukasiewicz, 2008).
In this paper, we propose a novel method for inducing classifiers from ontological data that may naturally be employed as an alternative way of performing concept retrieval (Baader, Calvanese, McGuinness, Nardi & Patel-Schneider, 2003) and several other related applications. Moreover, like its predecessors mentioned above, the induced classifier is able to determine a likelihood measure for the induced class-membership assertions, which is important for approximate query answering and ranking. Some assertions cannot be logically derived, yet may be highly probable according to the inductive classifier; this may help to cope with the uncertainty caused by the inherent incompleteness of the KBs, even in the absence of an explicit probabilistic model.
Specifically, we propose to answer queries by adopting an instance-based classifier, the Reduced Coulomb Energy (RCE) network (Duda, Hart & Stork, 2001), induced by a non-parametric learning method. The essentials of this learning scheme have been extended so that it can be applied to the standard representations of the SW via semantic similarity measures for individual resources. As with other similarity-based methods, a retrieval procedure may seek individuals belonging to query concepts by exploiting the analogy with other training instances, namely the classification of the nearest ones (w.r.t. the measure of choice). Unlike other lazy-learning approaches investigated in the past (d'Amato, Fanizzi & Esposito, 2008), which do not require training, and more similarly to the non-parametric methods based on kernel machines (Bloehdorn & Sure, 2007), (Fanizzi, d'Amato & Esposito, 2008d), the new method is organized in two phases: