Bayes-ReCCE: A Bayesian Model for Detecting Restriction Class Correspondences in Linked Open Data Knowledge Bases

Brian Walshe (ADAPT Centre for Digital Content Technology, Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland), Rob Brennan (ADAPT Centre for Digital Content Technology, Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland) and Declan O'Sullivan (ADAPT Centre for Digital Content Technology, Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland)
Copyright: © 2016 | Pages: 28
DOI: 10.4018/IJSWIS.2016040102

Abstract

Linked Open Data consists of a large set of structured data knowledge bases which have been linked together, typically using equivalence statements. These equivalences usually take the form of owl:sameAs statements linking individuals, but links between classes are far less common. Often, the lack of linking between classes is because the relationships cannot be described as elementary one-to-one equivalences. Instead, complex correspondences referencing multiple entities in logical combinations are often necessary if we want to describe how the classes in one ontology are related to classes in a second ontology. In this paper the authors introduce a novel Bayesian Restriction Class Correspondence Estimation (Bayes-ReCCE) algorithm, an extensional approach to detecting complex correspondences between classes. Bayes-ReCCE operates by analysing features of matched individuals in the knowledge bases, and uses Bayesian inference to search for complex correspondences between the classes these individuals belong to. Bayes-ReCCE is designed to provide meaningful results even when only a small number of matched instances is available. The authors demonstrate this capability empirically, showing that the complex correspondences generated by Bayes-ReCCE have a median F1 score of over 0.75 when compared against a gold standard set of complex correspondences between Linked Open Data knowledge bases covering the geographical and cinema domains. In addition, the authors discuss how metadata produced by Bayes-ReCCE can be included in the correspondences to encourage reuse, allowing users to make more informed decisions about the meaning of the relationship each correspondence describes.

1. Introduction

Linked Open Data provides access to a wealth of information in standardised and navigable form, designed to enable these data to be combined easily. Bizer (2009) notes, however, that "… most Linked Data applications display data from different sources alongside each other but do little to integrate it further. To do so does require mapping of terms from different vocabularies to the applications target schema". Links usually take the form of owl:sameAs statements linking individuals, but links between classes are far less common (Schmachtenberg, Bizer & Paulheim, 2014). Heterogeneity issues, however, such as differences in class scope or hierarchy granularity, mean that simple one-to-one correspondences between atomic classes are not always enough to describe the mappings between schemas or, more generally, ontologies. The YAGO2 (Suchanek, Kasneci & Weikum, 2008) knowledge base, for example, contains a rich class hierarchy based on WordNet (Miller, 1995), and describes many professions as classes. An instance of a person in YAGO2 who is a film director belongs to the class yago:FilmDirector. In contrast, version 3.9 of DBpedia (Bizer, Lehmann, Kobilarov, Auer, Becker, Cyganiak & Hellmann, 2009) has a shallower class hierarchy, with professions described as attribute values, not classes. In this version of the DBpedia ontology there is no named class for film directors. If one-to-one mappings between named classes are the only mechanism available, then we could say that yago:FilmDirector maps to dbpedia:Person with a subsumption relationship; but this does not describe which members of the class Person are film directors. If, instead, complex correspondences between non-atomic classes can be used, then it could be asserted that yago:FilmDirector corresponds to the set of instances of Person in DBpedia whose attribute dbpedia-owl:occupation is set to dbpedia:Film_director.
More formally, correspondences where at least one of the entities described in the correspondence is non-atomic are known as complex correspondences (Ritze, Meilicke, Svab-Zamazal & Stuckenschmidt, 2009).
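To make the film-director example concrete, the following minimal Python sketch evaluates a restriction class over a handful of triples. The individuals `dbpedia:Alice`, `dbpedia:Bob` and `dbpedia:Carol`, the `RestrictionClass` tuple and the `extension` helper are hypothetical illustrations, not DBpedia data or the authors' code. The sketch only shows that the extension of such a non-atomic class is the intersection of the base class's members with the individuals carrying the required attribute value.

```python
from typing import NamedTuple

# Hypothetical toy triples (subject, predicate, object); not real DBpedia data.
TRIPLES = [
    ("dbpedia:Alice", "rdf:type", "dbpedia:Person"),
    ("dbpedia:Alice", "dbpedia-owl:occupation", "dbpedia:Film_director"),
    ("dbpedia:Bob", "rdf:type", "dbpedia:Person"),
    ("dbpedia:Bob", "dbpedia-owl:occupation", "dbpedia:Actor"),
    ("dbpedia:Carol", "rdf:type", "dbpedia:Person"),
]


class RestrictionClass(NamedTuple):
    """A named base class narrowed by one attribute-value restriction (illustrative)."""
    base_class: str
    prop: str
    value: str


def extension(rc: RestrictionClass, triples) -> set:
    """Return subjects typed as rc.base_class that also have rc.prop set to rc.value."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == rc.base_class}
    restricted = {s for s, p, o in triples if p == rc.prop and o == rc.value}
    return typed & restricted


film_directors = RestrictionClass(
    "dbpedia:Person", "dbpedia-owl:occupation", "dbpedia:Film_director"
)
print(sorted(extension(film_directors, TRIPLES)))  # ['dbpedia:Alice']
```

Only Alice belongs to the restriction class, even though all three individuals are instances of dbpedia:Person; a plain subsumption link from yago:FilmDirector to dbpedia:Person could not express that distinction.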

Research has shown that complex correspondences can be classified into commonly recurring Correspondence Patterns (Scharffe, 2009). Extensional methods, which compare the instance sets of classes using a metric such as the Jaccard index, have been shown to be capable of detecting complex correspondences between ontologies used in Linked Open Data (Parundekar, Knoblock & Ambite, 2010; Parundekar, Knoblock & Ambite, 2012). However, extensional approaches have several issues. When only small amounts of instance data are available, they can give high scores to spurious matches, and when the amount of data is large, the search space of potential correspondences can grow very quickly. A subtler problem, which we will show in Section 3, is that directly comparing the instance sets of two classes to test similarity is not consistent with the Open World Assumption. Furthermore, existing extensional approaches make an a priori assumption that all forms of complex correspondence are equally probable, and they provide no systematic way to specify any prior beliefs we have that certain patterns of correspondence may be more probable than others.
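The small-sample weakness of a plain extensional score can be illustrated with a minimal Python sketch. `jaccard` computes the Jaccard index over sets of matched instance identifiers. `smoothed_overlap` is a hypothetical helper that treats each instance in the union as a trial, counts it as a success if it lies in both sets, and returns the posterior mean under a Beta(alpha, beta) prior (with the default uniform prior this is Laplace's rule of succession). It is included only to show how a prior tempers scores drawn from few instances; it is not the Bayes-ReCCE model, and the instance identifiers are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 0.0 here when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def smoothed_overlap(a: set, b: set, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of the overlap rate under a Beta(alpha, beta)-Binomial model.

    Illustrative only: each instance in A ∪ B is one trial, and it is a success
    if it appears in both sets. With no data this falls back to the prior mean.
    """
    successes = len(a & b)
    trials = len(a | b)
    return (successes + alpha) / (trials + alpha + beta)


# Two matched instances that happen to agree perfectly (hypothetical identifiers).
small_a = {"x1", "x2"}
small_b = {"x1", "x2"}

# One hundred matched instances, ninety-five of which fall in both classes.
large_a = {f"i{n}" for n in range(100)}
large_b = {f"i{n}" for n in range(5, 100)}

print(jaccard(small_a, small_b), smoothed_overlap(small_a, small_b))  # 1.0 0.75
print(jaccard(large_a, large_b), round(smoothed_overlap(large_a, large_b), 3))  # 0.95 0.941
```

With only two instances the Jaccard index already reports a perfect match, while the prior pulls the smoothed estimate down to 0.75; with a hundred instances the two scores nearly agree. The choice of a uniform prior here is arbitrary, which is exactly the kind of assumption the paragraph above argues should be made explicit.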