1. Introduction
Linked Open Data provides access to a wealth of information in standardised and navigable form, designed to enable these data to be combined easily. Bizer (2009) notes, however, that “… most Linked Data applications display data from different sources alongside each other but do little to integrate it further. To do so does require mapping of terms from different vocabularies to the applications target schema”. Links usually take the form of owl:sameAs statements between individuals; links between classes are far less common (Schmachtenberg, Bizer & Paulheim, 2014). Moreover, heterogeneity issues, such as differences in class scope or hierarchy granularity, mean that simple one-to-one correspondences between atomic classes are not always enough to describe the mappings between schemas or, more generally, ontologies. The YAGO2 (Suchanek, Kasneci & Weikum, 2008) knowledge base, for example, contains a rich class hierarchy based on WordNet (Miller, 1995), and includes many professions described as classes. An instance of a person in YAGO2 who is a film director belongs to the class yago:FilmDirector. In contrast, version 3.9 of DBpedia (Bizer, Lehmann, Kobilarov, Auer, Becker, Cyganiak & Hellman, 2009) has a shallower class hierarchy, with professions described as attribute values, not classes. In this version of the DBpedia ontology there is no named class for film directors. If one-to-one mappings between named classes are the only mechanism available, then we could say that yago:FilmDirector maps to dbpedia:Person with a subsumption relationship; but this does not describe which members of the class Person are film directors. If, instead, complex correspondences between non-atomic classes can be used, then it could be asserted that yago:FilmDirector corresponds to the set of instances of Person in DBpedia with the attribute dbpedia-owl:occupation set to dbpedia:Film_director.
More formally, correspondences where at least one of the entities described in the correspondence is non-atomic are known as complex correspondences (Ritze, Meilicke, Svab-Zamazal & Stuckenschmidt, 2009).
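To make the film director example concrete, the following minimal Python sketch builds the non-atomic class "instances of Person whose occupation is Film_director" over a handful of toy triples. The individuals (`dbpedia:Person_A`, `dbpedia:Person_B`, `dbpedia:Studio_C`), the class `dbpedia:Company` and the helper function names are hypothetical and serve only as illustration; this is not a description of how any particular matching system represents correspondences.

```python
# Toy (subject, predicate, object) triples. All individuals are invented
# for illustration and are not taken from DBpedia.
TOY_TRIPLES = {
    ("dbpedia:Person_A", "rdf:type", "dbpedia:Person"),
    ("dbpedia:Person_A", "dbpedia-owl:occupation", "dbpedia:Film_director"),
    ("dbpedia:Person_B", "rdf:type", "dbpedia:Person"),
    ("dbpedia:Person_B", "dbpedia-owl:occupation", "dbpedia:Actor"),
    ("dbpedia:Studio_C", "rdf:type", "dbpedia:Company"),
}


def instances_of(triples, cls):
    """Return the subjects explicitly typed as cls (an atomic class)."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def with_attribute_value(triples, prop, value):
    """Return the subjects that have the given attribute value."""
    return {s for s, p, o in triples if p == prop and o == value}


def restricted_class(triples, cls, prop, value):
    """Extension of the non-atomic class: members of cls with prop = value."""
    return instances_of(triples, cls) & with_attribute_value(triples, prop, value)


# The DBpedia side of the complex correspondence with yago:FilmDirector.
film_directors = restricted_class(
    TOY_TRIPLES, "dbpedia:Person", "dbpedia-owl:occupation", "dbpedia:Film_director"
)
```

In this sketch the atomic class yago:FilmDirector would be matched against `film_directors`, a set defined by a class plus an attribute-value restriction, rather than against all of `dbpedia:Person`.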
Research has shown that complex correspondences can be classified into commonly recurring Correspondence Patterns (Scharffe, 2009). Extensional methods, which compare the instance sets of classes using some metric such as the Jaccard index, have been shown to be capable of detecting complex correspondences between ontologies used in Linked Open Data (Parundekar, Knoblock & Ambite, 2010; Parundekar, Knoblock & Ambite, 2012). However, extensional approaches have several issues. When only small amounts of instance data are available, they can give high scores to spurious matches, and when the amount of data is large, the search space of potential correspondences can grow very quickly. A subtler problem, which we will show in Section 3, is that directly comparing the instance sets of two classes to test similarity is not consistent with the Open World Assumption. Furthermore, existing extensional approaches make an a priori assumption that all forms of complex correspondence are equally probable, and they provide no systematic way for us to specify prior beliefs that certain patterns of correspondence may be more probable than others.
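As a rough illustration of the extensional comparison mentioned above, the Jaccard index of two instance sets is the size of their intersection divided by the size of their union. The sketch below is a generic implementation, not the scoring procedure of any cited system; the function name and the convention of returning 0.0 for two empty sets are assumptions made for this example.

```python
def jaccard(instances_a, instances_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two instance sets.

    Returns 0.0 when both sets are empty (a convention assumed here,
    since the ratio is otherwise undefined).
    """
    a, b = set(instances_a), set(instances_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical instance identifiers, for illustration only.
large_overlap = jaccard({"i1", "i2", "i3"}, {"i2", "i3", "i4"})
tiny_sample = jaccard({"i1"}, {"i1"})
```

Here `tiny_sample` reaches the maximum score of 1.0 on the evidence of a single shared instance, which illustrates how small amounts of instance data can give high scores to matches that may be spurious.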