A Context-Independent Ontological Linked Data Alignment Approach to Instance Matching

Armando Barbosa, Ig I. Bittencourt, Sean W. Siqueira, Diego Dermeval, Nicholas J. T. Cruz
Copyright: © 2022 | Pages: 29
DOI: 10.4018/IJSWIS.295977

Abstract

Linking data by finding matching instances in different datasets requires considering many characteristics, such as structural heterogeneity, implicit knowledge, and URI (Uniform Resource Identifier)-oriented identification. The authors propose a context-independent approach to aligning Linked data through an alignment process that is based on the components of the ontological model and considers the multidimensionality of the data. The authors compared the proposed approach with two methods for aligning Linked data on two datasets, evaluating precision, recall, and f-measure. The authors also conducted a case study in a real scenario considering a Brazilian publication dataset on computers and education. The results indicate that the proposed approach outperforms the other methods in precision, recall, and f-measure while requiring less work when the dataset domain changes. The main contributions of this work are enabling real datasets to be linked semi-automatically and presenting an approach capable of calculating resource similarity.

Introduction

Publishing or maintaining Linked data on the Web goes beyond making datasets available through Resource Description Framework (RDF) serializations, which are a cornerstone of innovations and applications in the semantic web and information systems (Avila-Garzon, 2020). Newly published data must also be linked to existing datasets. However, creating links between datasets requires careful analysis by an expert, which, although effective, does not scale, because the amount of published data is constantly increasing. Consequently, a manual publishing process is not viable. To build the Web of data efficiently, there must be solutions capable of linking data automatically or semi-automatically.

Automatically linking data is a problem recognized by many communities. In the database community, the problem is known as record linkage (Gu et al., 2003; Karr et al., 2019), which aims to identify and link resources that are judged to represent the same real-world entity. Other terms for this problem include entity resolution (Menestrina et al., 2005; Ebraheem et al., 2017; Wu et al., 2020), deduplication (Sarawagi and Bhamidipaty, 2002; Xu et al., 2017; Yang et al., 2019), and instance matching.

Instance matching is the term that the Linked data community uses to refer to the problem. In this community, the main goal is to find matching instances in different datasets (Abubakar et al., 2018). However, instance matching has additional characteristics (Castano et al., 2011; Mountantonakis & Tzitzikas, 2019; Azmy et al., 2019), such as (i) structural heterogeneity, which refers to variation in the structure of the instances; (ii) implicit knowledge, which refers to the characteristics and constraints exhibited by the domain; and (iii) URI-oriented identification, which refers to reusing URIs to identify new information about existing instances. Thus, instance matching needs solutions designed specifically for these characteristics.
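To make structural heterogeneity concrete, the following minimal sketch compares instances whose literal values are organized under different properties. It is only an illustration and not the approach proposed in this article: the property names, the sample records, and the choice of Jaccard similarity over value tokens are all assumptions made for the example.

```python
import re


def value_tokens(instance):
    """Collect lowercase word tokens from all literal values of an instance."""
    tokens = set()
    for value in instance.values():
        tokens.update(re.findall(r"\w+", str(value).lower()))
    return tokens


def jaccard_similarity(instance_a, instance_b):
    """Jaccard similarity of the value tokens, ignoring property names."""
    tokens_a = value_tokens(instance_a)
    tokens_b = value_tokens(instance_b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical records: the same person described with different structures.
author_a = {"name": "Maria Silva", "birthYear": 1980}
author_b = {"givenName": "Maria", "familyName": "Silva", "born": "1980"}
# Hypothetical record describing a different person.
author_c = {"name": "Pedro Costa", "birthYear": 1975}

score_same = jaccard_similarity(author_a, author_b)
score_diff = jaccard_similarity(author_a, author_c)
print(score_same, score_diff)
```

Because the comparison ignores property names, the two differently structured descriptions of the same person receive a high score. A comparison this naive does not address implicit knowledge or URI-oriented identification, which is part of why dedicated solutions are needed.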

To identify and link resources on the Web, the community has been developing a growing number of solutions. The Ontology Alignment Evaluation Initiative (OAEI) conducts an annual evaluation that consists of aligning two predefined datasets and comparing the alignment generated by each solution with a reference alignment. However, according to Homoceanu et al. (2014), these solutions are not ready to align data automatically, despite their good results. Most works are applied only to conventional OAEI datasets with small ontologies (Ferranti et al., 2021), and few approaches address real-world ontology matching applications (Otero-Cerdera et al., 2015; Ferranti et al., 2021). Also, no technique stands out from the others in all aspects (Xue & Tang, 2017).
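Comparing a generated alignment with a reference alignment is commonly summarized with precision, recall, and f-measure. The sketch below shows the standard definitions under the assumption that an alignment is represented as a set of (source URI, target URI) pairs; the function name and the sample identifiers are hypothetical and are not taken from OAEI tooling.

```python
def evaluate_alignment(predicted, reference):
    """Return (precision, recall, f_measure) for predicted links."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: share of predicted links that appear in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference links that were predicted.
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # F-measure: harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference alignment with four correct links.
reference = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:3"), ("a:4", "b:4")}
# Hypothetical system output: two correct links and one wrong link.
predicted = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:9")}

precision, recall, f_measure = evaluate_alignment(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f-measure={f_measure:.3f}")
```

In this toy case, two of the three predicted links are correct and two of the four reference links are found, so precision is 2/3, recall is 1/2, and the f-measure is 4/7.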

This study proposes a context-independent approach for the alignment of Linked data through an alignment process that considers aspects of the ontological model’s data and characteristics. Data properties and relationships drive the alignment of resources/instances. For this purpose, a cascade alignment approach is proposed. Moreover, the proposed approach addresses the alignment between real datasets, which enables reliable alignment of datasets distributed on the Web. This work provides the following contributions: (i) developing a context-independent process for the alignment of Linked data; (ii) enabling the execution of the alignment directly in the data storage; and (iii) presenting a real-world case study dealing with heterogeneity and data quality issues.

Then, this research targets the following problem:
