Automatic Schema-Independent Linked Data Instance Matching System

Khai Nguyen, Ryutaro Ichise

Source Title: International Journal on Semantic Web and Information Systems (IJSWIS) 13(1)

DOI: 10.4018/IJSWIS.2017010106

Abstract

The goal of linked data instance matching is to detect all instances that refer to the same objects in two linked data repositories, the source and the target. Since the amount of linked data is growing rapidly, it is important to automate this task. However, the difference between the schemata of the source and target repositories remains a challenging barrier, one that reduces the portability, accuracy, and scalability of many proposed approaches. The authors present automatic schema-independent interlinking (ASL), a schema-independent system that performs instance matching on repositories with different schemata, without prior knowledge about the schemata. The key improvements of ASL over previous systems are the detection of useful attribute pairs for comparing instances, an attribute-driven token-based blocking scheme, and an effective modification of existing string similarities. To verify the performance of ASL, the authors conducted experiments on a large dataset containing 246 subsets with different schemata. The results show that ASL obtains high accuracy and significantly improves the quality of discovered coreferences compared with recently proposed complex systems.

1. Introduction

Instance matching (also known as entity reconciliation, entity resolution, or record linkage) (Winkler, 2006) is the process of detecting coreferent instances, that is, instances that describe the same object. One prominent application of instance matching is data integration. Since data are created independently in many repositories, gathering information from multiple sources can greatly improve the completeness and diversity of the information about the objects of interest. Detecting coreferent instances is indispensable for achieving high integration quality. In linked data, instance matching also plays an important role in the data publication process: newly published instances should be linked to their existing coreferent instances on the web of linked data. In other words, instance matching, together with other tools, brings linked data, rather than merely enriched data, closer to the vision of the semantic web (Jain, Hitzler, Yeh, Verma, & Sheth, 2010). Instance matching in linked data (Ferrara, Nikolov, & Scharffe, 2011) is also considered representative of link discovery, because the matching results can be used to generate owl:sameAs links, which are conventionally used to declare coreferences.
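
As a rough illustration of how matching results become owl:sameAs links, the following minimal sketch serializes matched instance pairs as N-Triples statements. The function name `same_as_triples` and the example.org IRIs are hypothetical and serve only as an illustration; the sketch assumes the matcher returns pairs of well-formed IRIs.

```python
# IRI of the owl:sameAs property in the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(matches):
    """Serialize (source_iri, target_iri) pairs as N-Triples lines.

    Assumes every IRI is already well formed; no escaping is performed.
    """
    return [f"<{source}> <{OWL_SAME_AS}> <{target}> ." for source, target in matches]


# Hypothetical matcher output, for illustration only.
matches = [
    ("http://example.org/source/Tokyo", "http://example.org/target/tokyo_city"),
]

for line in same_as_triples(matches):
    print(line)
```

Each printed line is one statement that a linked data repository could publish to declare the two instances coreferent.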

The major challenges of instance matching are the ambiguity of instances and the inconsistency between different repositories. The first challenge arises from the natural heterogeneity of real-world objects (e.g., Tokyo, Tokyo Station, and Tokyo Imperial Palace are distinct objects with similar names). The second challenge arises from differing schemata, in which the attributes of objects are declared through arbitrary properties (e.g., ‘name’ and ‘label’ describe the same attribute). In linked data and other kinds of web-based data, some of these challenges are harder than in other forms of structured data because most resources are contributed by the prolific Internet community. On the one hand, linked data resources provide excellent benefits thanks to the abundance of data. On the other hand, this abundance increases the chance of having instances that refer to very similar objects. In addition, many linked data sources are constructed by many users or from crowdsourced data, so the inconsistencies between schemata become more complex. As a result, it is difficult to construct all the correct property mappings between given schemata for instance matching on linked data. A schema-independent system avoids this difficulty, because it requires no prior knowledge of the schemata. Therefore, schema-independent instance matching systems, which can work on repositories with any schema, have the highest generality.
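
To make the two challenges concrete, here is a minimal sketch of a comparison that ignores property names entirely. It is not the ASL method: it scores every pair of attribute values with plain token-level Jaccard similarity, chosen here only for simplicity. All function names and sample instances are hypothetical.

```python
def tokens(value):
    """Lowercase a string value and split it into a set of word tokens."""
    return set(value.lower().split())


def jaccard(a, b):
    """Token-level Jaccard similarity between two string values."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_value_similarity(source, target):
    """Highest similarity over all value pairs, regardless of property names."""
    return max(
        (jaccard(sv, tv) for sv in source.values() for tv in target.values()),
        default=0.0,
    )


# Hypothetical instances: the repositories use 'name' and 'label'
# for the same attribute, so a name-based property mapping would fail.
source = {"name": "Tokyo Station"}
target_a = {"label": "Tokyo Station", "country": "Japan"}
target_b = {"label": "Tokyo Imperial Palace"}

print(best_value_similarity(source, target_a))  # 1.0
print(best_value_similarity(source, target_b))  # 0.25
```

The first pair matches even though the property names differ, which illustrates schema independence. The nonzero score for the second pair shows the ambiguity problem: distinct objects that share tokens still look partially similar, so a real system needs more selective attribute pairs and similarity measures than this sketch provides.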

Many years of research on linked data instance matching have produced considerable achievements, but not yet an optimal solution. Numerous studies have been published, ranging from manually operated to semi-automatic and automatic systems. To use manual systems (Volz, Bizer, Gaedke, & Kobilarov, 2009; Ngomo & Auer, 2011; Li, Tang, Li, & Luo, 2009), the user needs to provide matching specifications (e.g., property mappings, similarity measures). Semi-automatic systems try to reduce user involvement by suggesting a specification (Lyko, Höffner, Speck, Ngomo, & Lehmann, 2013) or by requiring only a small amount of labeled data (Ngomo, Lehmann, Auer, & Höffner, 2011; Isele & Bizer, 2013). Recently, studies on automatic approaches have increased because of their generality. Existing automatic systems can be categorized into three families: unsupervised learning of specifications (Nikolov, d’Aquin, & Motta, 2012; Ngomo & Lyko, 2013); probabilistic matching (Niepert, Meilicke, & Stuckenschmidt, 2010; Suchanek, Abiteboul, & Senellart, 2011); and similarity-based matching with statistical estimation of property mappings (Araujo, Tran, DeVries, Hidders, & Schwabe, 2012; Nguyen, Ichise, & Le, 2012a). The first two families are limited in scalability, because they either repeatedly browse the data or memorize all computations. The third family is more scalable, owing to its simple architecture. One drawback of previous systems in this third family is their low accuracy on large data. However, given its advantage in scalability, this family is still one of the most promising solutions.