Research on Multi-Source Data Integration Based on Ontology and Karma Modeling

Hongyan Yun (College of Computer Science and Technology, Qingdao University, Qingdao, China), Ying He (School of Electronic Information, Qingdao University, Qingdao, China), Li Lin (College of Computer Science and Technology, Qingdao University, Qingdao, China) and Xiaohong Wang (Qilu University of Technology, Shandong Academy of Science, Shandong Computer Science Center, Shandong, China)
Copyright: © 2019 |Pages: 19
DOI: 10.4018/IJIIT.2019040105

Abstract

The purpose of data integration is to integrate multi-source heterogeneous data. Ontology addresses the semantic description of multi-source heterogeneous data. The authors propose a practical approach for fast data integration based on ontology modeling and an information integration toolkit named Karma, and demonstrate an application example in detail. The Armed Conflict Location & Event Data Project (ACLED) is a publicly available conflict event dataset designed for disaggregated conflict analysis and crisis mapping. The authors analyzed the ACLED dataset and domain knowledge to build an Armed Conflict Event ontology, then constructed Karma models to integrate ACLED datasets and publish RDF data. SPARQL queries are used to check the correctness of the published RDF data. The authors designed and developed an ACLED Query System based on the Jena API, Canvas JS, the Baidu API, and other technologies. The system makes it convenient for governments and researchers to analyze regional conflict events and provide early warning of crises, and it verifies the validity of the constructed ontology and the correctness of the Karma modeling.

Introduction

Big data is widely described as having three dimensions: volume, velocity, and variety (Knoblock & Szekely, 2015). Volume refers to the problem of handling large amounts of data. Velocity refers to dealing with real-time streaming data, where it may be impossible to store all data for later processing. Variety refers to dealing with multiple types of sources and different data formats. In real life, people frequently need to process multi-source heterogeneous data. Data integration aims to bring together data from multiple heterogeneous data sources so that users can ignore semantic and structural differences (Noy, 2004).

Ontology provides an effective way to represent concepts and the semantic relationships among them (Senthilnayaki, Venkatalakshmi, & Kannan, 2015). Moreover, an ontology can be easily expressed in formal semantic markup languages such as RDF and OWL. Ontology can not only effectively solve the problem of describing multi-source data, but can also break through the bottleneck of semantic mapping. Using ontology for data integration can not only deal with structural and semantic differences among datasets, but can also serve as the basis for data query and reasoning. This paper focuses on exploiting semantic technology to address the variety of multi-source heterogeneous data.

We propose an approach to integrate data from multiple types of sources (for example, spreadsheets, relational databases, web services, and others) and in widely different formats, including both relational and hierarchical data (that is, XML or JSON). In this approach, a domain ontology is used to describe data sources semantically, and semantic integration of multi-source data is realized by using an information integration tool named Karma (Knoblock & Szekely, 2008) to map multiple datasets to RDF data. The use of semantics in this integration process is key to building an approach that scales to large numbers of heterogeneous sources. In this paper, following the proposed framework, we demonstrate the integration and application of ACLED (Armed Conflict Location & Event Data Project) (Raleigh et al., 2017) data in detail.
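The general idea of mapping differently structured sources onto shared ontology terms can be illustrated with a minimal sketch. This is not Karma and not the authors' model: the `ACE` namespace, the class `ConflictEvent`, the property names, the field names, and the sample records are all hypothetical placeholders chosen for illustration. The sketch only shows how a per-source mapping lets a CSV source and a JSON source produce RDF-style triples that use the same vocabulary.

```python
import csv
import io
import json

# Hypothetical namespace and terms; not taken from the paper's ontology.
ACE = "http://example.org/ace#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One mapping per source: source field name -> assumed ontology property.
CSV_MAPPING = {
    "EVENT_DATE": ACE + "eventDate",
    "COUNTRY": ACE + "country",
    "FATALITIES": ACE + "fatalities",
}
JSON_MAPPING = {
    "date": ACE + "eventDate",
    "nation": ACE + "country",
    "deaths": ACE + "fatalities",
}

# Made-up sample records, used only to exercise the mapping.
SAMPLE_CSV = "EVENT_ID,EVENT_DATE,COUNTRY,FATALITIES\nE1,2000-01-01,CountryA,0\n"
SAMPLE_JSON = '[{"id": "E2", "date": "2000-01-02", "nation": "CountryB", "deaths": 3}]'


def record_to_triples(record, id_field, mapping):
    """Map one flat record to (subject, predicate, object) triples."""
    subject = ACE + "event/" + str(record[id_field])
    triples = [(subject, RDF_TYPE, ACE + "ConflictEvent")]
    for field, prop in mapping.items():
        value = record.get(field)
        if value not in ("", None):  # skip missing or empty fields
            triples.append((subject, prop, str(value)))
    return triples


def integrate(csv_text, json_text):
    """Apply each source's mapping and merge the results into one triple list."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples += record_to_triples(row, "EVENT_ID", CSV_MAPPING)
    for item in json.loads(json_text):
        triples += record_to_triples(item, "id", JSON_MAPPING)
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines; only rdf:type objects are IRIs here."""
    lines = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            obj = "<" + o + ">"
        else:
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(integrate(SAMPLE_CSV, SAMPLE_JSON)))
```

Because both mappings target the same properties, a query over the merged triples does not need to know which source a record came from. A real Karma model would additionally handle typed literals, links between entities, and semi-automatic suggestion of the mappings, none of which this sketch attempts.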

ACLED is a publicly available conflict event dataset designed for disaggregated conflict analysis and crisis mapping. This dataset codes the dates and locations of all reported political violence and protest events in over 60 developing countries in Africa and Asia. The data come from news reports, publications by civil society and human rights organizations, and security updates from international organizations. Information is recorded on the battles, killings, riots, and recruitment activities of rebels, governments, militias, armed groups, and protesters. ACLED has recorded close to 200,000 individual events, with ongoing data collection focused on Africa and ten countries in South and Southeast Asia. These data can be used for immediate and long-term analysis and mapping of political violence and protest across developing countries through the use of historical data from 1997, and can inform humanitarian and development work in crisis and conflict-affected contexts through real-time data updates and reports.
