Semantic Web and Geospatial Unique Features Based Geospatial Data Integration

Ying Zhang (North China Electric Power University, China), Chaopeng Li (North China Electric Power University, China), Na Chen (Hebei Vocational College of Rail Transportation, China), Shaowen Liu (North China Electric Power University, China), Liming Du (North China Electric Power University, China), Zhuxiao Wang (North China Electric Power University, China) and Miaomiao Ma (North China Electric Power University, China)
DOI: 10.4018/978-1-5225-8054-6.ch011


Because large amounts of geospatial data are produced by various sources and stored in incompatible formats, geospatial data integration is difficult, and the lack of semantics makes it harder. Although standardised data formats and data access protocols, such as Web Feature Service (WFS), give end-users access to heterogeneous data stored in different formats from various sources, integration remains time-consuming and ineffective due to the lack of semantics. To solve this problem, we propose a prototype that implements geospatial data integration by addressing four problems: geospatial data retrieving, modeling, linking and integrating. First, we provide a uniform integration paradigm for users to retrieve geospatial data. Then, in the modeling process, we align the retrieved geospatial data with the help of Karma to eliminate heterogeneity. Our main contribution addresses the third problem. Previous work performs the linking process by defining a set of semantic rules. However, geospatial data has specific geospatial relationships that are significant for linking but cannot be handled directly by Semantic Web techniques. We take advantage of these unique features of geospatial data to implement the linking process. In addition, previous approaches run into difficulty when the geospatial data sources are in different languages. In contrast, our proposed linking algorithms include a translation function, which saves the cost of translating among geospatial sources in different languages. Finally, the geospatial data is integrated by eliminating data redundancy and combining the complementary properties of the linked records. We evaluate the proposed approach mainly on four geospatial data sources: OpenStreetMap (OSM), Wikimapia, USGS and EPA.
The experimental results show that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source achieves considerable Complementary Completeness (CC) and Increased Completeness (IC).
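For readers unfamiliar with the candidate-pair measures named above, the following is a minimal sketch of how RR, PC and PQ are commonly defined in the record-linkage literature. It is an illustration under stated assumptions, not the chapter's own evaluation code: the function name `blocking_metrics`, its parameters, and the toy data are hypothetical, and the F-score is assumed here to be the harmonic mean of PC and PQ, which may differ from the definition used in the chapter.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (record id in source A, record id in source B)


def blocking_metrics(candidates: Set[Pair], true_matches: Set[Pair],
                     n_a: int, n_b: int) -> Dict[str, float]:
    """Hypothetical helper: common record-linkage definitions of RR, PC, PQ."""
    total_pairs = n_a * n_b              # size of the full comparison space
    found = candidates & true_matches    # true matches kept among the candidates
    # Reduction Ratio: share of the comparison space that was pruned away.
    rr = 1.0 - len(candidates) / total_pairs if total_pairs else 0.0
    # Pairs Completeness: share of true matches that survive as candidates.
    pc = len(found) / len(true_matches) if true_matches else 0.0
    # Pairs Quality: share of candidates that are true matches.
    pq = len(found) / len(candidates) if candidates else 0.0
    # Assumption: F-score taken as the harmonic mean of PC and PQ.
    f = 2 * pc * pq / (pc + pq) if (pc + pq) else 0.0
    return {"RR": rr, "PC": pc, "PQ": pq, "F": f}


# Toy data (invented for illustration): 4 records in A, 5 records in B.
candidates = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b5")}
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
metrics = blocking_metrics(candidates, true_matches, n_a=4, n_b=5)
print(metrics)
```

In this toy case, 4 of the 20 possible pairs are kept as candidates and 3 of the 4 true matches are among them, so a high RR is obtained at the cost of one missed match.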
Chapter Preview

1. Introduction

Geospatial data integration can be used to improve data quality, to reduce costs, and to make data more useful to the public (Auer et al., 2009; Bittner et al., 2009; Brodt et al., 2010; Kuhn, 2002; Su et al., 2012; De Carvalho et al., 2012; Su & Lochovsky, 2010; Ballatore et al., 2014; Buccella et al., 2010; Fonseca, Egenhofer et al., 2002; Malik et al., 2010; Vaccari et al., 2009). However, large amounts of data are produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Thus, geospatial data integration is difficult and is becoming an increasingly important subject.

To implement geospatial data integration, four problems need to be addressed: geospatial data retrieving, modeling, linking and integrating. This paper proposes a corresponding approach for each problem. Our work also takes advantage of Karma (Szekely et al., 2011; Knoblock et al., 2012; Taheriyan et al., 2012; Tuchinda et al., 2011; Knoblock et al., 2011), a general information integration tool. It supports importing data from a variety of sources, including relational databases, spreadsheets, KML and semi-structured Web pages, and publishing data in a variety of formats such as RDF. The source modeling work is based on these functions:
