A Semantic Similarity Analysis for Data Mappings between Heterogeneous XML Schemas

Jaewook Kim (University of Maryland Baltimore County, USA) and Yun Peng (University of Maryland Baltimore County, USA)
DOI: 10.4018/978-1-60960-485-1.ch003

Abstract

One of the most critical steps in integrating heterogeneous e-business applications that use different XML schemas is schema mapping, which is known to be costly and error-prone. Past research on schema mapping has not made full use of the semantic information embedded in the hierarchical structure of XML schemas. This chapter investigates existing schema mapping approaches and proposes an innovative semantic similarity analysis approach to facilitate XML schema mapping, merging, and reuse. Several key innovations are introduced to better utilize the available semantic information: (1) a layered structure analysis of XML schemas, (2) layer-specific semantic similarity measures, and (3) an efficient semantic similarity analysis using parallel and distributed computing technologies. Experimental results using two different schemas from a real-world application demonstrate that the proposed approach is valuable for addressing the difficulties of XML schema mapping.

Background

The Challenges for Data Mappings between Heterogeneous XML Schemas

Over the past decades, the eXtensible Markup Language (XML) has emerged as one of the primary languages for sharing structured data among information systems. In particular, XML schemas have been widely used in e-Business, where enterprises exchange business documents with their partners in a supply chain. The popularity of XML and XML schemas has led to an exponential growth of Business-to-Business (B2B) transactions. This success, however, has created several problems: (1) individual enterprises often create their own XML schemas containing the information most relevant to their own needs; (2) different enterprise groups define different but similar XML schemas; and (3) enterprises often extend or redefine existing standard XML schemas for their own needs. Therefore, to successfully integrate heterogeneous e-Business systems, it is now critical to integrate their respective XML schemas. Establishing the correspondences needed for this integration is called schema mapping.

Schema mapping is the process of identifying whether and how two schemas are semantically related (Miller et al., 1994; Rahm & Bernstein, 2001; Shvaiko & Euzenat, 2005). It is one of the most important steps in integrating heterogeneous e-Business systems. However, it is typically performed largely by hand by human engineers who are, at best, supported by graphical interface tools. This manual mapping process is known to be very labor-intensive, costly, and error-prone (Gal, 2006; Rahm & Bernstein, 2001). As e-Business systems grow to handle more complex databases and applications, their schemas become larger and more complicated. This enlarges both the search space to be examined and the number of correspondences to be identified. It is therefore critical to automate the schema mapping task as much as possible, both to reduce the cost of labor-intensive data integration work and to reduce mapping errors.
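To make the growth of the search space concrete, the following Python sketch shows a naive, purely lexical name matcher. It compares every source component name with every target component name, so the number of comparisons is the product of the two schema sizes. This is not the semantic similarity approach proposed in this chapter. The tokenizer, the use of `difflib.SequenceMatcher`, the 0.8 threshold, and the example names are all illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import product


def tokens(name):
    """Split a CamelCase or underscore-separated name into lowercase words.

    e.g. "PurchaseOrderType" -> "purchase order type" (illustrative heuristic).
    """
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return " ".join(p.lower() for p in parts)


def candidate_matches(source, target, threshold=0.8):
    """Score all |source| x |target| name pairs and keep those above threshold.

    The threshold is an assumed value, not one taken from the chapter.
    """
    results = []
    for s, t in product(source, target):
        score = SequenceMatcher(None, tokens(s), tokens(t)).ratio()
        if score >= threshold:
            results.append((s, t, round(score, 2)))
    # Highest-scoring candidate correspondences first.
    return sorted(results, key=lambda r: -r[2])


if __name__ == "__main__":
    print(candidate_matches(["PurchaseOrder", "Party"], ["Purchase_Order", "Invoice"]))
```

For example, `candidate_matches(["PurchaseOrder", "Party"], ["Purchase_Order", "Invoice"])` performs 2 × 2 = 4 comparisons and keeps only the `PurchaseOrder`/`Purchase_Order` pair. A matcher like this misses synonyms such as `Party` and `Organization` entirely. That weakness is one reason to exploit semantic information beyond element names.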

XML schema mapping can be classified into two types depending on the type of e-Business standard schema involved: component schema mapping and document schema mapping. A component schema contains only reusable and extensible components (types or elements) as global type definitions (e.g., the OAG Common Core Component schema). A document schema, by contrast, contains a global root element that defines one valid XML document (e.g., a Purchase Order Schema). A document schema may reuse or extend the components defined by a component schema. For schema integration, component schema mapping mainly identifies relations between global components (types or elements), while document schema mapping mainly identifies relations between leaf nodes (elements or attributes). In this research, we focus on component schema mapping.
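The distinction between global components and leaf nodes can be illustrated with Python's standard `xml.etree.ElementTree` module. The sketch below parses a small, made-up component schema and lists its global components, which are the direct children of `xs:schema`. It then lists its leaf declarations. The example treats an element as a leaf when its `type` uses the `xs:` prefix (a built-in XML Schema type); this is a simplifying assumption. A robust tool would resolve namespace prefixes and handle anonymous and derived types.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# A tiny, hypothetical component schema used only for illustration.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="country" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="PartyType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Address" type="AddressType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Party" type="PartyType"/>
</xs:schema>"""


def global_components(root):
    """Names of global types and elements (direct children of xs:schema)."""
    kinds = {XS + "complexType", XS + "simpleType", XS + "element"}
    return [child.get("name") for child in root if child.tag in kinds]


def leaf_declarations(root):
    """Names of attributes and of elements typed with a built-in xs: type.

    Assumption: the schema binds the XML Schema namespace to the "xs" prefix.
    """
    leaves = []
    for node in root.iter():  # depth-first, in document order
        if node.tag == XS + "attribute":
            leaves.append(node.get("name"))
        elif node.tag == XS + "element" and (node.get("type") or "").startswith("xs:"):
            leaves.append(node.get("name"))
    return leaves


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XSD)
    print(global_components(root))
    print(leaf_declarations(root))
```

For this sample, `global_components(root)` returns `['AddressType', 'PartyType', 'Party']`, the units a component schema mapping would compare. `leaf_declarations(root)` returns `['Street', 'City', 'country', 'Name']`, the units a document schema mapping would focus on.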
