An Overview of Ontology-Driven Data Integration

Agustina Buccella (Universidad Nacional del Comahue, Argentina) and Alejandra Cechich (Universidad Nacional del Comahue, Argentina)
DOI: 10.4018/978-1-60566-242-8.ch051

Abstract

New software requirements have emerged from technological innovation, especially in networking. The possibility that enterprises, institutions and even ordinary users can improve their connectivity, allowing them to work together at the same time, has generated an explosion in this area. Moreover, it is now very common for large enterprises to merge with others. Therefore, requirements such as interoperability and integrability concern every type of organization around the world. In general, large modern enterprises use different database management systems to store and search their critical data. All of these databases are very important to an enterprise, but the different interfaces they may have make them difficult to administer. Therefore, retrieving information through a common interface becomes crucial in order to realize, for instance, the full value of the data contained in the databases (Hass & Lin, 2002).

Introduction

In the 1990s, the term Federated Database emerged to characterize techniques for providing integrated data access to a set of distributed, heterogeneous and autonomous databases (Busse, Kutsche, Leser & Weber, 1999; Litwin, Mark & Roussoupoulos, 1990; Sheth & Larson, 1990). This is where the concept of Data Integration appears: the process of unifying data that share some common semantics but originate from unrelated sources. Several aspects must be taken into account when working with federated systems, because their main characteristics make integration more difficult. In particular, the autonomy of the information sources, their geographical distribution and the heterogeneity among them are some of the main problems we must face when performing the integration. Autonomy means that users and applications can access data either through the federated system or through their own local system. Distribution (Ozsu & Valduriez, 1999) refers to data (or computers) spread among multiple sources and stored in a single computer system or in multiple computer systems. These computer systems may be geographically distributed but interconnected by a communication network. Finally, heterogeneity relates to the different meanings that may be inferred from data stored in databases. Cui and O’Brien (2000) classify heterogeneity into four categories: structural, syntactical, system and semantic. Structural heterogeneity deals with inconsistencies produced by different data models, whereas syntactical heterogeneity deals with the consequences of using different languages and data representations. System heterogeneity deals with differences in the supporting hardware and operating systems. Semantic heterogeneity (Cui & O’Brien, 2000), in turn, is one of the most complex problems faced by data integration tasks.
Each information source included in the integration has its own interpretation of, and assumptions about, the concepts involved in the domain. Therefore, it is very difficult to determine when two concepts belonging to different sources are related. Some of the relations among concepts that semantic heterogeneity involves are: synonymy, when the sources use different terms to refer to the same concept; homonymy, when the sources use the same term to denote completely different concepts; hyponymy, when one source contains a term less general than a term in another source; and hypernymy, when one source contains a term more general than a term in another source.
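
The four relations above can be made concrete with a small sketch. It is only an illustration under strong assumptions: every term is assumed to be already mapped to a concept, and the concepts are assumed to sit in a shared hierarchy. In practice, obtaining that mapping is the hard part of the problem. The names `Term`, `BROADER`, `is_narrower` and `relation`, as well as the source and concept names, are hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    source: str   # information source that uses the term (hypothetical name)
    label: str    # the word the source uses
    concept: str  # the concept the source means by it (assumed to be known)


# Hypothetical shared hierarchy: each concept points to its more general parent.
BROADER = {
    "sedan": "car",
    "car": "vehicle",
    "truck": "vehicle",
}


def is_narrower(concept: str, other: str) -> bool:
    """Return True if `concept` lies strictly below `other` in BROADER."""
    parent = BROADER.get(concept)
    while parent is not None:
        if parent == other:
            return True
        parent = BROADER.get(parent)
    return False


def relation(a: Term, b: Term) -> str:
    """Classify how term `a` relates to term `b`."""
    same_label = a.label.lower() == b.label.lower()
    if a.concept == b.concept:
        # Same concept: different labels make the terms synonyms.
        return "equivalent" if same_label else "synonym"
    if is_narrower(a.concept, b.concept):
        return "hyponym"   # a is less general than b
    if is_narrower(b.concept, a.concept):
        return "hypernym"  # a is more general than b
    if same_label:
        return "homonym"   # same word, unrelated concepts
    return "unrelated"


automobile = Term("sales_db", "automobile", "car")
car = Term("fleet_db", "car", "car")
bank_a = Term("finance_db", "bank", "financial_institution")
bank_b = Term("geo_db", "bank", "river_bank")
sedan = Term("sales_db", "sedan", "sedan")
vehicle = Term("fleet_db", "vehicle", "vehicle")

print(relation(automobile, car))   # synonym
print(relation(bank_a, bank_b))    # homonym
print(relation(sedan, vehicle))    # hyponym
print(relation(vehicle, sedan))    # hypernym
```

The sketch checks the hierarchy before comparing labels, so two sources that use the same word for a broader and a narrower concept are reported as hypernym or hyponym rather than as homonyms, which the text reserves for completely different concepts.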

Key Terms in this Chapter

Distributed Information System: A set of information systems physically distributed over multiple sites, which are connected with some kind of communication network.

Federated Database: The same as a Federated Information System (FIS), except that the information systems involved are only databases (i.e., structured sources).

Federated Information System (FIS): A set of autonomous, distributed and heterogeneous information systems, which are operated together to produce useful answers for users.

Semantic Heterogeneity: Each information source has a specific vocabulary according to its understanding of the world. The different interpretations of the terms within each of these vocabularies cause semantic heterogeneity.

Ontological Scalability: The ability to add new information sources easily without generating substantial changes in the ontological components of the integrated system.

Heterogeneous Information System: A set of information systems that differ in syntactical or logical aspects such as hardware platforms, data models or semantics.

Ontological Reusability: The ability to create ontologies that can be used in different contexts or systems.

Ontology Matching: The process of finding relationships or correspondences between entities of different ontologies.

Ontological Changeability: The ability to change some structures of an information source without producing substantial changes in the ontological components of the integrated system.

Data Integration: Process of unifying data that share some common semantics but originate from unrelated sources.

Ontology: A vocabulary to represent and communicate knowledge about a domain, together with a set of relationships involving the terms of the vocabulary at a conceptual level.
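
As a rough illustration of the Ontology and Ontology Matching entries above, the sketch below represents two toy ontologies as a vocabulary plus relationships and pairs their terms by label similarity alone. This is a deliberately naive stand-in, not a technique described in the chapter: the ontologies, the `match` function and the 0.8 threshold are all hypothetical, and the string comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Toy ontologies (hypothetical): each term maps to the terms it is related to.
ONTOLOGY_A = {"Customer": {"Order"}, "Order": {"Product"}, "Product": set()}
ONTOLOGY_B = {"Client": {"Orders"}, "Orders": {"Products"}, "Products": set()}


def label_similarity(x: str, y: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def match(onto_a: dict, onto_b: dict, threshold: float = 0.8) -> list:
    """Pair each term of onto_a with its most similar term of onto_b.

    Only pairs whose similarity reaches the (assumed) threshold are kept.
    The relationships stored in the ontologies are ignored here.
    """
    pairs = []
    for term_a in onto_a:
        best = max(onto_b, key=lambda term_b: label_similarity(term_a, term_b))
        score = label_similarity(term_a, best)
        if score >= threshold:
            pairs.append((term_a, best, score))
    return pairs


for term_a, term_b, score in match(ONTOLOGY_A, ONTOLOGY_B):
    print(f"{term_a} ~ {term_b} ({score:.2f})")
```

With these inputs the sketch pairs Order with Orders and Product with Products but finds no partner for Customer, because Customer and Client are synonyms with dissimilar spellings. Comparing labels alone cannot resolve that kind of semantic heterogeneity.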
