Understanding and Modeling Context in Data Integration

William T. Sabados, Harry S. Delugach
DOI: 10.4018/ijcssa.2014010101

Abstract

The pragmatic context of information is a fundamental characteristic that is not often formally addressed in data integration. This paper discusses the challenges of modeling the multiple contexts at play in data integration. A simple data integration context modeling framework is introduced that we believe addresses important issues of representing a pragmatic context. It allows for multiple data sources from similar domains to be brought together without having to designate one as the “true” semantics. An example is provided showing how this approach supports integration efforts.

Obstacles To Capturing Context

If context is important to data integration, why is it not already captured and considered in current approaches? Several factors may help explain why context is not generally treated as a first-class concern in data integration. First, many data sources are not created with data integration in mind. A data source may have been created for a very specific use, and integration may simply be an opportunity to leverage data that has already been captured. When the data was originally captured, however, much of the knowledge forming its context may have been left implicit. For instance, if the goal is to capture information about students at a university, the data source may neglect to document which university the students attend because the identity of the university is understood.

Another reason implicit or assumed information may not be captured in a data source is that it is not efficient to do so. Data sources are primarily designed to be fast and efficient storage and retrieval solutions. One way to build more efficient data sources is to remove or reduce redundant information, which in turn may strip out some of the information that would be useful for determining context. Also, when designing a new data source, one may question the value of including information that has the same value for every tuple (such as the university in the above example). If included, such invariant information, while useful for establishing context, wastes storage space. For instance, is it worthwhile to indicate that every student in a database attends a particular university if the database is designed solely to track students at a single university? This information is either never considered for inclusion because it is too obvious (implicit) or factored out because it is inefficient to store. Ideally, this information should be added to the data source metadata to record the context of what the designers intended to capture and store.
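As a minimal sketch of this idea (an illustration only, not the context modeling framework the paper introduces), invariant context can be kept once as source-level metadata and made explicit only when sources are combined. The `DataSource` class, the `with_context` and `integrate` helpers, and all field names and values below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical data source: stored tuples plus source-level context."""
    name: str
    rows: list
    # Facts that hold for every tuple, so they are not repeated in each row.
    context: dict = field(default_factory=dict)


def with_context(source):
    """Return copies of the rows with the source-level context attached.

    If a row already carries a key that also appears in the context,
    the row's own value is kept.
    """
    return [{**source.context, **row} for row in source.rows]


def integrate(*sources):
    """Combine several sources into one list of context-qualified rows."""
    merged = []
    for source in sources:
        merged.extend(with_context(source))
    return merged


# Two student databases, each built for a single (implicit) university.
source_a = DataSource(
    name="registrar_a",
    rows=[{"student_id": 1, "name": "Ana"}],
    context={"university": "University A"},
)
source_b = DataSource(
    name="registrar_b",
    rows=[{"student_id": 1, "name": "Ben"}],
    context={"university": "University B"},
)

combined = integrate(source_a, source_b)
```

Without the recorded context, the two rows with `student_id` 1 would be indistinguishable by key after integration. With it, each row states which university it refers to, at the cost of storing that value only once per source.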

A third reason for not recording the context of a data source is that the people who created it expect to perform any future data integration themselves. The implicit context information is so “obvious” to them (having designed and constructed the data source) that they may not think any of it is worth recording (or at least, not deserving of much time and effort).

Capturing data source metadata is hardly a new idea. Many data source solutions provide some capability to annotate data sources with designer comments. However, data integration remains a difficult task, and few, if any, systems make a pointed effort to establish a data source context. Perhaps the chore of integration would be made easier if context information were more readily available.
