Merging, Repairing, and Querying Inconsistent Databases

Luciano Caroprese (University of Calabria, Italy) and Ester Zumpano (University of Calabria, Italy)
DOI: 10.4018/978-1-60566-242-8.ch039


Introduction

Data integration aims to provide uniform, integrated access to multiple heterogeneous information sources that were designed independently but have closely related contents. However, the integrated view, which is constructed by combining the information provided by the different data sources according to a specified integration strategy, may contain inconsistent data; that is, it may violate some of the constraints defined on the data. For an inconsistent integrated database, that is, a database that does not satisfy some integrity constraints, two possible solutions have been investigated in the literature (Agarwal, Keller, Wiederhold, & Saraswat, 1995; Bry, 1997; Calì, Calvanese, De Giacomo, & Lenzerini, 2002; Dung, 1996; Grant & Subrahmanian, 1995; S. Greco & Zumpano, 2000; Lin & Mendelzon, 1999): repairing the database or computing consistent answers over the inconsistent database. Intuitively, a repair of the database consists of deleting or inserting a minimal number of tuples so that the resulting database is consistent, whereas computing the consistent answer consists of selecting the set of certain tuples (i.e., those belonging to all repaired databases) and the set of uncertain tuples (i.e., those belonging to some, but not all, repaired databases).

Example 1. Consider a database consisting of the relation Employee(Name, Age, Salary), where the attribute Name is a key for the relation, and suppose we have the integrated database DB = {Employee(Mary, 28, 20), Employee(Mary, 31, 30), Employee(Peter, 47, 50)}. DB is inconsistent because two tuples share the key value Mary, and there are two possible repaired databases, each obtained by deleting one of the two tuples whose Name value is Mary. The answer to the query asking for the age of Peter consists of the set of certain tuples {<47>}, whereas the answer to the query asking for the age of Mary consists of the set of uncertain tuples {<28>, <31>}.
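
The reasoning in Example 1 can be traced with a small Python sketch. It is only an illustration under stated assumptions, not the chapter's algorithm or the RAINBOW implementation: it handles a single-attribute key, repairs by deletion only, and enumerates all repairs explicitly. The function names key_repairs, ages_of, and certain_and_uncertain are hypothetical.

```python
from itertools import product

# Integrated database from Example 1: Employee(Name, Age, Salary), key = Name.
DB = {("Mary", 28, 20), ("Mary", 31, 30), ("Peter", 47, 50)}


def key_repairs(db, key_index=0):
    """Enumerate the deletion-only repairs for a single-attribute key.

    Each repair keeps exactly one tuple per key value (assumption: no
    other constraints are defined on the relation).
    """
    groups = {}
    for t in db:
        groups.setdefault(t[key_index], []).append(t)
    # One choice per key group; the Cartesian product yields every repair.
    return [set(choice) for choice in product(*groups.values())]


def ages_of(db, name):
    """Query: the age of the employee called `name`."""
    return {(age,) for (n, age, _salary) in db if n == name}


def certain_and_uncertain(db, query):
    """Split the query answers into certain and uncertain tuples."""
    answers = [query(repair) for repair in key_repairs(db)]
    certain = set.intersection(*answers)      # in every repair
    uncertain = set.union(*answers) - certain  # in some, but not all
    return certain, uncertain


peter = certain_and_uncertain(DB, lambda r: ages_of(r, "Peter"))
mary = certain_and_uncertain(DB, lambda r: ages_of(r, "Mary"))
```

Here peter holds the certain tuple for the age 47 and no uncertain tuples, while mary holds no certain tuples and the two uncertain tuples for the ages 28 and 31, matching the example. Enumerating repairs is exponential in the number of conflicting key groups, so this form is suitable only for tracing small examples by hand.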

This work proposes a framework for merging, repairing, and querying inconsistent databases. To this end, the problem of the satisfaction of integrity constraints in the presence of null values is investigated, and a new semantics for constraint satisfaction, inspired by the approach presented in Bravo and Bertossi (2006), is proposed. The present work focuses on the inconsistencies of a database instance with respect to particular types of integrity constraints that are implemented and maintained in commercial DBMSs (database management systems), such as primary keys, general functional dependencies, and foreign-key constraints.

The framework for merging, repairing, and querying inconsistent databases with functional dependencies restricted to primary-key constraints and foreign-key constraints has been implemented in a system prototype, called RAINBOW, developed at the University of Calabria.

Database Merging

Once the logical conflicts owing to schema heterogeneity have been resolved, conflicts may still arise during the integration process among the data provided by different sources. In particular, the same real-world object may correspond to several tuples that have the same value for the key attributes but different values for some nonkey attributes.
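
The kind of conflict described above can be detected with a short Python sketch. It assumes that merging is a plain union of the source relations grouped by key value; this preview does not describe the chapter's integration operators, so the union is a stand-in and not the authors' method. The two sources and the names merge_sources and conflicting_keys are hypothetical; the tuple values are borrowed from Example 1.

```python
from collections import defaultdict


def merge_sources(sources, key_index=0):
    """Union the tuples of all sources and group them by key value."""
    by_key = defaultdict(set)
    for source in sources:
        for t in source:
            by_key[t[key_index]].add(t)
    return by_key


def conflicting_keys(by_key):
    """Return the key values whose tuples disagree on a nonkey attribute."""
    return {k: ts for k, ts in by_key.items() if len(ts) > 1}


# Hypothetical sources over Employee(Name, Age, Salary), key = Name.
source_1 = {("Mary", 28, 20), ("Peter", 47, 50)}
source_2 = {("Mary", 31, 30), ("Peter", 47, 50)}

merged = merge_sources([source_1, source_2])
conflicts = conflicting_keys(merged)
```

Both sources agree on Peter, so his tuple appears once in the merged result, whereas Mary is reported as a conflict because the two sources give her different ages and salaries.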

Key Terms in this Chapter

Consistent Answer: A set of tuples, derived from the database, satisfying all integrity constraints.

Integration Operator: The operation of merging information by extracting coherent common information from several sources of data.

Database Repair: Minimal set of insert and delete operations that makes the database consistent.

Functional Dependency: A functional dependency is a constraint between two sets of attributes in a relation from a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written X → Y) if and only if each X value is associated with at most one Y value.

Consistent Database: A database satisfying a set of integrity constraints.

Inconsistent Database: A database violating some integrity constraints.

Foreign-Key Constraint: A foreign-key constraint (also called referential integrity constraint) on a column ensures that the value in that column is found in the primary key of another table.

Data Integration: The activity of combining and matching information in different sources and resolving a variety of conflicts.
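
The definitions of functional dependency and foreign-key constraint given above can be read as simple checks over relation instances, sketched below in Python. This is a minimal illustration, not the chapter's semantics: it ignores null values, which the chapter treats with a dedicated notion of constraint satisfaction, and the function names, the Assignment relation, and its Project attribute are hypothetical.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in `rows`.

    `rows` is a list of dicts; `lhs` and `rhs` are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first y seen for x is recorded; any later mismatch violates the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


def satisfies_fk(child_rows, fk_attr, parent_rows, pk_attr):
    """True if every fk_attr value in child_rows occurs as a pk_attr value
    in parent_rows."""
    parent_keys = {row[pk_attr] for row in parent_rows}
    return all(row[fk_attr] in parent_keys for row in child_rows)


# Employee tuples from Example 1 of the chapter.
employees = [
    {"Name": "Mary", "Age": 28, "Salary": 20},
    {"Name": "Mary", "Age": 31, "Salary": 30},
    {"Name": "Peter", "Age": 47, "Salary": 50},
]
# Hypothetical relation Assignment(Name, Project) referencing Employee.Name.
assignments = [{"Name": "Peter", "Project": "p1"}]

fd_holds = satisfies_fd(employees, ["Name"], ["Age", "Salary"])
fk_holds = satisfies_fk(assignments, "Name", employees, "Name")
```

With the data of Example 1, fd_holds is False because the key value Mary is associated with two different Age and Salary values, while fk_holds is True because Peter occurs in the Employee relation.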
