Inconsistency-Tolerant Integrity Checking


Hendrik Decker (Instituto Tecnológico de Informática & Ciudad Politécnica de la Innovación, Spain) and Davide Martinenghi (Politecnico di Milano, Italy)
DOI: 10.4018/978-1-60566-242-8.ch038

Abstract

Integrity checking has been a perennial topic in almost all database conferences, journals, and research labs. The importance of the issue is attested by a very large body of research activity and publications, motivated by the fact that integrity checking is practically infeasible for significant amounts of stored data without a dedicated approach to optimize the process. Basic early approaches have been extended to deductive, object-relational, XML- (extensible markup language) based, distributed, and other kinds of advanced database technology. However, the fundamental ideas already present in the seminal paper (Nicolas, 1982) have not changed much. The basic principle is that, in most cases, a so-called simplification, that is, a simplified form of the set of integrity constraints imposed on the database, can be obtained from a given update (or just an update schema) and the current state of the database (or just the database schema). Thus, integrity, which is supposed to be an invariant of all possible database states, is checked upon each update request, which in turn is authorized only if the check of the simplification yields that integrity is not violated. Here, simplified essentially means more efficiently evaluated at update time. A general overview of the field of simplified integrity checking is provided in Martinenghi, Christiansen, and Decker (2006).
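The simplification principle can be illustrated with a toy sketch. All names here (the employee table, the salary constraint, the function names) are our own illustrative assumptions, not from the chapter: the constraint "no employee earns more than their manager" normally requires scanning the whole database, but for a single insertion a simplification restricted to the inserted tuple suffices.

```python
# Hypothetical sketch of simplified integrity checking. The database maps
# each employee name to a (salary, manager) pair; the constraint says that
# every employee's salary is at most their manager's salary.

employees = {
    "ann": (90, None),     # ann is the top manager
    "bob": (60, "ann"),
    "eve": (50, "bob"),
}

def full_check(db):
    """Brute-force check of the whole constraint (cost grows with |db|)."""
    return all(
        mgr is None or sal <= db[mgr][0]
        for sal, mgr in db.values()
    )

def simplified_check(db, name, salary, manager):
    """Simplification for the update 'insert (name, salary, manager)':
    only the new tuple can introduce a violation, so one comparison
    against the current state suffices."""
    return manager is None or salary <= db[manager][0]

# Update request: hire "joe" under "bob" with salary 70 -> would violate,
# so the update is rejected without re-checking the whole database.
assert simplified_check(employees, "joe", 70, "bob") is False
# Salary 55 passes the simplified check, and the update is authorized.
assert simplified_check(employees, "joe", 55, "bob") is True
```

The point of the sketch is that the simplified check is evaluated at update time against only the data the update can affect, which is what makes integrity checking feasible at scale.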

Introduction

A common point of view by which the need for integrity checking is justified can be characterized as follows. Whenever a database contains erroneous, unwanted, or faulty information, that is, data that violate integrity, answers to queries cannot be trusted. Hence, simplification methods for integrity checking usually address this issue in a very drastic way: In order to avoid possibly wrong answers that are due to integrity violation, incorrect stored data that cause inconsistency need to be completely prevented. However, this drastic attitude is most often unrealistic: The total absence of unwanted, incorrect, or unexpected data is definitely an exception in virtually all real-world scenarios. Still, it is desirable to preserve the good data in the database while preventing more bad data from sneaking in and thus further diminishing the trustworthiness of answers to queries.

The intolerant attitude of the simplification approach of integrity checking toward data that violate integrity is reflected in Nicolas (1982) and virtually all publications on the same subject that came after it. They all postulate the categorical premise of total integrity satisfaction, that is, that each constraint must be satisfied in the old database state, i.e., the state given when an update is requested but not yet executed. Otherwise, correctness of simplification is not guaranteed.

As opposed to the attention granted to integrity checking in academia, support for the declarative specification and efficient evaluation of semantic integrity in practical systems has always been relatively scant, apart from standard constructs such as constraints on column values, or primary and foreign keys in relational database tables. Various reasons have been identified for this lack of practical attention. Among them, the logically abstract presentation of many of the known simplification methods is often mentioned. Here, we focus on another issue of integrity checking that we think is even more responsible for a severe mismatch between theory and practice: Hardly any database is ever in a perfectly consistent state with regard to its intended semantics. Clearly, this contradicts the fundamental premise that the database must always satisfy integrity. Thus, due to the intolerance of classical logic with respect to inconsistency, integrity checking is very often not considered an issue of practical feasibility or even relevance.

Based on recent research results, we are going to argue that inconsistency is far less harmful for database integrity than commonly established results suggest. We substantiate our claim by showing that, informally speaking, the consistent part of a possibly inconsistent database can be preserved across updates. More precisely, we show that, if the simplified form of an integrity theory is satisfied, then each instance of each constraint that has been satisfied in the old state continues to be satisfied in the new updated state, even if the old database is not fully consistent. Therefore, such an approach can rightfully be called inconsistency tolerant. Moreover, we are going to see that the use of inconsistency-tolerant integrity checking methods prevents an increase of inconsistency and may even help to decrease it.
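The preservation claim above can be made concrete with a small sketch. The example (employee table, salary constraint, function names) is our own illustrative assumption, not the chapter's formalism: the old state already violates the constraint for one tuple, yet the simplified check still rejects updates that would add new violations, so the satisfied cases are preserved and inconsistency does not grow.

```python
# Old state with a pre-existing violation: bob out-earns his manager ann.
employees = {
    "ann": (90, None),
    "bob": (95, "ann"),    # inconsistent case, already in the database
    "eve": (50, "bob"),
}

def simplified_check(db, salary, manager):
    """Checks only the case of the constraint touched by inserting a new
    employee with the given salary under the given manager."""
    return manager is None or salary <= db[manager][0]

# The global check fails because of the pre-existing violation ...
assert not all(m is None or s <= employees[m][0]
               for s, m in employees.values())
# ... but the simplified check still rejects an update that would create a
# NEW violation (a hire under eve earning more than eve's 50):
assert simplified_check(employees, 60, "eve") is False
# and it authorizes updates whose cases are satisfied, so the consistent
# part of the database is preserved and inconsistency does not increase:
assert simplified_check(employees, 40, "eve") is True
```

Note that a method relying on the total-integrity premise would be formally inapplicable to this state; the inconsistency-tolerant reading lets the same simplification remain useful.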

Key Terms in this Chapter

Integrity Satisfaction: A given database state satisfies integrity if each integrity constraint in the database schema is satisfied. A constraint in prenex normal form is satisfied if, when posed as a query, it returns the answer yes. A constraint in denial form is satisfied if, when posed as a query, it returns the answer no.
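The two query forms can be sketched in a few lines. The encoding below is our own minimal assumption (a salary list standing in for a table): the same constraint "all salaries are positive" is posed once in prenex style, where satisfaction means the answer yes, and once as a denial, where satisfaction means the answer no.

```python
salaries = [90, 60, 50]

# Prenex-style query: forall X: salary(X) -> X > 0
prenex_answer = all(s > 0 for s in salaries)

# Denial-style query: <- salary(X) & X <= 0
denial_answer = any(s <= 0 for s in salaries)

assert prenex_answer is True    # satisfied: the prenex form answers yes
assert denial_answer is False   # satisfied: the denial form answers no
```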

Simplification Method: A procedure taking as input an integrity theory, an update, and possibly a database state, and returning as output a pretest or posttest thereof.

Posttest: A posttest of an integrity theory Γ (for an update and, possibly, a database) is an integrity theory, easier to evaluate than Γ, that evaluates in the new state exactly as Γ does.

Inconsistency Tolerance: Property of simplification methods that makes them usable also in the presence of inconsistency. Whenever an inconsistency-tolerant method returns that integrity is satisfied, the preservation of all consistent cases of the integrity theory in the updated state is guaranteed even if there were inconsistent cases before the update.

Case: A case of a constraint W is another constraint obtained from W by substituting some (possibly all) of its global variables with constants or other variables not in W.

Integrity: In databases, integrity stands for semantic consistency, that is, the correctness of stored data with regard to their intended meaning, as expressed by integrity constraints. Integrity should not be confused with a namesake issue often associated with data security.

Integrity Violation: A given database state violates integrity if at least one of the integrity constraints in the database schema is violated. A constraint in prenex normal form is violated if, when posed as a query, it returns the answer no. A constraint in denial form is violated if, when posed as a query, it returns the answer yes.

Inconsistency: Property of a database state that does not satisfy its associated integrity theory. Inconsistency is synonymous to integrity violation and is the contrary of integrity satisfaction.

Pretest: A pretest of an integrity theory Γ (for a given update and, possibly, a database) is an integrity theory, easier to evaluate than Γ, that evaluates in the old state exactly as Γ does in the new state.

Update: An update is a mapping from the space of databases into itself. Its input is the old state, while its output is the new state of the database. Many simplification methods use the difference between old and new state for making integrity checking more efficient.

Integrity Constraint: A logical sentence expressing a statement about the semantics, that is, the intended meaning of the extensions of tables and views in the database. It is usually expressed either as a closed formula in prenex normal form or as a denial. Both can be evaluated as queries for determining integrity satisfaction or violation.
