Integrity constraints (or simply “constraints”) are formal representations of invariant conditions for the semantic correctness of database records. Constraints can be expressed in declarative languages such as datalog, predicate logic, or SQL. This article highlights the historical background of integrity constraints and the essential features of their simplified incremental evaluation. It concludes with an outlook on future trends.
Integrity has always been an important issue for database design and control, as attested by many early publications (e.g., Bernstein & Blaustein, 1982; Bernstein, Blaustein, & Clarke, 1980; Codd, 1970, 1979; Eswaran & Chamberlin, 1975; Fraser, 1969; Hammer & McLeod, 1975; Hammer & Sarin, 1978; Nicolas, 1978, 1982; Wilkes, 1972); later ones are too numerous to mention. Expressing database semantics as invariant properties persisting across updates had first been proposed by Minsky (1974). Florentin (1974) suggested expressing integrity constraints as predicate logic statements. Stonebraker (1975) proposed formulating and checking integrity constraints declaratively as SQL-like queries.
Functional dependencies (Armstrong, 1974; Codd, 1970) are a fundamental kind of constraints to guide database design. Referential integrity has been part of the 1989 SQL ANSI and ISO standards (McJones, 1997). The SQL2 standard (1992) introduced the CHECK and ASSERTION constructs (i.e., table-bound and table-independent SQL query conditions) as the most general means to express integrity constraints declaratively (Date & Darwen, 1997). Since the 1990s, uniqueness constraints, foreign keys, and complex queries involving EXISTS and NOT became common features in commercial databases. Thus, arbitrarily general and complex integrity constraints can now be expressed and evaluated in most relational databases. However, most of them offer efficient support only for the following three simple kinds of declarative constraints:
Domain Constraints: Restrictions on the permissible range of attribute values of tuples in table columns, including scalar SQL data types and subsets thereof, as well as options for default and null values.
Uniqueness Constraints: As enforced by the UNIQUE construct on single columns, and UNIQUE INDEX and PRIMARY KEY on any combination of one or several columns in a table, preventing multiple occurrences of values or combinations thereof.
Foreign Key Constraints: For establishing a relationship between the tuples of two tables, requiring identical column values. For instance, a foreign key on column emp of relation works_in requires that the emp value of each tuple of works_in must occur in the emp_id column of table employee, and that the referenced column (emp_id in the example) has been declared as primary key.
For more general constraints, SQL manuals usually recommend using procedural triggers or stored procedures instead of declarative constructs. This is because such constraints may involve nested quantifications over huge extents of several tables. Thus, their evaluation can easily become prohibitively costly. However, declarativity does not need to be sacrificed for efficiency, as shown by many methods of simplified integrity checking as cited in this survey. They are all based on the seminal paper (Nicolas, 1982).
Key Terms in this Chapter
Inconsistency Tolerance: A practically indispensable property. Integrity checking is inconsistency-tolerant if the invariance of all satisfied cases of integrity constraints across updates can be ensured, even if some other cases are violated.
Integrity Constraint: Integrity constraints are invariant properties of the database that evolve via updates. Often, it is convenient to state them as denials, that is, yes/no queries that are required to return the empty answer in each database state.
Static vs. Dynamic Integrity Constraints: Static integrity constraints are semantic properties that are invariant across evolving database states. Dynamic integrity constraints refer explicitly to several (mostly consecutive) states or to their transitions, typically involving temporal or procedural constructs.
Business Rule: Statement for defining or constraining the evolution of data pertaining to an enterprise’s business. Business rules can be represented and enforced by integrity constraints.
Integrity: Semantic consistency, that is, the correctness of stored data with regard to their intended meaning, as expressed by integrity constraints. Not to be confused with namesake issues related to data security, serializability of concurrent transactions or sound failure recovery.
Simplification: A methodological approach to reduce complexity and costs of integrity checking. Also the simplified forms of integrity constraints generated by such methods are called simplifications.
Integrity Satisfaction, Integrity Violation: Integrity is satisfied if each integrity constraint in the database schema, queried as a denial, returns the empty answer. Integrity is violated if any one of these integrity constraints returns a non-empty answer. Then, the database is also said to be inconsistent.