Set-Oriented Queries in SQL

Set-Oriented Queries in SQL

Antonio Badia (University of Louisville, USA)
DOI: 10.4018/978-1-4666-8767-7.ch002
OnDemand PDF Download:
No Current Special Offers


Set-oriented queries are those that require a condition to hold of an arbitrary set of rows, either when comparing the set to a single row or to another set of rows. Examples of such queries are universally quantified queries and skyline queries. These queries, while important for data processing, are difficult to write and difficult to process. In this chapter we review research dealing with the specification and optimization of such queries. We present several approaches proposed to deal with universal quantification as well as with set predicates in general.
Chapter Preview

2. Motivation

Tuple-based conditions, both in RA and SQL, involve the value of one or more attributes in a given tuple, but they apply to one tuple at a time only. For instance, in the TPCH benchmark1, we can write a query to get the suppliers from Germany with a condition country_name=’GERMANY’ applied to each tuple in the join of SUPPLIER and NATION. To write more complex conditions involving several tuples, RA requires that we ’line up’ all the tuples by using joins. For instance, to get the suppliers that supply two parts, we need to see two tuples with the same supplier id and different parts ids. To achieve this, we join the relation SUPPLIES with itself and use renaming (say, giving SUPPLIES the alias S2), and then apply the condition SUPPLIES.supkey = S2.supkey and SUPPLIES.partkey <>S2.partkey2. Such cases, however, require a fixed amount of joins (one, in the example); they can be seen as conditions over a fixed number of tuples (two, in the example). There are queries, though, that require an indeterminate number of tuples. One example is the query select the orders where all suppliers are from Germany, in which we do not know in advance how many suppliers there are in each order, or how many suppliers are from Germany (and both sets can change with the data). These and similar examples (for instance, select the orders where at least half of the suppliers are from Germany) cannot be expressed with a regular SPJ query; they require set difference (i.e. negation) or other set operations. Queries containing such conditions are called set-oriented queries. The case of universal quantification (”all”) has been known for a long time as problematic for SQL, both to express and to optimize.

Set-oriented conditions come in three main types in SQL: one is a simple condition in a single set -for instance, existential quantification (checking whether the set is empty or not) is expressed with EXISTS or NOT EXISTS. An example would be a query like

Key Terms in this Chapter

Set-Oriented Condition: A condition that involves a variable number of tuples. The precise tuples involved in the condition may have to be determined by another query (in SQL, a subquery).

Subquery: in SQL, it is possible to write complex conditions in the WHERE clause of a SELECT statement by using a whole query in the condition. Such a query is called a nested query or a subquery , and it can be correlated (some of its attributes depend on values of the query in which it is embedded) or non-correlated.

Tuple-Oriented Condition: A condition that involves only a single tuple in a relation, or a fixed number of tuples. A fixed number of tuples can be converted, within the relational algebra, to a single tuple, by using a fixed number of joins.

Query Optimization: The process whereby a query written in SQL (and, in general, in a declarative query language) is transformed into a query plan that will actually produce the results of running the query. Under the assumption that more than one query plan can be produced for a given query, the query optimizer chooses the plan that it deems better performing.

Generalized Quantifier: An extension of the concept of quantifier in first-order logic, a Generalized Quantifier (GQ) can be seen as an operation on sets. As a consequence, GQs can be used to capture properties of sets that sometimes are not expressible in a logic, thereby increasing its expressive power. In query processing, GQs are used to denote a desired condition on sets in a declarative way, leaving the query processor free to decide how to optimize its computation.

Complete Chapter List

Search this Book: