Composite objects represented as directed graphs are an important data structure that require efficient support Web and document databases (Abiteboul, Cluet, Christophides, Milo, Moerkotte, & Simon, 1997; Chen & Aberer, 1998, 1999; Mendelzon, Mihaila, & Milo, 1997; Zhang, Naughton, Dewitt, Luo, & Lohman, 2001), CAD/ CAM, CASE, office systems, and software management. It is cumbersome to handle such objects in relational database systems when they involve ancestor-descendant relations (or say, reachability relations). In this article, we present a new graph encoding based on a tree labeling method and the concept of branchings that are used in the graph theory for finding the shortest connection networks. A branching is a subgraph of a given digraph that is in fact a forest, but covers all the nodes of the graph. Concretely, for a DAG G (directed acyclic graph) of n nodes, the space needed for storing its transitive closure can be reduced to O(b·n), where b is the number of the leaf nodes of G’s branching. Such a compression is, however, at the expense of querying time. Theoretically, it takes O(logb) time to check whether a node is reachable from another. The method can also be extended to digraphs containing cycles.
A composite object can be generally represented as a directed graph (digraph for short). For example, in a CAD database, a composite object corresponds to a complex design, which is composed of several subdesigns. Often, subdesigns are shared by more than one higher-level design, and a set of design hierarchies thus forms a directed acyclic graph (DAG). As another example, the citation index of scientific literature, recording reference relationships between authors, constructs a directed cyclic graph. As a third example, we consider the traditional organization of a company, with a variable number of manager-subordinate levels, which can be represented as a tree hierarchy.
In a relational system, composite objects must be fragmented across many relations, requiring joins to gather all the parts. A typical approach to improving join efficiency is to equip relations with hidden pointer fields for coupling the tuples to be joined. The so-called join index is another auxiliary access path to mitigate this difficulty. Also, several advanced join algorithms have been suggested, based on hashing and a large main memory. In addition, a different kind of attempts to attain a compromise solution is to extend relational databases with new features, such as clustering of composite objects, by which the concatenated foreign keys of ancestor paths are stored in a primary key. Another extension to relational system is nested relations (or NF2 relations). Although it can be used to represent composite objects without sacrificing the relational theory, it suffers from the problem that subrelations cannot be shared. Moreover, recursive relationships cannot be represented by simple nesting because the depth is not fixed. Finally, deductive databases and object-relational databases can be considered as two quite different extensions to handle this problem (Chen, 2003; Ramakrishnan & Ullman, 1995).
In the past decade, another kind of research has been done to avoid join operation based on graph encoding. In this article, we provide an overview on most important techniques in this area and discuss a new encoding approach to pack “ancestor paths” in a relational environment (Chen, 2004; Chen & Cooke, 2006). It needs only O(e⋅b) time and O(n⋅b) space, where b is the number of the leaf nodes of the graph’s branching. This computational complexity is better than any existing method for this problem, including the graph-based algorithms (Bender, Farach-Colton, Pemmasani, Skiena, & Sumazin, 2004; Schmitz, 1983), the graph encoding (Abdeddaim, 1997; Bommel & Beck, 2000; Zibin & Gil, 2001), and the matrix-based algorithms (La Poutre & Leeuwen, 1988).
Key Terms in this Chapter
Cyclic Graph: A cyclic graph is a directed graph that contains at least one cycle.
Tree: A tree is a graph with a root, in which the indegree of each node is equal to 1.
Topological Order: A sequence S of nodes of a DAG G = ( V , E ): v 1 ,…, v n such that ( v i , v j ) ? E implies that v j appears before v i in S .