# A Fast and Space-Economical Algorithm for the Tree Inclusion Problem

Yangjun Chen (University of Winnipeg, Canada) and Yibin Chen (University of Winnipeg, Canada)
DOI: 10.4018/978-1-5225-2255-3.ch391

Top

## Introduction

Let T be a rooted tree. We say that T is ordered and labeled if each node is assigned a symbol from an alphabet Σ and a left-to-right order among siblings in T is specified. Let v be a node different of the root in T with parent node u. Denote by delete(T, v) the tree obtained by removing the node v from T, by which the children of v become part of the children of u as illustrated in Figure 1.

Given two ordered labeled trees P and T, called the pattern and the target, respectively. We may ask: Can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1, ...,vk of nodes such that forT0 = T andTi+1 = delete(Ti, vi+1) for i = 0, ..., k - 1,we have Tk = P? If this is the case, we say, P is included in T (Kilpeläinen and Mannila, 1995). Such a problem is called the tree inclusion problem. It has many applications in the computer engineering as described below.

Figure 1.

Illustration of node deletion

Top

## Background

The first interesting application of the tree inclusion is used as an important query primitive for XML data (Mannila and Räiha, 1990), where a structured document database is considered as a collection of parse trees that represent the structure of the stored texts and the tree inclusion is used as a means of retrieving information from them. As an example, consider the tree shown in Figure 2, representing an XML document for the book Arts of Programming authored by (Knuth, 1969). One might want to find this book in an XML database by forming a pattern tree as shown in Figure 3 as a query, which can be obtained by deleting some nodes from the tree shown in Figure 2. Thus, a tree inclusion checking needs to be conducted to evaluate this query.

Figure 2.

A XML document (target) tree

Figure 3.

A pattern tree

Another application of this problem is to query the grammatical structures of English sentences, which can also be represented as an ordered tree since a sentence can always be divided into several ordered components such as noun phases, verb phrases, and adverbs; and a noun phrase itself normally contains an article and a noun while a verb phrase may contain a verb, a noun phrase, an adverb, and so on. To check whether a concrete sentence is grammatically correct, we will represent it as a pattern tree and make a tree inclusion checking against some target grammatical tree structures.

## Key Terms in this Chapter

Unordered Tree Inclusion: A tree inclusion, by which the ordering of siblings is not important.

Interval: A pair of integers [a, b] where b denotes the rank of v in a post-order traversal of a trie. Here the ranks are assumed to begin with 1, and all the children of a node are assumed to be ordered and fixed during the traversal. In addition, a denotes the lowest rank for any node u in the subtree rooted at v. Its purpose is to check the reachability. Let [a, b] an [c, d] be the intervals associated with v and u, respectively. If [c, d] ? [a, b], then u is reachable from v through a path in the tree.

Ordered Tree Inclusion: Mapping a pattern tree into a target tree, by which the labels and ancestor/descendant relationship, as well as left-to-right ordering of nodes should be preserved.

Top-Down Tree Search: A systematic way to explore a tree, by which a node will be visited before its children are visited.

Bottom-Up Tree Search: A systematic way to explore a tree, by which a node will be visited after its children are visited.

Twig Matching: A kind of relaxed unordered tree inclusion, by which more than one node in a pattern can be mapped to a single node in a target as long as labels and ancestor/descendant relationship of nodes are preserved.

Tree Matching: Mapping a pattern tree into target tree, by which the labels and parent/child relationship of nodes should be preserved.

## Complete Chapter List

Search this Book:
Reset