A Fast and Space-Economical Algorithm for the Tree Inclusion Problem

A Fast and Space-Economical Algorithm for the Tree Inclusion Problem

Yangjun Chen (University of Winnipeg, Canada) and Yibin Chen (University of Winnipeg, Canada)
DOI: 10.4018/978-1-5225-7659-4.ch021

Abstract

The ordered tree inclusion is an interesting problem, by which the authors will check whether a pattern tree P can be included in a target tree T, where the order of siblings in both P and T is significant. In this chapter, the authors propose an efficient algorithm for this problem. Its time complexity is bounded by O(|T|⋅loghP) with O(|T| + |P|) space being used, where hP represents the height of P. Up to now the best algorithm for this problem needs Θ(|T|⋅|leaves(P)|) time, where leaves(P) stands for the set of the leaves of P.
Chapter Preview
Top

Introduction

Let T be a rooted tree. We say that T is ordered and labeled if each node is assigned a symbol from an alphabet Σ and a left-to-right order among siblings in T is specified. Let v be a node different of the root in T with parent node u. Denote by delete(T, v) the tree obtained by removing the node v from T, by which the children of v become part of the children of u as illustrated in Figure 1.

Given two ordered labeled trees P and T, called the pattern and the target, respectively. We may ask: Can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1, ...,vk of nodes such that forT0 = T andTi+1 = delete(Ti, vi+1) for i = 0, ..., k - 1,we have Tk = P? If this is the case, we say, P is included in T (Kilpeläinen and Mannila, 1995). Such a problem is called the tree inclusion problem. It has many applications in the computer engineering as described below.

Figure 1.

Illustration of node deletion

978-1-5225-7659-4.ch021.f01
Top

Background

The first interesting application of the tree inclusion is used as an important query primitive for XML data (Mannila and Räiha, 1990), where a structured document database is considered as a collection of parse trees that represent the structure of the stored texts and the tree inclusion is used as a means of retrieving information from them. As an example, consider the tree shown in Figure 2, representing an XML document for the book Arts of Programming authored by (Knuth, 1969). One might want to find this book in an XML database by forming a pattern tree as shown in Figure 3 as a query, which can be obtained by deleting some nodes from the tree shown in Figure 2. Thus, a tree inclusion checking needs to be conducted to evaluate this query.

Figure 2.

A XML document (target) tree

978-1-5225-7659-4.ch021.f02
Figure 3.

A pattern tree

978-1-5225-7659-4.ch021.f03

Another application of this problem is to query the grammatical structures of English sentences, which can also be represented as an ordered tree since a sentence can always be divided into several ordered components such as noun phases, verb phrases, and adverbs; and a noun phrase itself normally contains an article and a noun while a verb phrase may contain a verb, a noun phrase, an adverb, and so on. To check whether a concrete sentence is grammatically correct, we will represent it as a pattern tree and make a tree inclusion checking against some target grammatical tree structures.

Complete Chapter List

Search this Book:
Reset