XTEngine: A Twin Search Engine for XML


Kamal Taha (The University of Texas at Arlington, USA)
DOI: 10.4018/978-1-60960-521-6.ch009


There has been extensive research in XML Keyword-based and Loosely Structured querying. Some frameworks work well for certain types of XML data models but fail on others. The reason is that the proposed techniques overlook the context of elements when building relationships between them. The context of a data element is determined by its parent, because a data element is generally a characteristic of its parent. Overlooking the contexts of elements may result in relationships between elements that are semantically disconnected, which leads to erroneous results. We present in this chapter a context-driven search engine called XTEngine for answering XML Keyword-based and Loosely Structured queries. XTEngine treats each set of elements consisting of a parent and its child data elements as one unified entity, and then uses context-driven search techniques to determine the relationships between the different unified entities. We evaluated XTEngine experimentally and compared it with three other search engines. The results showed marked improvement.
Chapter Preview


Extensive research has been done in keyword querying over relational data (Agrawal & Chaudhuri & Das, 2002; Aditya & Bhalotia & Sudarshan, 2002; Hristidis & Papakonstantinou, 2002). Research in XML querying has received a significant boost with the emergence of the World Wide Web, online businesses, and the concept of ubiquitous computing. Some of these works model XML data as a rooted tree (Liu & Chen, 2007; Xu & Papakonstantinou, 2005; Li & Yu & Jagadish, 2004; Cohen & Mamou & Sagiv, 2003). Others model it as a graph (Cohen & Kanza, 2005; Balmin & Hristidis & Papakonstantinou, 2003; Balmin & Hristidis & Papakonstantinou, 2004; Botev & Shao & Guo, 2003). Most of these works target either: (1) naïve users (such as business customers), by proposing Keyword-based search engines, or (2) sophisticated users, by proposing fully structured search engines.

Business customers are most likely not aware of the exact structure of the underlying data. Business employees, on the other hand, are likely to be aware of some labels (or attributes) of elements containing data, but they are unlikely to be fully aware of the underlying data structure. Thus, business customers need a pure Keyword-based search engine, while business employees need a Loosely Structured search engine for answering their queries. A Loosely Structured query combines keywords and element names. We propose in this chapter: (1) an XML Keyword-based search engine called XTEngine-K for answering business customers, and (2) an XML Loosely Structured search engine called XTEngine-L for answering business employees. Suppose that the user wants to know the data D, which is contained in an element labeled E. If the user knows only the keywords k1, k2, ..., kn, which are relevant to D, he/she can submit a Keyword-based query to XTEngine-K in the form: Q (“k1”, “k2”, ..., “kn”). If, however, the user knows the label E and the labels of the elements containing the keywords k1, k2, ..., kn respectively, but is unaware of the structure of the data, he/she can submit a Loosely Structured query to XTEngine-L in the form: Q (= “k1”, …, = “kn”, E?), where each “=” condition is preceded by the label of the element containing that keyword. XTEngine is built on top of an XQuery search engine (Katz, 2005).
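
The two query forms can be illustrated by modeling them as plain data structures. The Python sketch below is illustrative only and is not XTEngine's implementation: the class names `KeywordQuery` and `LooselyStructuredQuery`, the `render` method, and the sample labels and keywords are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeywordQuery:
    """Keyword-based form: the user knows only keywords relevant to D."""
    keywords: List[str]

    def render(self) -> str:
        # Quote each keyword and join them inside Q(...).
        return "Q(" + ", ".join(f'"{k}"' for k in self.keywords) + ")"


@dataclass
class LooselyStructuredQuery:
    """Loosely Structured form: label/keyword conditions plus the requested label E."""
    conditions: List[Tuple[str, str]]  # (element label, keyword) pairs
    requested_label: str               # label E of the element holding D

    def render(self) -> str:
        # Each condition pairs a label with its keyword; E is marked with "?".
        parts = [f'{label} = "{kw}"' for label, kw in self.conditions]
        parts.append(f"{self.requested_label}?")
        return "Q(" + ", ".join(parts) + ")"


# Hypothetical sample data, for illustration only.
customer_query = KeywordQuery(["XML", "Taha"])
employee_query = LooselyStructuredQuery(
    [("title", "XML"), ("author", "Taha")], "publisher"
)
print(customer_query.render())
print(employee_query.render())
```

In this sketch a customer supplies only keywords, whereas an employee additionally names the element label for each keyword and the label of the element whose data is requested; neither query says anything about how those elements are nested in the document.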
