Since XML (eXtensible Markup Language) (Bray, Paoli, Sperberg-McQueen, Maler & Yergeau, 2004) emerged as a standard for information representation and exchange, storing, indexing, and querying XML documents have become major issues in database research. Query processing and optimization are very important in this context, and indices are data structures that help enhance performance substantially. Though XML indexing concepts are mainly inherited from relational databases, XML indices bear numerous specificities. The aim of this chapter is to present an overview of state-of-the-art XML indices and to discuss the main issues, trade-offs, and future trends in XML indexing. Furthermore, since XML is gaining importance for representing business data for analytics (Beyer, Chamberlin, Colby, Özcan, Pirahesh & Xu, 2005), we also present an index we developed specifically for XML data warehouses.
Indexing and querying XML documents through path expressions written in XPath (Clark & DeRose, 1999) and XQuery (Boag, Chamberlin, Fernandez, Florescu, Robie & Siméon, 2006) have been the focus of many research studies. Two families of approaches aim at efficiently processing path join queries: those based on structural summaries and those based on numbering schemes.
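To make the numbering-scheme family more concrete, the following is a minimal sketch, assuming a simple depth-first interval labeling; it is not the scheme of any specific published index. The function names `number_nodes`, `is_ancestor`, and `structural_join`, as well as the sample document, are hypothetical and introduced only for illustration. Each element receives a (start, end) interval, and an ancestor-descendant relationship reduces to an interval containment test, which is what allows a path join to be evaluated without traversing the document.

```python
import xml.etree.ElementTree as ET


def number_nodes(root):
    """Assign each element a (start, end) interval in depth-first order."""
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter          # number given when the element is entered
        for child in node:
            visit(child)
        counter += 1             # number given when the element is left
        intervals[node] = (start, counter)

    visit(root)
    return intervals


def is_ancestor(intervals, a, d):
    """True if the interval of a strictly contains the interval of d."""
    a_start, a_end = intervals[a]
    d_start, d_end = intervals[d]
    return a_start < d_start and d_end < a_end


def structural_join(intervals, ancestors, descendants):
    """Naive join: all (ancestor, descendant) pairs, using intervals only."""
    return [(a, d) for a in ancestors for d in descendants
            if is_ancestor(intervals, a, d)]


# Hypothetical sample document.
doc = ET.fromstring(
    "<library><book><title/><author/></book><book><title/></book></library>"
)
intervals = number_nodes(doc)
books = doc.findall("book")
titles = list(doc.iter("title"))
pairs = structural_join(intervals, books, titles)
```

The nested-loop join is kept deliberately simple; an actual index would typically keep the intervals sorted so that the join can be performed in a single merge pass.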
Key Terms in this Chapter
Structural Summary-Based Index: Labeled-graph structure that summarizes XML graph structural information.
XML-Native DBMS (NXD): Database system in which XML data are natively stored and queried as XML documents. An NXD provides XML schema storage and implements an XML query engine (typically supporting XPath and XQuery). eXist (Meier, 2002) and X-Hive (X-Hive Corporation, 2007) are examples of NXDs.
Database Management System (DBMS): Software suite that handles the structuring, storage, maintenance, updating, and querying of data stored in a database.
XML Data Warehouse: XML database specifically modeled (i.e., multidimensionally, with a star-like schema) to support XML decision-support and analytic queries.
Numbering Scheme-Based Index: Tree structure in which each XML data node is uniquely identified by an interval.
Index: Physical data structure that allows direct (vs. sequential) access to data and thereby considerably improves data access time.
XML-Enabled DBMS: Database system in which XML data may be stored in and queried from relational tables. Such a DBMS must either map XML data into relations and translate queries into SQL, or implement a middleware layer allowing native XML storage and querying.
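As an illustration of the Structural Summary-Based Index term defined above, the following is a minimal sketch, assuming a simplified path summary in which every distinct root-to-node label path is recorded once and mapped to the elements it reaches. It is not the structure of any specific published index; the function name `build_path_summary` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_summary(root):
    """Map each distinct label path to the list of elements reached by it."""
    summary = defaultdict(list)

    def visit(node, parent_path):
        path = parent_path + "/" + node.tag
        summary[path].append(node)   # one summary entry per distinct path
        for child in node:
            visit(child, path)

    visit(root, "")
    return dict(summary)


# Hypothetical sample document.
doc = ET.fromstring(
    "<library>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title></book>"
    "</library>"
)
summary = build_path_summary(doc)

# A simple path query becomes a lookup in the summary instead of a traversal.
titles = [node.text for node in summary.get("/library/book/title", [])]
```

The summary here holds four entries for six elements; on documents with a regular structure, such a summary is usually much smaller than the data it describes, which is what makes it usable as an index.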