Structure- and Content-Based Retrieval for XML Documents

Jae Woo Chang, Du-Seok Jin
As the number of XML documents grows rapidly, there is a need for an XML document retrieval system that supports both structure-based and content-based retrieval. To support structure-based retrieval, we design four efficient index structures (keyword, structure, element, and attribute indexes) by indexing XML documents at the granularity of a basic element unit. To support content-based retrieval, we design a high-dimensional index structure based on the X-tree that stores and retrieves both color and shape feature vectors efficiently. Finally, we evaluate our XML document retrieval system in terms of system efficiency (retrieval time, insertion time, and storage overhead) and system effectiveness (recall and precision).
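
As a rough illustration of what indexing by element unit could look like, the following Python sketch builds four in-memory maps for a single document. The function names (build_indexes, find), the id assignment, and the dictionary layouts are all assumptions made for this example; they are not the index structures designed in the paper, and the X-tree-based feature index is not covered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Index one XML document element by element (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    keyword_index = defaultdict(set)    # word -> ids of elements whose own text contains it
    structure_index = {}                # element id -> parent element id (None for the root)
    element_index = defaultdict(set)    # tag name -> element ids
    attribute_index = defaultdict(set)  # (attribute name, value) -> element ids

    next_id = 0
    stack = [(root, None)]
    while stack:
        elem, parent_id = stack.pop()
        elem_id = next_id
        next_id += 1
        structure_index[elem_id] = parent_id
        element_index[elem.tag].add(elem_id)
        for name, value in elem.attrib.items():
            attribute_index[(name, value)].add(elem_id)
        # Only the text directly inside the element is indexed; tail text is ignored.
        for word in (elem.text or "").lower().split():
            keyword_index[word].add(elem_id)
        # Push children in reverse so they are visited in document order.
        for child in reversed(list(elem)):
            stack.append((child, elem_id))
    return keyword_index, structure_index, element_index, attribute_index


def find(keyword, tag, indexes):
    """Return ids of elements named `tag` whose own text contains `keyword`."""
    keyword_index, _, element_index, _ = indexes
    return keyword_index.get(keyword.lower(), set()) & element_index.get(tag, set())


# Example document, invented for illustration.
doc = (
    '<paper year="2002"><title>XML retrieval</title>'
    "<section><title>Index design</title></section></paper>"
)
indexes = build_indexes(doc)
matches = find("index", "title", indexes)
```

Intersecting the keyword and element maps answers a query such as "title elements containing the word index", and the structure map can then be followed upward to check which ancestor each match sits under.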

