Without any doubt, the eXtensible Markup Language (XML) (Bray et al., 2006) is currently one of the most popular formats for data representation. Its wide popularity naturally prompted an enormous effort to propose faster and more efficient methods and tools for managing and processing XML data. Soon it was possible to distinguish several different directions. The four most popular ones are methods which store XML data in a classical file system, methods which store and process XML data using a relational database management system, methods which exploit a pure object-oriented approach, and native methods that use special indices, numbering schemes and/or data structures proposed for, or particularly suitable for, the tree structure of XML data.
The main concern of the database-based XML techniques is the choice of how XML data are stored in relations, the so-called XML-to-relational mapping or decomposition of the schema into relations. The strategy of the first approaches to XML-to-relational mapping, the so-called generic ones (e.g. Florescu et al., 1999), was based purely on the data model of XML documents. These methods were able to store any kind of XML data since they viewed XML documents as general labelled trees. However, either the efficiency of query evaluation was quite low due to numerous join operations, or higher efficiency was gained at the cost of increased space overhead.
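To make the join problem concrete, the following is a minimal sketch of a generic mapping, assuming a single "edge" relation with one row per element. The table layout, the column names and the function `shred_to_edge_table` are illustrative assumptions rather than the exact design of any cited method, and attributes and mixed content are ignored for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_to_edge_table(xml_text, conn):
    """Store an XML document as rows of one generic 'edge' relation.

    Hypothetical layout: each element becomes a row holding its parent id
    (source), its position among siblings (ordinal), its tag (name), its
    own id (target) and its text content (value).
    """
    conn.execute(
        "CREATE TABLE edge (source INTEGER, ordinal INTEGER, "
        "name TEXT, target INTEGER PRIMARY KEY, value TEXT)"
    )
    next_id = 1

    def visit(node, parent_id, ordinal):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        text = (node.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (parent_id, ordinal, node.tag, node_id, text),
        )
        for position, child in enumerate(node, start=1):
            visit(child, node_id, position)

    # The root element hangs off a virtual parent with id 0.
    visit(ET.fromstring(xml_text), 0, 1)


doc = (
    "<library>"
    "<book><title>XML</title></book>"
    "<book><title>SQL</title></book>"
    "</library>"
)
conn = sqlite3.connect(":memory:")
shred_to_edge_table(doc, conn)

# The path /library/book/title needs one self-join per step of the path.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT t.value FROM edge l "
        "JOIN edge b ON b.source = l.target AND b.name = 'book' "
        "JOIN edge t ON t.source = b.target AND t.name = 'title' "
        "WHERE l.source = 0 AND l.name = 'library' "
        "ORDER BY b.ordinal"
    )
]
print(titles)
```

Any well-formed document can be stored this way without knowing its schema, but every additional step in a path query adds another self-join on the same relation, which is the source of the inefficiency noted above.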
Subsequently, researchers came up with the natural idea of exploiting structural information extracted from the XML schemas of XML data, usually expressed in the DTD (Document Type Definition) (Bray et al., 2006) or XML Schema (Thompson et al., 2004; Biron et al., 2004) language. All the so-called schema-driven approaches (e.g. Shanmugasundaram et al., 1999) were based on the same idea: the structure of the target relational schema can be created according to the structure of the source XML schema. Assuming that a user specifies the XML schema as precisely as possible to describe the related data, we also get a more precise relational schema. The problem is that DTDs are usually too general. Extreme examples are recursion and the * operator, which, in general, allow infinitely deep or infinitely wide XML documents to be specified. According to analyses of real-world XML data (Mlynkova et al., 2006), in both cases the respective XML documents are much simpler and, thus, the effort spent on processing all the complex schema constructs is wasted.
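The schema-driven idea can be illustrated by a minimal sketch under strong simplifying assumptions. The schema is given as a plain dictionary rather than a parsed DTD, it is non-recursive, and each element name occurs in only one place. The rule used here (a child under the * operator gets its own table, any other child is inlined into its parent's table) is a simplified inlining-style rule chosen for illustration, not the exact algorithm of the cited work. The function name `derive_relational_schema` and the column naming are hypothetical.

```python
def derive_relational_schema(schema, root):
    """Derive {table: [columns]} from a simplified, non-recursive schema.

    schema maps an element name to a list of (child, occurrence) pairs,
    where occurrence is "1", "?" or "*". Elements that do not appear as
    keys are treated as text-only leaves.
    """
    tables = {}

    def inline(element, prefix, columns, table):
        children = schema.get(element, [])
        if not children:
            columns.append(prefix)  # a text-only leaf becomes a column
            return
        for child, occurrence in children:
            if occurrence == "*":
                build_table(child, parent=table)
            else:
                inline(child, f"{prefix}_{child}", columns, table)

    def build_table(element, parent):
        columns = ["id"]
        if parent is not None:
            columns.append(f"{parent}_id")  # foreign key to the parent table
        tables[element] = columns
        children = schema.get(element, [])
        if not children:
            columns.append("value")  # repeated text-only leaf
        for child, occurrence in children:
            if occurrence == "*":
                build_table(child, parent=element)
            else:
                inline(child, child, columns, element)

    build_table(root, None)
    return tables


# Hypothetical schema, roughly: library -> book*, book -> title, author*, publisher?
example_schema = {
    "library": [("book", "*")],
    "book": [("title", "1"), ("author", "*"), ("publisher", "?")],
    "publisher": [("name", "1"), ("city", "?")],
}
relational_schema = derive_relational_schema(example_schema, "library")
for table, columns in relational_schema.items():
    print(table, columns)
```

In this sketch every * in the schema produces a separate table, and therefore a join at query time, even if the actual documents never contain more than one occurrence of the element. This is the kind of wasted effort that the analyses of real-world data point to.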
Key Terms in this Chapter
Adaptive XML-to-Relational Mapping: An XML-to-relational mapping which exploits additional information, such as sample XML documents, sample XML queries or user-specified requirements, and adapts the target relational schema accordingly.
Schema-Driven XML-to-Relational Mapping: An XML-to-relational mapping where the target relational schema is defined according to the structure of the source XML schema of XML data.
User-Driven XML-to-Relational Mapping: An XML-to-relational mapping where the user is provided with a fixed mapping strategy and can specify local changes to it where appropriate.
Cost-Driven XML-to-Relational Mapping: An XML-to-relational mapping which searches a space of possible fixed XML-to-relational mapping methods and chooses the one which best suits the current application, i.e. its XML documents and XML queries.
Fixed XML-to-Relational Mapping: An XML-to-relational mapping which is based on a fixed set of rules for creating the target database schema.
Generic XML-to-Relational Mapping: An XML-to-relational mapping which is based purely on a selected kind of data model of XML documents.
XML-to-Relational Mapping: A method which specifies how XML data are stored in the relations of a relational database management system. It involves a definition of the target database schema and of the way the data are stored in its relations. Related problems are data retrieval and data updates, but these are usually determined directly by the storage strategy.
User-Defined XML-to-Relational Mapping: An XML-to-relational mapping where the user specifies both the target relational schema and the required mapping manually.