This chapter discusses the research background: the XML model, XML query languages, XML schema languages, XML application programming interfaces, XML document types, XML data storage approaches, the relational database model, and the similarities and differences between the XML model and the relational database model. The chapter ends with a summary.
XML Model
“EXtensible Markup Language (XML), is a W3C Recommendation in 1998 for marking up data” (Bray et al., 2007). It is designed for publishing and exchanging large volumes of digital data over the Internet. It is a markup language used to define the structure of information and the contents of its elements, whereas HTML is used to define how elements are displayed on a web page. It can also be considered an ideal format for server-to-server transfer of structured data (Bansal and Alam, 2001).
The importance of transforming XML documents has increased considerably. Moreover, different XML models share common requirements and limitations as tools for data management. For rich data to be shared among different groups, all concepts need to be placed into a common frame of reference. Either XML schemas must be globally standardized among groups, or mappings must be created between all pairs of related data. In addition, parsing and text conversion slow down access to the data.
A well-formed XML document is one that conforms to the XML 1.0 grammar specified by the W3C (Bray et al., 2007). It has exactly one root element, called the document element. Each start tag must have a corresponding end tag, and elements must be properly nested within one another. The tags and nesting rules allow XML to represent information hierarchically. Figure 1 shows an example of such an XML document.
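The well-formedness rules above can be checked mechanically by an XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are hypothetical and are not taken from Figure 1.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for unclosed tags, bad nesting, multiple root elements, etc.
        return False
    return True


# Hypothetical inputs illustrating the rules described in the text.
single_root = "<books><book><name>Title</name></book></books>"
missing_end_tag = "<books><book><name>Title</name></books>"
bad_nesting = "<books><book></books></book>"
two_roots = "<book/><book/>"

results = {
    "single_root": is_well_formed(single_root),
    "missing_end_tag": is_well_formed(missing_end_tag),
    "bad_nesting": is_well_formed(bad_nesting),
    "two_roots": is_well_formed(two_roots),
}
```

Only the first input satisfies all three rules (one document element, matching start and end tags, proper nesting). Note that this checks well-formedness only; it does not validate the document against a schema.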
In recent years, significant progress has been made in the XML domain. Many languages based on XML markup have been designed, including XML Schema and XQuery. These standardized technologies augment the data processing abilities of XML. The following sections briefly describe a variety of XML-based languages and technologies.
Figure 1. An example of an XML document
XML Query Languages
XML query languages enable the user to retrieve data from a single XML document using the XPath language, or from multiple documents using the XQuery language.
XPath Language
XPath stands for the XML Path Language (Berglund et al., 2007). It is used for retrieving parts of a single XML document by means of a path notation similar to the one used in URLs. Every XPath expression evaluates to one of four basic types:
- Node-set (an unordered collection of nodes without duplicates)
- Boolean
- Number (a floating-point number)
- String (a sequence of UCS characters)
An XPath location path can be either relative or absolute within an XML document. XPath distinguishes seven node types: root, element, attribute, text, comment, processing instruction, and namespace nodes.
The number of nodes matched by an XPath location path can be restricted further by specifying additional requirements for a match, such as comparison operators, functions, or predefined variables. XPath supports equality operators and helper functions that operate on the four basic types (i.e. node-set, Boolean, number, and string), for instance substring extraction, summation of the values in a node-set, or counting the nodes in a node-set. Table 1 shows examples of XPath expressions that retrieve data from the XML document in Figure 1.
Table 1. Examples of XPath expressions
| Expression | Refers to |
| --- | --- |
| ./author | All <author> elements within the current context. Note that this is equivalent to the expression in the next row. |
| author | All <author> elements within the current context. |
| /books | The document element (<books>) of this document. |
| //author | All <author> elements in the document. |
| book/ISBN | All <ISBN> elements that are children of a <book> element. |
| books//name | All <name> elements one or more levels deep in the <books> element (arbitrary descendants). Note that this is different from the expression in the next row. |
| books/*/name | All <name> elements that are grandchildren of <books> elements. |
| author[1] | The first <author> element in the current context node. |
| book/* | All elements that are children of <book> elements. |
| book[@price < "60.0"] | All <book> elements whose price attribute is less than 60.0. |
| ancestor::name[parent::book][1] | The nearest <name> ancestor of the current context node that is itself a child of a <book> element. |
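Several of the expressions in Table 1 can be tried out with a small script. The sketch below assumes Python's standard `xml.etree.ElementTree` module, which implements only a limited subset of XPath (no numeric comparisons and no `ancestor` axis), so the price predicate is emulated in plain Python. The sample document is hypothetical: it reuses the element names from Table 1 but is not the document shown in Figure 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element and attribute names follow Table 1.
SAMPLE = """
<books>
  <book price="45.0">
    <name>XML Basics</name>
    <ISBN>111</ISBN>
    <author><name>A. Author</name></author>
  </book>
  <book price="75.0">
    <name>Databases</name>
    <ISBN>222</ISBN>
    <author><name>B. Writer</name></author>
    <author><name>C. Scribe</name></author>
  </book>
</books>
""".strip()

# fromstring returns the document element, so <books> is the context node
# for every relative path evaluated on `root` below.
root = ET.fromstring(SAMPLE)

# //author : every <author> element in the document
all_authors = root.findall(".//author")

# book/ISBN : <ISBN> elements that are children of a <book> element
isbns = [e.text for e in root.findall("book/ISBN")]

# books//name : <name> elements at any depth below <books>
names_any_depth = root.findall(".//name")

# books/*/name : <name> elements that are grandchildren of <books>
names_grandchildren = root.findall("*/name")

# book/* : all element children of <book> elements
book_children = root.findall("book/*")

# author[1] : first <author>, with the second <book> as context node
second_book = root.findall("book")[1]
first_author = second_book.find("author[1]")

# book[@price < "60.0"] : not supported by ElementTree, so filter in Python
cheap_books = [b for b in root.findall("book") if float(b.get("price")) < 60.0]
```

With this sample, `names_any_depth` also contains the `<name>` elements nested inside `<author>`, while `names_grandchildren` contains only the two book titles, which illustrates the difference between `books//name` and `books/*/name` noted in the table.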