Research Background

Badya Al-Hamadani (University of Huddersfield, UK) and Joan Lu (University of Huddersfield, UK)
DOI: 10.4018/978-1-4666-1975-3.ch008

The use of XML documents has increased in recent years because their structure offers many important features. This chapter explains the importance of XML documents and their structure. It highlights the difference between traditional text retrieval and retrieving information from XML documents, and it introduces the query languages used to access parts of these documents.

1.1 XML Commencements And Importance

Before the rise of the Internet, the 1980s witnessed the invention of the Standard Generalized Markup Language (SGML) as a way to display information dynamically. Later, in 1995, the W3C recommended SGML for use on the Internet. The problems encountered with SGML included the lack of widely supported style sheets, the complexity and instability of the software that used it, and the difficulty of interchanging SGML data because the level of SGML support varied among software packages.

In 1996, the first XML working draft was published, intended as a powerful substitute for SGML. XML was first recommended by the World Wide Web Consortium (W3C) in 1998 as a markup language for storing and exchanging data over the web. The most recent recommendation, the fifth edition of XML 1.0, was published in 2008 (W3C, 2008). In a very short period of time, XML has become the basis for data exchange over the Internet. This is due to several features, including the following (NG et al., 2006; Gerlicher, 2007; Groppe, 2008):

  • Readability: XML is readable by both humans and machines, so data represented in XML can be used by different users and processed by different parsers.

  • Interoperability: Hardware and software can use XML documents without any changes to the software or to the data itself. In other words, XML data does not depend on any particular software or machine.

  • Long-term usability: Since XML documents are represented in Unicode, they are expected to remain storable and usable for years (Augeri et al., 2007; De Meo et al., 2007).

  • Extensibility: There is no fixed set of tags that must be used to represent data.

  • Generality: XML documents can represent different kinds of data, such as images, sounds, videos, and texts.

  • Internationality: Almost all written languages can be represented in XML documents, since XML supports Unicode (Norbert and Kai, 2004).
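
Several of the features listed above (machine readability, extensibility, and Unicode support) can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree parser; the document and its tag names (library, book, title) are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are invented, which XML permits
# because it prescribes no fixed tag vocabulary (extensibility).
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book lang="ar"><title>ألف ليلة وليلة</title></book>
  <book lang="en"><title>Moby-Dick</title></book>
</library>"""


def titles_by_language(xml_text):
    """Return a {lang: title} mapping read from the hypothetical library document."""
    # Encode to bytes so the parser honours the UTF-8 encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {book.get("lang"): book.findtext("title") for book in root.iter("book")}


if __name__ == "__main__":
    for lang, title in titles_by_language(DOC).items():
        print(lang, title)
```

The same text is legible to a human reader and is processed by a generic parser with no knowledge of the invented tags, and the Arabic title survives the round trip because the document is Unicode text.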

In spite of all these advantages, XML also has some weaknesses:

  • XML documents contain a large amount of redundancy, so they demand a lot of storage to be archived, high bandwidth to be transmitted, and high cost to be processed.

  • The large number of technologies surrounding XML, such as schemas, DTD, XSLT, SAX, DOM, XPath, and XQuery, complicates the use of these documents. These technologies make XML documents somewhat difficult to use, especially for naive users; yet because they are considered necessary for dealing with XML documents, working without them is just as difficult.

  • Document namespaces must be handled carefully; otherwise, further problems and complications can arise while the XML documents are processed.
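
Two of the weaknesses above, redundancy and namespace handling, can be made concrete with a rough sketch. The data set, tag names, and namespace URI below are hypothetical, and zlib is used only as a general-purpose compressor to make the repetition visible; it is not a technique taken from this chapter.

```python
import xml.etree.ElementTree as ET
import zlib

# Hypothetical data set: 200 small records with invented tag names.
RECORDS = "<records>" + "".join(
    f"<record><id>{i}</id><value>{i * i}</value></record>" for i in range(200)
) + "</records>"

# Hypothetical namespaced document; the namespace URI is a placeholder.
NS_DOC = '<root xmlns="http://example.org/ns"><item>1</item></root>'


def markup_overhead(xml_text):
    """Return the fraction of characters spent on markup instead of text content."""
    root = ET.fromstring(xml_text)
    content_chars = sum(len(text) for text in root.itertext())
    return 1 - content_chars / len(xml_text)


def compressed_size(xml_text):
    """Return the size in bytes of the zlib-compressed UTF-8 encoding."""
    return len(zlib.compress(xml_text.encode("utf-8")))


def count_items(xml_text):
    """Count <item> children found without and with the namespace in the query."""
    root = ET.fromstring(xml_text)
    unqualified = root.findall("item")
    qualified = root.findall("{http://example.org/ns}item")
    return len(unqualified), len(qualified)


if __name__ == "__main__":
    print(f"markup overhead: {markup_overhead(RECORDS):.0%}")
    print(f"raw bytes: {len(RECORDS.encode('utf-8'))}, compressed: {compressed_size(RECORDS)}")
    print("items found (unqualified, qualified):", count_items(NS_DOC))
```

In this sketch most characters are repeated tag names rather than data, which is why the document compresses well. The namespace check shows a typical pitfall: a query for the bare name item finds nothing, because the element's full name includes the default namespace.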


1.2 XML Document Types

The main building blocks of any well-formed XML document are nested open tags and their corresponding close tags. These tags can be formed as follows (Hunter, 2000; Anders, 2009; Goldberg, 2009):

  • 1. Elements: each element starts with an open tag (<p>) and ends with an end tag (</p>). Everything between and including these tags is an element. The general structure of an element is as follows:

    <e at1="v1" at2="v2" … atn="vn">d1d2d3…dm</e>    (1)

    such that n ≥ 0 and m ≥ 0.
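
The general element form (1) can be explored with a short sketch based on Python's standard xml.dom.minidom module. Here n is read as the number of attributes and m as the number of child nodes (text or nested elements); that reading of d1…dm is an assumption, since the text does not define the d items. The function names describe and is_well_formed are hypothetical helpers.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def describe(element_text):
    """Return (n, m): the attribute count and the child-node count of the root element.

    Treating each d_i as one child node is an assumption made for illustration.
    """
    root = minidom.parseString(element_text).documentElement
    return root.attributes.length, len(root.childNodes)


def is_well_formed(text):
    """Return True if the text parses as XML, i.e. its tags are properly nested and closed."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


if __name__ == "__main__":
    # Two attributes (n = 2) and three child nodes: text, element, text (m = 3).
    print(describe('<e at1="v1" at2="v2">d1<x/>d3</e>'))
    # The boundary case n = 0, m = 0 is still a valid element.
    print(describe("<e></e>"))
    # Overlapping tags violate the nesting rule and are rejected by the parser.
    print(is_well_formed("<a><b></a></b>"))
```

The empty element shows why the constraints are n ≥ 0 and m ≥ 0 rather than strictly positive: an element with no attributes and no content is still well formed.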
