A Study of XML Models for Data Mining: Representations, Methods, and Issues

Sangeetha Kutty (Queensland University of Technology, Australia), Richi Nayak (Queensland University of Technology, Australia) and Tien Tran (Queensland University of Technology, Australia)
Copyright: © 2013 | Pages: 27
DOI: 10.4018/978-1-4666-2455-9.ch001


With the increasing number of XML documents in varied domains, it has become essential to find ways of extracting interesting information from these documents. Data mining techniques can be used to derive this information. However, because XML documents are semi-structured, how they can be mined depends on the data model used to represent them. In this chapter, we present an overview of the various models for representing XML documents, how these models are used for mining, and some of the issues and challenges inherent in them. In addition, this chapter provides some insights into future data models of XML documents that can effectively capture their two important features, structure and content, for mining.

Data Models for XML Document Mining

To suit the objectives and needs of XML mining algorithms, XML data has been represented in various forms. Figure 1 gives a taxonomy of XML data, showing the data models that facilitate XML mining according to the different features present in XML data.
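As a rough illustration of how one document can feed different data models, the sketch below parses a small, hypothetical conference document with Python's standard `xml.etree.ElementTree` and derives two simple representations: a multiset of root-to-leaf tag paths (capturing structure) and a bag of words taken from the text nodes (capturing content). The document, its element names, and the helper functions `structure_paths` and `content_terms` are assumptions made for illustration; they are not the models defined in this chapter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical conference document; element names are illustrative only.
DOC = """<conference>
  <name>XML Mining Workshop</name>
  <paper>
    <title>Tree mining of XML</title>
    <author>Smith</author>
  </paper>
  <paper>
    <title>Clustering XML documents</title>
    <author>Lee</author>
  </paper>
</conference>"""


def structure_paths(root):
    """Structure view (hypothetical helper): multiset of root-to-leaf tag paths."""
    paths = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        children = list(node)
        if not children:  # leaf element: record the full path to it
            paths[path] += 1
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def content_terms(root):
    """Content view (hypothetical helper): bag of lower-cased words from text nodes."""
    terms = Counter()
    for text in root.itertext():  # yields every text fragment in document order
        terms.update(word.lower() for word in text.split())
    return terms


root = ET.fromstring(DOC)
print(structure_paths(root))
print(content_terms(root))
```

Path-based and term-based views like these are only two possible representations. Each discards something: the path multiset loses sibling order, and the bag of words loses the link between a word and the element that contains it.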

Figure 1. Data models facilitating mining of XML data

There are two types of XML data: XML documents and XML schema definitions. An XML schema definition contains the structure and data definitions of XML documents (Abiteboul, Buneman, & Suciu, 2000). An XML document, on the other hand, is an instance of an XML schema that contains the data content represented in a structured format.

The provision of an XML schema definition alongside XML documents distinguishes XML from other types of semi-structured data such as HTML and BibTeX. The schema imposes restrictions on the syntax and structure of XML documents. The two most popular XML schema languages are Document Type Definition (DTD) and XML Schema Definition (XSD). Figures 2 and 3 show DTD and XSD examples, respectively. An example of a simple XML document conforming to the schemas in Figures 2 and 3 is shown in Figure 4.
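Since the figures are not reproduced here, the following sketch uses a hypothetical schema to illustrate how a schema restricts document structure. DTD content models are essentially regular expressions over the sequence of child elements. The sketch therefore encodes each hypothetical element declaration as a Python regular expression and checks a parsed document against it recursively. The schema, the `CONTENT_MODELS` table, and the `conforms` function are assumptions for illustration, not the chapter's conf.dtd. This is a simplified, hand-rolled check that ignores attributes, data types, and text content. It is not a DTD or XSD validator, and Python's standard `xml.etree.ElementTree` does not perform schema validation itself.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified schema in the spirit of a DTD. Each entry maps an
# element name to a regular expression over its child-element sequence, e.g.
#   <!ELEMENT conference (name, paper+)>  ->  r"name (paper )+"
#   <!ELEMENT name (#PCDATA)>             ->  ""  (text only, no child elements)
CONTENT_MODELS = {
    "conference": r"name (paper )+",
    "paper": r"title (author )+",
    "name": "",
    "title": "",
    "author": "",
}


def conforms(node, models=CONTENT_MODELS):
    """Return True if node and all its descendants satisfy their content models."""
    model = models.get(node.tag)
    if model is None:  # element not declared in the (hypothetical) schema
        return False
    # Encode the child tags as "tag1 tag2 ... " so the regex can match them.
    sequence = "".join(child.tag + " " for child in node)
    if re.fullmatch(model, sequence) is None:
        return False
    return all(conforms(child, models) for child in node)


valid = ET.fromstring(
    "<conference><name>W</name><paper><title>T</title>"
    "<author>A</author></paper></conference>"
)
invalid = ET.fromstring(  # the paper element is missing its required author
    "<conference><name>W</name><paper><title>T</title></paper></conference>"
)
print(conforms(valid), conforms(invalid))  # True False
```

Treating content models as regular expressions mirrors how a DTD constrains element order and cardinality. XSD adds further constraints (for example, data types and occurrence bounds) that this sketch does not attempt to model.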

Figure 2. An example of a DTD schema (conf.dtd)

Figure 3. An example of an XSD schema (conf.xsd)
