Mining Association Rules from XML Documents

Laura Irina Rusu (La Trobe University, Australia), Wenny Rahayu (La Trobe University, Australia) and David Taniar (Monash University, Australia)
DOI: 10.4018/978-1-60566-330-2.ch011


This chapter presents some of the existing techniques for extracting association rules from XML documents, in the context of rapid change in the Web knowledge discovery area. The study was motivated by the fast emergence of XML (eXtensible Markup Language) as a standard language for representing semistructured data and as a new standard for exchanging information between applications. The data exchanged as XML documents grow richer every day, so it has become obvious that these large volumes of XML data need not only to be stored for later use but also to be mined for interesting information. The hidden knowledge can be used in various ways, for example, to decide on a business issue or to predict future e-customer behaviour in a Web application. One type of knowledge that can be discovered in a collection of XML documents relates to association rules between parts of the document, and this chapter presents some of the top techniques for extracting them.
Chapter Preview


The starting point in developing algorithms and methodologies for mining XML documents was, naturally, the existing work on mining relational databases (Agrawal, Imielinski, & Swami, 1993; Agrawal & Srikant, 1998; Ashrafi, Taniar, & Smith, 2005; Ashrafi, 2004; Daly & Taniar, 2004; Tjioe & Taniar, 2005). In attempting to apply various relational mining algorithms to XML documents, researchers found that the approach could be a useful solution for mining small and not very complex XML documents, but that it was not efficient for large and complex documents with many levels of nesting.
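As a rough illustration of the relational-style approach described above, the following minimal sketch flattens a small XML document into a list of transactions and then derives pairwise association rules by counting support and confidence. The element names (`orders`, `order`, `item`), the sample data, the thresholds, and the helper names `flatten_to_transactions` and `pair_rules` are all assumptions made for this example; they are not taken from the chapter or from any of the cited algorithms.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical sample data: the element names and values are assumptions.
XML_DOC = """
<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>milk</item><item>eggs</item></order>
  <order><item>milk</item><item>eggs</item></order>
  <order><item>bread</item><item>milk</item></order>
</orders>
"""

def flatten_to_transactions(xml_text):
    """Turn each <order> element into a flat set of item names."""
    root = ET.fromstring(xml_text)
    return [
        frozenset(item.text.strip() for item in order.iter("item"))
        for order in root.iter("order")
    ]

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for rules between single items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        item_counts.update(transaction)
        # Sort so that each unordered pair is counted under one key.
        pair_counts.update(combinations(sorted(transaction), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

if __name__ == "__main__":
    for lhs, rhs, support, confidence in pair_rules(flatten_to_transactions(XML_DOC)):
        print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Note that the flattening step discards the document's nesting entirely. That is harmless for a shallow document like this one, but it suggests why the same strategy becomes harder to apply as documents grow larger and more deeply nested.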

The XML format comes with the acclaimed extensibility that allows the structure to change, that is, nodes can be added, removed, and renamed according to the information that needs to be encoded. Furthermore, XML offers many ways to express the same information (see Figure 1 for an example), not only across different XML documents but within the same document as well (Rusu, Rahayu, & Taniar, 2005a).

Figure 1. Different formats to express the same information using the XML structure
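The figure itself is not reproduced in this preview. As a hypothetical illustration of the same point (not the content of Figure 1), the sketch below encodes one fact in three structurally different XML fragments and shows that a small normalising step maps them to the same record. The `book`, `title`, and `year` names and the `to_record` helper are assumptions introduced for this example.

```python
import xml.etree.ElementTree as ET

# Three hypothetical encodings of the same information.
AS_ELEMENTS = "<book><title>XML Mining</title><year>2005</year></book>"
AS_ATTRIBUTES = '<book title="XML Mining" year="2005"/>'
AS_MIXED = '<book year="2005"><title>XML Mining</title></book>'

def to_record(xml_text):
    """Collect attributes and direct child elements into one flat dict."""
    node = ET.fromstring(xml_text)
    record = dict(node.attrib)
    for child in node:
        record[child.tag] = (child.text or "").strip()
    return record

if __name__ == "__main__":
    for fragment in (AS_ELEMENTS, AS_ATTRIBUTES, AS_MIXED):
        print(to_record(fragment))
```

A mining algorithm that compares documents by structure alone would treat these three fragments as different, which is one reason structural variability complicates rule extraction from XML.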

