Mining XML Documents

Laurent Candillier (Université Charles de Gaulle, France), Ludovic Denoyer (Université Pierre et Marie Curie, France), Patrick Gallinari (Université Pierre et Marie Curie, France), Marie Christine Rousset (LSR-IMAG, France), Alexandre Termier (Institute of Statistical Mathematics, Japan) and Anne-Marie Vercoustre (INRIA, France)
Copyright: © 2008 | Pages: 22
DOI: 10.4018/978-1-59904-162-9.ch009

XML documents are becoming ubiquitous because their rich and flexible format can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. XML documents can be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and the discovery of frequent tree structures, which is especially important for heterogeneous collections. This chapter presents some recent approaches and algorithms that support these tasks, together with experimental evaluations on a variety of large XML collections.

Complete Chapter List

Table of Contents
Pascal Poncelet, Maguelonne Teisseire, Florent Masseglia
Chapter 1
Dan A. Simovici
This chapter presents data mining techniques that make use of metrics defined on the set of partitions of finite sets. Partitions are naturally...
Metric Methods in Data Mining
Chapter 2
Osmar R. Zaïane, Mohammed El-Hajj
Frequent Itemset Mining (FIM) is a key component of many algorithms that extract patterns from transactional databases. For example, FIM can be...
Bi-Directional Constraint Pushing in Frequent Pattern Mining
Chapter 3
Hui Xiong, Pang-Ning Tan, Vipin Kumar, Wenjun Zhou
This chapter presents a framework for mining highly-correlated association patterns named hyperclique patterns. In this framework, an objective...
Mining Hyperclique Patterns: A Summary of Results
Chapter 4
Simona Ester Rombo, Luigi Palopoli
In recent years, the information stored in biological datasets has grown exponentially, and new methods and tools have been proposed to interpret...
Pattern Discovery in Biosequences: From Simple to Complex
Chapter 5
Gregor Leban, Minca Mramor, Blaž Zupan, Janez Demšar, Ivan Bratko
Data visualization plays a crucial role in data mining and knowledge discovery. Its use is, however, often difficult due to the large number of...
Finding Patterns in Class-Labeled Data Using Data Visualization
Chapter 6
Yeow Wei Choong, Anne Laurent, Dominique Laurent
In the context of multidimensional data, OLAP tools are appropriate for the navigation in the data, aiming at discovering pertinent and abstract...
Summarizing Data Cubes Using Blocks
Chapter 7
Yutaka Matsuo, Junichiro Mori, Mitsuru Ishizuka
This chapter describes social network mining from the Web. Since the end of the 1990s, several attempts have been made to mine social network...
Social Network Mining from the Web
Chapter 8
Donato Malerba, Margherita Berardi, Michelangelo Ceci
This chapter introduces a data mining method for the discovery of association rules from images of scanned paper documents. It argues that a...
Discovering Spatio-Textual Association Rules in Document Images
Chapter 9
Mining XML Documents  (pages 198-219)
Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie Christine Rousset, Alexandre Termier, Anne-Marie Vercoustre
XML documents are becoming ubiquitous because of their rich and flexible format that can be used for a variety of applications. Given the...
Mining XML Documents
Chapter 10
Sascha Schulz, Myra Spiliopoulou, Rene Schult
We study the issue of discovering and tracing thematic topics in a stream of documents. This issue, often studied under the label “topic evolution”...
Topic and Cluster Evolution Over Noisy Document Streams
Chapter 11
Cyrille J. Joutard, Edoardo M. Airoldi, Stephen E. Fienberg, Tanzy M. Love
Statistical models involving a latent structure often support clustering, classification, and other data-mining tasks. Parameterizations...
Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice
About the Editors
About the Contributors