Topic and Cluster Evolution Over Noisy Document Streams

Sascha Schulz (Humboldt-University Berlin, Germany), Myra Spiliopoulou (Otto-von-Guericke-University Magdeburg, Germany) and Rene Schult (Otto-von-Guericke-University Magdeburg, Germany)
Copyright © 2008 | Pages: 20
DOI: 10.4018/978-1-59904-162-9.ch010


We study the issue of discovering and tracing thematic topics in a stream of documents. This issue, often studied under the label “topic evolution”, is of interest in many applications where thematic trends should be identified and monitored, including environmental modelling for marketing and strategic management, information filtering over news streams, and the enrichment of classification schemes with emerging new classes. We concentrate on the latter area and present an example application from the automotive industry: the discovery of emerging topics in repair and maintenance reports. We first discuss the relevant literature on (a) the discovery and monitoring of topics over document streams and (b) the monitoring of evolving clusters over arbitrary data streams. We then propose our own method for topic evolution over a stream of small, noisy documents: we combine hierarchical clustering, performed at different time periods, with cluster comparison across adjacent periods, taking into account that the feature space itself may change from one period to the next. We elaborate on the behaviour of this method and show how human experts can be assisted in identifying class candidates among the topics thus discovered.
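The abstract names the ingredients of the method without detailing them. As a rough illustration only, the Python sketch below strings those ingredients together under several assumptions that are not taken from the chapter: whitespace tokenization, raw term-frequency vectors, cosine similarity, average-linkage agglomerative clustering cut at a similarity threshold, and cluster comparison across adjacent periods via the Jaccard overlap of each cluster's top terms. Because each period keeps its own vocabulary, the feature space is allowed to change between periods. The function names (`period_topics`, `match_topics`, etc.), the thresholds, and the toy reports are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Sketch only: the preprocessing, clustering, and matching choices below are
# assumptions for illustration, not the method described in the chapter.

def tokenize(doc):
    # Hypothetical preprocessing: lowercase whitespace tokens, dropping very
    # short tokens as a crude stand-in for real noise handling.
    return [t for t in doc.lower().split() if len(t) > 2]

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering over one period's documents.
    Merging stops once the best average similarity drops below threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), avg_sim(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters

def period_topics(docs, threshold=0.2, k=5):
    """Cluster one period and describe each cluster by its k most frequent terms.
    Only this period's vocabulary is used, so features may differ per period."""
    vectors = [Counter(tokenize(d)) for d in docs]
    topics = []
    for cluster in agglomerative(vectors, threshold):
        total = Counter()
        for i in cluster:
            total.update(vectors[i])
        topics.append({t for t, _ in total.most_common(k)})
    return topics

def jaccard(s1, s2):
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def match_topics(old, new, min_overlap=0.3):
    """Pair each topic of the newer period with its best-matching topic of the
    previous period; None marks a candidate emerging topic."""
    matches = []
    for j, topic in enumerate(new):
        best, i = max(((jaccard(o, topic), i) for i, o in enumerate(old)),
                      default=(0.0, None))
        matches.append((j, i if best >= min_overlap else None))
    return matches

# Toy repair-report snippets for two adjacent periods (invented for illustration).
period1 = [
    "brake pedal noise brake pads worn",
    "brake pads replaced pedal noise",
    "engine oil leak gasket",
    "oil leak engine gasket replaced",
]
period2 = [
    "brake pads worn brake noise",
    "battery sensor fault display error",
    "sensor fault battery display warning",
]

old_topics = period_topics(period1)
new_topics = period_topics(period2)
print(match_topics(old_topics, new_topics))
```

In this toy run the brake topic carries over from the first period to the second, while the battery/sensor cluster has no counterpart and is returned with `None`, i.e. as a candidate that a human expert could review as a possible new class. Comparing top-term sets rather than vectors is one simple way to cope with a vocabulary that changes between periods; it is an assumption of this sketch, not necessarily the comparison used in the chapter.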

Table of Contents
Pascal Poncelet, Maguelonne Teisseire, Florent Masseglia
Chapter 1
Dan A. Simovici
This chapter presents data mining techniques that make use of metrics defined on the set of partitions of finite sets. Partitions are naturally...
Metric Methods in Data Mining
Chapter 2
Osmar R. Zaïane, Mohammed El-Hajj
Frequent Itemset Mining (FIM) is a key component of many algorithms that extract patterns from transactional databases. For example, FIM can be...
Bi-Directional Constraint Pushing in Frequent Pattern Mining
Chapter 3
Hui Xiong, Pang-Ning Tan, Vipin Kumar, Wenjun Zhou
This chapter presents a framework for mining highly correlated association patterns named hyperclique patterns. In this framework, an objective...
Mining Hyperclique Patterns: A Summary of Results
Chapter 4
Simona Este Rombo, Luigi Palopoli
In recent years, the information stored in biological datasets has grown exponentially, and new methods and tools have been proposed to interpret...
Pattern Discovery in Biosequences: From Simple to Complex
Chapter 5
Gregor Leban, Minca Mramor, Blaž Zupan, Janez Demšar, Ivan Bratko
Data visualization plays a crucial role in data mining and knowledge discovery. Its use is, however, often difficult due to the large number of...
Finding Patterns in Class-Labeled Data Using Data Visualization
Chapter 6
Yeow Wei Choong, Anne Laurent, Dominique Laurent
In the context of multidimensional data, OLAP tools are appropriate for navigating the data, aiming at discovering pertinent and abstract...
Summarizing Data Cubes Using Blocks
Chapter 7
Yutaka Matsuo, Junichiro Mori, Mitsuru Ishizuka
This chapter describes social network mining from the Web. Since the end of the 1990s, several attempts have been made to mine social network...
Social Network Mining from the Web
Chapter 8
Donato Malerba, Margherita Berardi, Michelangelo Ceci
This chapter introduces a data mining method for the discovery of association rules from images of scanned paper documents. It argues that a...
Discovering Spatio-Textual Association Rules in Document Images
Chapter 9
Mining XML Documents (pages 198-219)
Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie Christine Rousset, Alexandre Termier, Anne-Marie Vercoustre
XML documents are becoming ubiquitous because of their rich and flexible format that can be used for a variety of applications. Given the...
Mining XML Documents
Chapter 10
Sascha Schulz, Myra Spiliopoulou, Rene Schult
We study the issue of discovering and tracing thematic topics in a stream of documents. This issue, often studied under the label “topic evolution”...
Topic and Cluster Evolution Over Noisy Document Streams
Chapter 11
Cyrille J. Joutard, Edoardo M. Airoldi, Stephen E. Fienberg, Tanzy M. Love
Statistical models involving a latent structure often support clustering, classification, and other data-mining tasks. Parameterizations...
Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice
About the Editors
About the Contributors