Topic and Cluster Evolution Over Noisy Document Streams

Sascha Schulz, Myra Spiliopoulou, Rene Schult
Copyright: © 2008 | Pages: 20
ISBN13: 9781599041629 | ISBN10: 1599041626 | ISBN13 Softcover: 9781616926519 | EISBN13: 9781599041643
DOI: 10.4018/978-1-59904-162-9.ch010
Cite Chapter

MLA

Schulz, Sascha, et al. "Topic and Cluster Evolution Over Noisy Document Streams." Data Mining Patterns: New Methods and Applications, edited by Pascal Poncelet, et al., IGI Global, 2008, pp. 220-239. https://doi.org/10.4018/978-1-59904-162-9.ch010

APA

Schulz, S., Spiliopoulou, M., & Schult, R. (2008). Topic and Cluster Evolution Over Noisy Document Streams. In P. Poncelet, F. Masseglia, & M. Teisseire (Eds.), Data Mining Patterns: New Methods and Applications (pp. 220-239). IGI Global. https://doi.org/10.4018/978-1-59904-162-9.ch010

Chicago

Schulz, Sascha, Myra Spiliopoulou, and Rene Schult. "Topic and Cluster Evolution Over Noisy Document Streams." In Data Mining Patterns: New Methods and Applications, edited by Pascal Poncelet, Florent Masseglia, and Maguelonne Teisseire, 220-239. Hershey, PA: IGI Global, 2008. https://doi.org/10.4018/978-1-59904-162-9.ch010

Abstract

We study the problem of discovering and tracing thematic topics in a stream of documents. This problem, often studied under the label “topic evolution”, is of interest in many applications where thematic trends must be identified and monitored, including environmental modelling for marketing and strategic management, information filtering over news streams, and the enrichment of classification schemes with newly emerging classes. We concentrate on the last of these areas and describe an example application from the automotive industry: the discovery of emerging topics in repair and maintenance reports. We first discuss relevant literature on (a) the discovery and monitoring of topics over document streams and (b) the monitoring of evolving clusters over arbitrary data streams. We then propose our own method for topic evolution over a stream of small, noisy documents: we combine hierarchical clustering, performed at different time periods, with cluster comparison over adjacent time periods, taking into account that the feature space itself may change from one period to the next. We elaborate on the behaviour of this method and show how human experts can be assisted in identifying class candidates among the topics thus identified.
