Conceptual Clustering of Textual Documents and Some Insights for Knowledge Discovery
Leandro Krug Wives (Federal University of Rio Grande do Sul, Brazil), José Palazzo Moreira de Oliveira (Federal University of Rio Grande do Sul, Brazil) and Stanley Loh (Catholic University of Pelotas, Brazil and Lutheran University of Brazil, Brazil)
Copyright: © 2008
This chapter introduces a technique for clustering textual documents using concepts. Document clustering organizes large collections of documents into clusters of related information, which makes relevant information easier to locate. Traditional document clustering techniques represent the contents of documents with words, and the use of words may introduce semantic errors. Concepts, by contrast, represent real-world events and objects, and people employ them to express ideas, thoughts, opinions and intentions. Concepts are therefore more appropriate for representing the contents of a document, and their use aids the comprehension of large document collections, since each cluster can be summarized and its contents (i.e., its concepts) rapidly identified. The chapter presents a methodology for clustering documents using concepts and reports practical experiments, conducted as a case study, to demonstrate that the proposed approach achieves better results than the use of words.
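The contrast between word-based and concept-based representation can be illustrated with a minimal sketch. It is not the chapter's methodology, whose details the abstract does not give. It assumes a hypothetical hand-written concept dictionary (`CONCEPTS`), a simple count of cue-word hits as the concept weight, cosine similarity, and a greedy single-pass clustering with an arbitrary threshold; all names and values are illustrative.

```python
from collections import Counter
from math import sqrt

# Hypothetical concept dictionary (an assumption for illustration):
# each concept is identified by a set of cue words.
CONCEPTS = {
    "vehicle": {"car", "automobile", "truck"},
    "finance": {"bank", "loan", "credit"},
}


def to_concepts(text):
    """Represent a document as a bag of concepts by counting cue-word hits."""
    words = text.lower().split()
    bag = Counter()
    for concept, cues in CONCEPTS.items():
        hits = sum(1 for w in words if w in cues)
        if hits:
            bag[concept] = hits
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering: each document joins the first cluster
    whose seed document is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (seed concept bag, member indices)
    for i, doc in enumerate(docs):
        bag = to_concepts(doc)
        for seed, members in clusters:
            if cosine(seed, bag) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((bag, [i]))
    return [members for _, members in clusters]


docs = [
    "the car dealer sold a truck",
    "an automobile was parked outside",
    "the bank approved the loan",
]
print(cluster(docs))
```

The first two documents share no words, so a purely word-based representation would give them zero similarity, yet both map to the same "vehicle" concept and end up in one cluster. Each cluster can then be summarized by the concepts of its members.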