Conceptual Clustering of Textual Documents and Some Insights for Knowledge Discovery

Leandro Krug Wives (Federal University of Rio Grande do Sul, Brazil), José Palazzo Moreira de Oliveira (Federal University of Rio Grande do Sul, Brazil) and Stanley Loh (Catholic University of Pelotas, Brazil and Lutheran University of Brazil, Brazil)
Copyright: © 2008 | Pages: 21
DOI: 10.4018/978-1-59904-373-9.ch011

This chapter introduces a technique for clustering textual documents using concepts. Document clustering organizes large collections of documents into clusters of related information, which makes relevant information easier to locate. Traditional document clustering techniques represent the contents of documents with words, which may cause semantic mistakes. Concepts, by contrast, represent real-world events and objects, and people employ them to express ideas, thoughts, opinions and intentions. Concepts are therefore more appropriate for representing the contents of a document, and their use aids the comprehension of large document collections, since each cluster can be summarized and its contents (i.e., its concepts) identified rapidly. The chapter presents a methodology for clustering documents using concepts and reports practical experiments in a case study to demonstrate that the proposed approach achieves better results than the use of words.
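
As a rough illustration of the idea in the abstract, the following minimal sketch represents each document by the concepts it mentions rather than by its words, then groups documents whose concept vectors are similar. It is not the authors' methodology, which this page does not detail: the `CONCEPTS` dictionary, the cue words, the cosine measure, the greedy single-pass grouping and the `threshold` value are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical concept dictionary (an assumption for this sketch):
# each concept is recognized through a small set of cue words.
CONCEPTS = {
    "vehicle": {"car", "automobile", "truck"},
    "finance": {"bank", "loan", "credit"},
}

def concept_vector(text):
    """Count how many cue words of each concept occur in the text."""
    words = text.lower().split()
    counts = Counter()
    for concept, cues in CONCEPTS.items():
        hits = sum(1 for w in words if w in cues)
        if hits:
            counts[concept] = hits
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.5):
    """Greedy single-pass grouping: a document joins the first cluster
    whose seed vector is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (seed_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = concept_vector(doc)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

docs = [
    "the car broke down",
    "an automobile needs fuel",
    "the bank approved the loan",
    "credit from the bank",
]
print(cluster(docs))  # prints [[0, 1], [2, 3]]
```

In this toy data the first two documents share no words at all, yet they fall into the same cluster because both mention the hypothetical "vehicle" concept. Each cluster can also be summarized by the concepts of its seed vector.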

Complete Chapter List

Table of Contents
Cláudio Chauke Nehme
Hercules Antonio do Prado, Edilson Ferneda
Hercules Antonio do Prado, Edilson Ferneda
Chapter 1
Jie Tang, Mingcai Hong, Duo Liang Zhang, Juanzi Li
This chapter is concerned with the methodologies and applications of information extraction. Information is hidden in the large volume of web pages...
Information Extraction: Methodologies and Applications
Chapter 2
Roberto Penteado, Eric Boutin
The information overload demands that organizations set up new capabilities concerning the analysis of data and texts to create the necessary...
Creating Strategic Information for Organizations with Structured Text
Chapter 3
Christian Aranha, Emmanuel Passos
This chapter integrates elements from Natural Language Processing, Information Retrieval, Data Mining and Text Mining to support competitive...
Automatic NLP for Competitive Intelligence
Chapter 4
Horacio Saggion
Free text is a main repository of human knowledge, therefore methods and techniques to access this unstructured source of knowledge are of paramount...
Mining Profiles and Definitions with Natural Language Processing
Chapter 5
Ying Liu, Han Tong Loh, Wen Feng Lu
This chapter introduces an approach to deriving a taxonomy from documents using a novel document profile model that enables document representations...
Deriving Taxonomy from Documents at Sentence Level
Chapter 6
Shigeaki Sakurai
This chapter introduces knowledge discovery methods based on a fuzzy decision tree from textual data. It argues that the methods extract features of...
Rule Discovery from Textual Data
Chapter 7
Edson Takashi Matsubara, Maria Carolina Monard, Ronaldo Cristiano Prati
This chapter presents semi-supervised multi-view learning in the context of text mining. Semi-supervised learning uses both labelled and unlabelled...
Exploring Unclassified Texts Using Multiview Semisupervised Learning
Chapter 8
Lean Yu, Shouyang Wang, Kin Keung Lai
With the rapid growth of online information, there is a strong demand for Web text mining which helps people discover some...
A Multi-Agent Neural Network System for Web Text Mining
Chapter 9
Jon Atle Gulla, Hans Olaf Borch, Jon Espen Ingvaldsen
Due to the large amount of information on the web and the difficulties of relating users’ expressed information needs to document content...
Contextualized Clustering in Exploratory Web Search
Chapter 10
Li Weigang, Wu Man Qi
This chapter presents a study applying Ant Colony Optimization (ACO) to the Interlegis Web portal, a Brazilian legislation Website. The approach of AntWeb is...
AntWeb—Web Search Based on Ant Behavior: Approach and Implementation in Case of Interlegis
Chapter 11
Leandro Krug Wives, José Palazzo Moreira de Oliveira, Stanley Loh
This chapter introduces a technique to cluster textual documents using concepts. Document clustering is a technique capable of organizing large...
Conceptual Clustering of Textual Documents and Some Insights for Knowledge Discovery
Chapter 12
Domonkos Tikk, György Biro, Attila Törcsvári
Patent categorization (PC) is a typical application area of text categorization (TC). TC can be applied in different scenarios at the work...
A Hierarchical Online Classifier for Patent Categorization
Chapter 13
Patricia Bintzler Cerrito
The purpose of this chapter is to demonstrate how text mining can be used to reduce the number of levels in a categorical variable to then use the...
Text Mining to Define a Validated Model of Hospital Rankings
Chapter 14
Wagner Francisco Castilho, Gentil José de Lucena Filho, Hércules Antonio do Prado, Edilson Ferneda
Clustering analysis (CA) techniques consist in, given a set of objects, estimating dense regions of points separated by sparse regions, according to...
An Interpretation Process for Clustering Analysis Based on the Ontology of Language
About the Contributors