InfoScipedia
A Free Service of IGI Global Publishing House
Below please find a list of definitions for the term that
you selected from multiple scholarly research resources.

What is Document Clustering?

Handbook of Research on Innovations in Database Technologies and Applications: Current and Future Trends
The task of organizing a collection of documents, whose classification is unknown, into meaningful groups (clusters) that are homogeneous according to some notion of proximity (distance or similarity) among documents.
Published in Chapter:
XML Document Clustering
Andrea Tagarelli (University of Calabria, Italy)
DOI: 10.4018/978-1-60566-242-8.ch071
The ability to provide a “standardized, extensible means of coupling semantic information within documents describing semistructured data” (Chaudhri, Rashid, & Zicari, 2003) has led to a steady growth of XML (Extensible Markup Language) data sources, so that XML is touted as the driving force for representing and exchanging data on the Web. The motivation behind any clustering problem is to find an inherent structure of relationships in the data and expose this structure as a set of clusters, where the objects within the same cluster are highly similar to each other but very dissimilar from objects in different clusters. Text databases are a fruitful research area for the clustering problem. Since semistructured text data has become more prevalent on the Web, and XML is the de facto standard for such data, clustering XML documents has attracted increasing attention. Any application domain that needs to organize complex document structures (e.g., hierarchical structures with unbounded nesting, object-oriented hierarchies), or data containing a few structured fields together with some largely unstructured text components, can profit from an XML document clustering task.
More Results
Exploring the Unknown Nature of Data: Cluster Analysis and Applications
Document clustering is the organization of a large number of text documents into a small number of meaningful clusters, where each cluster represents a specific topic.
Swarm Intelligence in Text Document Clustering
Document clustering is the process of dividing a collection of documents into a number of groups based on the similarity of their contents.
A Primer on Text-Data Analysis
The process of grouping similar documents into partitions, where documents within the same partition exhibit a higher degree of similarity to each other than to any document in any other partition.
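
The definitions above share one idea: documents are grouped by a similarity measure, with no class labels known in advance. As a rough illustration only, the following Python sketch groups a few short texts using bag-of-words cosine similarity and a simple single-pass ("leader") scheme. The function names, the sample documents, and the 0.3 threshold are assumptions made for this example; none of them comes from the cited chapters, which may use quite different representations and algorithms.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.3):
    """Single-pass clustering (illustrative, threshold is an assumption).

    Each document joins the first cluster whose leader (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            leader = vectors[cluster[0]]
            if cosine_similarity(vec, leader) >= threshold:
                cluster.append(i)
                break
        else:
            # No sufficiently similar leader was found.
            clusters.append([i])
    return clusters


# Hypothetical sample collection: two XML-related and two sports-related texts.
docs = [
    "xml schema and xml document structure",
    "parsing xml document trees",
    "football match results and league scores",
    "league football scores this weekend",
]
print(cluster_documents(docs))
```

With these sample texts, the two XML-related documents and the two football-related documents should end up in separate groups. The result of a single-pass scheme depends on document order and on the threshold, so it is best read as a demonstration of the proximity notion in the definitions rather than as a recommended method.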