Clustering is the process of finding the natural groupings present in a dataset. Various clustering methods have been proposed for various types of data. Both the quality of the solution and the time taken to derive it are important when dealing with large datasets, such as a typical document database. Recently, hybrid and ensemble-based clustering methods have been shown to yield better results than conventional methods. The chapter proposes two clustering methods: one based on a hybrid scheme and the other on an ensemble scheme. Both are experimentally verified and shown to yield better and faster results.
Clustering is the process of finding groups, called clusters, in a given dataset such that the data items within a cluster are similar to each other, whereas those in different clusters are dissimilar. Many clustering methods, using a variety of similarity measures, are applied across different fields (Jain, Murty & Flynn, 1999). Even though the problem seems simple and is a relatively old one, it is still an active research area, and it was recently shown that no clustering method satisfies all of a certain set of simple properties (Kleinberg, 2002). A clustering method that works well in one field need not work well in another.
Key Terms in this Chapter
Partition of a Dataset: A collection of subsets of the dataset such that every pair of distinct subsets is disjoint and the union of the collection is equal to the dataset.
Prototype: A representative pattern from the given dataset.
Consensus Function: A mapping from a set of solutions (which might be intermediate solutions) to a single final solution. A solution can be a clustering of a dataset, a classification decision by a classifier, etc.
Clustering of a Dataset: A collection of subsets of the dataset so that their union is equal to the dataset.
Document: A sequence of words. When each word is seen as a feature and the frequency of the word as the feature value, a document can be represented as a vector of frequencies, which can be seen as a pattern (see the definition of pattern below).
Pattern: An object, either physical or abstract, that can be represented using a set of feature values. Normally a pattern is seen as a point in a feature space.
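The definitions above can be made concrete with a minimal sketch. It is an illustration only, not the chapter's method: the function names, the whitespace tokenization, and the fixed vocabulary are all assumptions introduced here. It shows a document turned into a frequency vector (a pattern), and the difference between a clustering (subsets whose union is the dataset) and a partition (a clustering whose subsets are also pairwise disjoint).

```python
from collections import Counter
from itertools import combinations


def document_to_pattern(document, vocabulary):
    """Represent a document as a vector of word frequencies.

    Assumption: words are separated by whitespace and compared
    case-insensitively; the vocabulary fixes the feature order.
    """
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


def is_clustering(subsets, dataset):
    """True if the union of the subsets equals the dataset (overlap allowed)."""
    return set().union(*subsets) == set(dataset)


def is_partition(subsets, dataset):
    """True if the subsets cover the dataset and are pairwise disjoint.

    Assumption: the subsets passed in are distinct from one another.
    """
    disjoint = all(set(a).isdisjoint(b) for a, b in combinations(subsets, 2))
    return disjoint and is_clustering(subsets, dataset)


# Hypothetical example data.
vocabulary = ["data", "cluster", "method"]
pattern = document_to_pattern("Cluster data with a cluster method", vocabulary)

dataset = {1, 2, 3, 4}
hard_groups = [{1, 2}, {3, 4}]        # disjoint subsets covering the dataset
soft_groups = [{1, 2, 3}, {3, 4}]     # item 3 belongs to two subsets
```

Here `pattern` is `[1, 2, 1]`, `hard_groups` is both a clustering and a partition, and `soft_groups` is a clustering but not a partition because item 3 appears in two subsets.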