A Support Based Initialization Algorithm for Categorical Data Clustering

Ajay Kumar, Shishir Kumar
Copyright: © 2018 |Pages: 15
DOI: 10.4018/JITR.2018040104

Abstract

Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical data are unordered, these methods are not applicable to a categorical data set. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of each unique value of an attribute by means of its support and then integrates these weights along the rows to obtain the support of every row. The data object with the largest support is chosen as the first initial center, and the remaining centers are those at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
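
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions that the abstract does not confirm: the support of a value is taken to be its relative frequency within its attribute, "integrating" the weights along a row is taken to mean summing them, distance is the simple matching (Hamming) distance, and for more than two centers each further center is the object farthest from all centers chosen so far. The function names are hypothetical.

```python
from collections import Counter


def row_supports(data):
    """Sum, for every row, the supports of its attribute values.

    Assumption: support of a value = its relative frequency in that attribute.
    """
    n = len(data)
    n_attrs = len(data[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in data) for j in range(n_attrs)]
    return [sum(counts[j][row[j]] / n for j in range(n_attrs)) for row in data]


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def support_based_init(data, k):
    """Pick k initial centers (assumes k <= number of rows).

    The first center is the row with the largest support. Each further
    center is the row whose smallest distance to the already chosen
    centers is largest (an assumed reading of "greatest distance");
    ties are broken in favor of the row with the larger support.
    """
    supports = row_supports(data)
    chosen = [max(range(len(data)), key=supports.__getitem__)]
    while len(chosen) < k:
        best = max(
            (i for i in range(len(data)) if i not in chosen),
            key=lambda i: (min(hamming(data[i], data[c]) for c in chosen),
                           supports[i]),
        )
        chosen.append(best)
    return [data[i] for i in chosen]


# Toy data set with two categorical attributes.
toy = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
centers = support_based_init(toy, 2)
```

On the toy data, the first row has the largest support (0.75 + 0.5), and the last row differs from it on both attributes, so those two rows are returned as centers. The centers could then seed any categorical clustering routine that accepts initial modes.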
Article Preview

Various methods for finding the natural groups in a categorical data set are documented in the literature. Guha et al. introduced the ROCK algorithm, which uses the Jaccard coefficient to compute the distances between objects (Guha, 2000). ROCK clusters objects agglomeratively so as to maximize the number of links within a cluster. The k-modes algorithm, on the other hand, is one of the popular methods (Bai, Liang, Dang & Cao, 2012). It extends the k-means algorithm to cluster categorical data.
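
As a rough illustration of the two measures behind these methods, the sketch below computes a Jaccard coefficient between two categorical records and the simple matching dissimilarity commonly used with k-modes. The encoding of a record as a set of (attribute index, value) pairs is an assumption made here for illustration, not a detail taken from the cited papers, and the function names are hypothetical.

```python
def jaccard_coefficient(x, y):
    """Jaccard coefficient of two records of equal length.

    Assumption: each record is treated as the set of its
    (attribute index, value) pairs.
    """
    a = set(enumerate(x))
    b = set(enumerate(y))
    if not a and not b:
        return 1.0  # two empty records are treated as identical
    return len(a & b) / len(a | b)


def simple_matching_dissimilarity(x, y):
    """Number of attributes on which the two records disagree."""
    return sum(u != v for u, v in zip(x, y))


r1 = ("red", "small", "round")
r2 = ("red", "large", "round")
similarity = jaccard_coefficient(r1, r2)                # 2 shared pairs out of 4 distinct
dissimilarity = simple_matching_dissimilarity(r1, r2)   # one attribute differs
```

A Jaccard-based distance can be obtained as one minus the coefficient. Neither measure relies on any ordering of the category values.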
