Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters

Wan Maseri Binti Wan Mohd, A.H. Beg, Tutut Herawan, A. Noraziah, K. F. Rabbi
Copyright: © 2013 | Pages: 13
ISBN13: 9781466638983 | ISBN10: 1466638982 | EISBN13: 9781466638990
DOI: 10.4018/978-1-4666-3898-3.ch010
Cite Chapter

MLA

Mohd, Wan Maseri Binti Wan, et al. "Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters." Information Retrieval Methods for Multidisciplinary Applications, edited by Zhongyu Lu, IGI Global, 2013, pp. 156-168. https://doi.org/10.4018/978-1-4666-3898-3.ch010

APA

Mohd, W. M., Beg, A., Herawan, T., Noraziah, A., & Rabbi, K. F. (2013). Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters. In Z. Lu (Ed.), Information Retrieval Methods for Multidisciplinary Applications (pp. 156-168). IGI Global. https://doi.org/10.4018/978-1-4666-3898-3.ch010

Chicago

Mohd, Wan Maseri Binti Wan, et al. "Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters." In Information Retrieval Methods for Multidisciplinary Applications, edited by Zhongyu Lu, 156-168. Hershey, PA: IGI Global, 2013. https://doi.org/10.4018/978-1-4666-3898-3.ch010

Abstract

K-means is an unsupervised, partition-based clustering algorithm that is popular and widely used because it is simple and fast. It produces a number of separate, flat (non-hierarchical) clusters and is well suited to generating globular clusters. The main drawback of the K-means algorithm is that the user must specify the number of clusters in advance. This paper presents an improved version of the K-means algorithm that automatically generates the initial number of clusters (k) and uses a new approach to defining the initial centroids, for a more effective and efficient clustering process. The underlying mechanism has been analyzed and tested experimentally. The experimental results show that the number of iterations is reduced to 50% and that the run time is lower and stays constant, depending on the maximum distance of the data points regardless of how many data points there are.
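
For readers who want the baseline in concrete form, the following is a minimal sketch of the standard K-means procedure (Lloyd's algorithm), not the improved method of this chapter, whose details are not given in the abstract. The function name `kmeans`, its parameters, and the random choice of initial centroids are assumptions made for illustration. It shows the two points the abstract refers to: `k` has to be passed in by the caller, and the number of iterations until convergence depends on how the initial centroids are chosen.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Baseline Lloyd's K-means on a list of equal-length numeric tuples.

    Returns (centroids, labels, iterations). The caller must supply k,
    which is the drawback described in the abstract.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k points drawn at random from the data.
    centroids = rng.sample(points, k)
    labels = [-1] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Converged when no point changes cluster.
        if new_labels == labels:
            return centroids, labels, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels, max_iter
```

A call such as `kmeans(data, k=3)` returns the final centroids, one cluster label per point, and the number of iterations used. Any scheme that picks `k` and the starting centroids automatically, as the chapter proposes, would replace the caller-supplied `k` and the random sampling line in this sketch.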