Special clustering algorithms are attractive for the task of grouping an arbitrary shaped database into several proper classes. Until now, a wide variety of clustering algorithms for this task have been proposed, although the majority of these algorithms are density-based. In this chapter, the authors extend the dissimilarity measure to compatible measure and propose a new algorithm (ASCCN) based on the results. ASCCN is an unambiguous partition method that groups objects to compatible nucleoids, and merges these nucleoids into different clusters. The application of cluster grids significantly reduces the computational cost of ASCCN, and experiments show that ASCCN can efficiently and effectively group arbitrary shaped data points into meaningful clusters.
Top1. Introduction
In data mining field, clustering pays a very important role. Clustering is the task of categorizing a set of objects into different clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria (Huang, 1998). It is useful in a number of tasks, for example, by partitioning objects into clusters, interesting object groups may be discovered, such as the groups of clients in a banking database having a heavy investment in real estate, and clustering of data streams also can find some important applications in tracking evolution of various phenomena in medical, meteorological, astrophysical, seismic studies (Bhatnagar et al., 2009).
Cluster analysis has become the subject of active research in several fields such as statistics, pattern recognition, machine learning and data mining. Up to now, a wide variety of clustering algorithms has been proposed, and also received a lot of attention in the last few years (c.f., section 2). In these algorithms, discovery of arbitrary shaped clusters is often to be a real obstacle for their applications. Ester (1996) and Halkidi (2001) imply that some typical clustering algorithms such as k-means, CURE, ClARANS and so on will get some poor results if there are some nonconvex shape data sets or some ball-shaped data sets of significantly differing sizes in the database. To get the arbitrary shaped clusters, algorithms based on density are designed (DBSCAN is a typical one), but these algorithms also face challenges from the efficiency and the affectivity such as the computation time may be intolerable or parameters input is not “user-friendly”.
In this chapter, we present the new clustering algorithm ASCCN. It is a crisp partition method, and clusters objects with compatible nucleoids. The new algorithm requires only one input parameter, can discover arbitrary size and shaped clusters, is efficient even for large data sets especially data with high dimension. The rest of this chapter is organized as follows: In section 2, we survey related work. In section 3, we define the compatible relation. The new algorithm ASCCN is presented in section 4. The experimental results are reported to illustrate the new algorithm in section 5. Finally, we draw our conclusions in section 6.