Article Preview
Top1. Introduction
In data mining field, clustering pays a very important role. Clustering is the task of categorizing a set of objects into different clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria (Huang, 1998). It is useful in a number of tasks, for example, by partitioning objects into clusters, interesting object groups may be discovered, such as the groups of clients in a banking database having a heavy investment in real estate, and clustering of data streams also can find some important applications in tracking evolution of various phenomena in medical, meteorological, astrophysical, seismic studies (Bhatnagar et al., 2009).
Cluster analysis has become the subject of active research in several fields such as statistics, pattern recognition, machine learning and data mining. Up to now, a wide variety of clustering algorithms has been proposed, and also received a lot of attention in the last few years (c.f., section 2). In these algorithms, discovery of arbitrary shaped clusters is often to be a real obstacle for their applications. Ester (1996) and Halkidi (2001) imply that some typical clustering algorithms such as k-means, CURE, ClARANS and so on will get some poor results if there are some nonconvex shape data sets or some ball-shaped data sets of significantly differing sizes in the database. To get the arbitrary shaped clusters, algorithms based on density are designed (DBSCAN is a typical one), but these algorithms also face challenges from the efficiency and the affectivity such as the computation time may be intolerable or parameters input is not “user-friendly”.
In this paper, we present the new clustering algorithm ASCCN. It is a crisp partition method, and clusters objects with compatible nucleoids. The new algorithm requires only one input parameter, can discover arbitrary size and shaped clusters, is efficient even for large data sets especially data with high dimension. The rest of this paper is organized as follows: In section 2, we survey related work. In section 3, we define the compatible relation. The new algorithm ASCCN is presented in section 4. The experimental results are reported to illustrate the new algorithm in section 5. Finally, we draw our conclusions in section 6.