Introduction
Clustering is an indispensable method for mining complex real-world data. It works in an unsupervised way to reveal the hidden rules and patterns of human society. Over the past 20 years, a large number of clustering algorithms have been proposed, applied, improved, and further optimized. Broadly, these algorithms can be divided into the following categories: partitioned clustering, hierarchical clustering, density clustering, and dynamic clustering (Saxena A et al., 2017).
(1) Partitioned clustering and hierarchical clustering are the most commonly and widely used algorithms; K-Means and BIRCH are typical examples. These algorithms achieve excellent clustering accuracy on regular, noise-free datasets whose clusters are shaped like circles or ellipses, and their time efficiency is very high (Stevan N et al., 2014). However, on irregular and non-uniform datasets, such as those with non-circular clusters, their clustering accuracy is not satisfactory.
(2) Density clustering and graph clustering are also commonly used; DBSCAN and Spectral Clustering are typical examples, respectively. These two kinds of algorithms suit a wide variety of datasets and can achieve excellent clustering performance on irregular and uneven datasets. However, they require longer clustering time and cannot accurately identify the noise in the dataset.
(3) Dynamic clustering is a novel family of clustering algorithms (Bae J et al., 2020); Gravc is a typical example. The basic idea is to extract a dynamic process from natural phenomena such as gravitation, synchronization, and evolution, and to use it to cluster complex datasets. These algorithms can achieve good clustering accuracy on irregular and uneven datasets and can accurately identify noise and abnormal data in the dataset (Chen L et al., 2017). However, because the clustering process is dynamic, their time complexity is very high.
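The contrast between partitioned and density clustering on non-circular data can be illustrated with a small, self-contained sketch. It is not taken from the article: the two-ring dataset, the initial centers, the `eps` and `min_pts` values, and the `purity` helper are all assumptions chosen for illustration, and the K-Means and DBSCAN routines are minimal textbook versions rather than any library's implementation.

```python
import math

def ring(radius, n):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: grow the cluster
                queue.extend(neighbors[j])
    return labels

def purity(labels, truth):
    """Fraction of points lying in the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        members = [t for l, t in zip(labels, truth) if l == c]
        total += max(members.count(t) for t in set(members))
    return total / len(labels)

# Two concentric rings: a simple non-circular (non-convex) cluster layout.
points = ring(1.0, 20) + ring(4.0, 40)
truth = [0] * 20 + [1] * 40

kmeans_labels = kmeans(points, [(-4.0, 0.0), (4.0, 0.0)])
dbscan_labels = dbscan(points, eps=1.0, min_pts=3)

kmeans_purity = purity(kmeans_labels, truth)
dbscan_purity = purity(dbscan_labels, truth)
print("K-Means purity:", kmeans_purity)
print("DBSCAN purity:", dbscan_purity)
```

With two centers, K-Means can only split the plane along a straight line, so no split isolates the inner ring from the outer one. The density-based routine follows the chain of close neighbors around each ring instead and recovers both rings in this toy setting.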
With the arrival of the big data era, more and more complex data have emerged from sources such as wireless and mobile multimedia networks (Ajay K et al., 2019), transportation, weather, and wireless sensor networks (Surender S et al., 2011). These complex new data pose new challenges to traditional clustering algorithms in terms of both clustering accuracy and time efficiency (Fahad A et al., 2014). Regarding clustering accuracy, complex data in the big data era exhibit new features such as irregularity, unevenness, and high noise, which cause the clustering accuracy of traditional algorithms to deteriorate seriously (Shirkhorshidi A S et al., 2014). Regarding time efficiency, the growing scale of complex data places higher demands on the time efficiency of traditional clustering algorithms (Mohebi A et al., 2016).