Robust Clustering with Distance and Density

Hanning Yuan, Shuliang Wang, Jing Geng, Yang Yu, Ming Zhong
Copyright: © 2017 | Pages: 12
DOI: 10.4018/IJDWM.2017040104

Clustering is fundamental to exploiting big data. However, AP (affinity propagation) performs poorly on non-convex datasets, and DBSCAN (density-based spatial clustering of applications with noise) is highly sensitive to its input parameter. Moreover, characteristics of big data such as volume, variety, velocity, and veracity make it difficult to group. To address these issues, a parameter-free AP (PFAP) is proposed that groups big data on the basis of both distance and density. First, it obtains a set of normalized densities from the AP clustering. The estimated parameters are monotonic. Then, the densities are used to run density clustering multiple times. Finally, the multiple density clustering results undergo a two-stage amalgamation to produce the final clustering result. Experimental results on several benchmark datasets show that PFAP achieves better clustering quality than DBSCAN, AP, and APSCAN, and better performance than APSCAN and FSDP.
Article Preview

AP is a distance-based algorithm that identifies exemplars in a dataset by imitating a message passing and feedback routine between the data objects (Dueck & Frey, 2007; Frey & Dueck, 2007). It achieves lower error than traditional methods and is computationally efficient in many applications (Dueck & Frey, 2007; Dueck et al., 2008). The mutual similarities among $N$ objects are recorded in an input matrix of size $N \times N$. The diagonal of the matrix, $s(k,k)$, is treated as the reference for the data object $k$ to become a cluster center. The responsibility $r(i,k)$, sent from data object $i$ to the candidate cluster center $k$, indicates how suitable object $k$ is as a cluster center for object $i$. The availability $a(i,k)$, sent from the candidate cluster center $k$ to the data object $i$, reflects how likely object $i$ is to choose $k$ as its cluster center. The larger the values of $r(i,k)$ and $a(i,k)$, the higher the probability that object $k$ becomes a cluster center, and consequently the greater the chance that object $i$ belongs to the cluster centered at object $k$. During this iterative process, AP keeps updating $r(i,k)$ and $a(i,k)$ between the data objects until the predefined convergence criterion is met. The AP parameters follow the original settings: the maximum number of iterations is 1000, the upper limit of steady times is 100, and the damping coefficient is 0.9. The reference for cluster centers is set to the median value of the similarity matrix.
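
The message passing described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard responsibility and availability update rules of Frey & Dueck (2007), written r(i,k) and a(i,k) here, and uses negative squared Euclidean distance as the similarity. The function name `affinity_propagation`, its parameter names, the toy data, and the choice to take the median over off-diagonal entries only are illustrative assumptions. The defaults mirror the settings quoted above (1000 iterations, 100 steady iterations, damping 0.9).

```python
import numpy as np

def affinity_propagation(S, damping=0.9, max_iter=1000, steady_iter=100):
    """Hypothetical helper: AP on an N x N similarity matrix S (N >= 2).
    S[k, k] holds the reference (preference) of object k to be a center."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities  a(i, k)
    rows = np.arange(n)
    prev, steady = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new

        # Object k is a cluster center when r(k,k) + a(k,k) > 0.
        exemplars = np.flatnonzero(np.diag(R) + np.diag(A) > 0)
        key = tuple(exemplars)
        steady = steady + 1 if key == prev else 0
        prev = key
        if steady >= steady_iter and len(exemplars) > 0:
            break  # center set unchanged for steady_iter iterations

    if len(exemplars) == 0:
        return exemplars, np.full(n, -1)  # no center emerged
    # Assign each object to its most similar center.
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(len(exemplars))
    return exemplars, labels

# Toy data (assumed for illustration): two well-separated groups.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
# Reference for every object: median of the off-diagonal similarities.
off_diag = S[~np.eye(len(X), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))
exemplars, labels = affinity_propagation(S)
print("cluster centers:", exemplars, "labels:", labels)
```

A lower reference value on the diagonal generally yields fewer cluster centers, and a higher one yields more, which is why the median is a common neutral default.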
