Understanding the SNN Input Parameters and How They Affect the Clustering Results

Guilherme Moreira (ALGORITMI Research Centre, University of Minho, Guimarães, Portugal), Maribel Yasmina Santos (ALGORITMI Research Centre, University of Minho, Guimarães, Portugal), João Moura Pires (NOVA LINCS, Nova University of Lisbon, Lisbon, Portugal) and João Galvão (ALGORITMI Research Centre, University of Minho, Guimarães, Portugal)
Copyright: © 2015 | Pages: 23
DOI: 10.4018/IJDWM.2015070102

Abstract

Huge amounts of data are available for analysis in today's organizations, which face several challenges when trying to analyze the generated data with the aim of extracting useful information. This analytical capability needs to be enhanced with tools capable of dealing with big data sets without making the analytical process an arduous task. Clustering is commonly used in the data analysis process, as this technique does not require any prior knowledge about the data. However, clustering algorithms usually require one or more input parameters that influence the clustering process and the results that can be obtained. This work analyses the relations among the three input parameters of the SNN (Shared Nearest Neighbor) clustering algorithm, k, Eps and MinPts, providing a comprehensive understanding of the relationships that were identified between them. Moreover, this work also proposes specific guidelines for defining appropriate input parameters, optimizing the processing time, as the number of trials needed to achieve appropriate results can be substantially reduced.
Article Preview

Clustering is the task of identifying sets of segments, or clusters, that group similar objects. A cluster is a collection of data objects that are similar to one another and dissimilar to objects that belong to other clusters (Han & Kamber, 2001).

Density-based clustering approaches were developed based on the notion of density (Han & Kamber, 2001). These algorithms perceive clusters as dense regions of objects in a space, separated by regions of relatively low density. Algorithms of this kind are useful for filtering out noise and for discovering clusters of arbitrary shape (Ye et al., 2003). DBSCAN (Ester et al., 1996) and OPTICS (Ankerst, Breunig, Kriegel, & Sander, 1999) are major representatives of this class of clustering algorithms, with DBSCAN being the most representative density-based clustering algorithm. Many of the available density-based algorithms were derived from DBSCAN, which was introduced by Ester et al. (1996) and was designed specifically for spatial databases.
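
The density-based idea described above can be illustrated with a minimal sketch that follows the general DBSCAN scheme: a point with enough close neighbors starts a cluster, the cluster grows through other dense points, and points in sparse regions are left as noise. The names `eps`, `min_pts`, `region_query` and the toy data are illustrative assumptions, not taken from the article, and the brute-force neighbor search is only suitable for small inputs.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] lies in a dense region: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # dense point: keep growing
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With these values the two groups form separate clusters and the isolated point is labeled as noise. Changing `eps` or `min_pts` changes that outcome, which is the kind of parameter sensitivity the abstract discusses for SNN.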
