A Novel Hybrid Algorithm Based on K-Means and Evolutionary Computations for Real Time Clustering

A Novel Hybrid Algorithm Based on K-Means and Evolutionary Computations for Real Time Clustering

Taha Mansouri (Department of Industrial Management, Allameh Tabataba'i University, Tehran, Iran), Ahad Zare Ravasan (Department of Industrial Management, Allameh Tabataba'i University, Tehran, Iran) and Mohammad Reza Gholamian (School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran)
Copyright: © 2014 |Pages: 14
DOI: 10.4018/ijdwm.2014070101
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

One of the most widely used algorithms to solve clustering problems is the K-means. Despite of the algorithm's timely performance to find a fairly good solution, it shows some drawbacks like its dependence on initial conditions and trapping in local minima. This paper proposes a novel hybrid algorithm, comprised of K-means and a variation operator inspired by mutation in evolutionary algorithms, called Noisy K-means Algorithm (NKA). Previous research used K-means as one of the genetic operators in Genetic Algorithms. However, the proposed NKA is a kind of individual based algorithm that combines advantages of both K-means and mutation. As a result, proposed NKA algorithm has the advantage of faster convergence time, while escaping from local optima. In this algorithm, a probability function is utilized which adaptively tunes the rate of mutation. Furthermore, a special mutation operator is used to guide the search process according to the algorithm performance. Finally, the proposed algorithm is compared with the classical K-means, SOM Neural Network, Tabu Search and Genetic Algorithm in a given set of data. Simulation results statistically demonstrate that NKA out-performs all others and it is prominently prone to real time clustering.
Article Preview

Introduction

Data Mining is a process of discovering new, unexpected, valuable patterns from existing databases (Agrawal, Mannila, Srikant, Toivonen, & Verkamo, 1996; Chen, Han, & Yu, 1996). Data mining is finding increasing popularity in science and practice areas that need to analyze large amounts of data to discover patterns and trends, in which they could not otherwise find. Different applications may require different data mining techniques. The kinds of knowledge that could be discovered from a database are categorized into association rules mining, sequential patterns mining, classification and clustering (Daly & Taniar, 2004).

Clustering is the process of grouping a data set in such a way that the similarity between data within a cluster is maximized while the similarity between data of different clusters is minimized (Kwok, Smith, Lozano, & Taniar, 2002).

Clustering methods partition a set of objective into subsets such that objects in the same subsets are more similar to each other than objects in different ones according to some defined criteria (Ng & Wong, 2002). Clustering task does not try to classify, estimate, or predict the value of a target variable. Instead, they seek to segment the entire data set into comparatively homogeneous clusters. The similarity of records inside each cluster is maximized and the similarity of records between clusters is minimized (Larose, 2004).

Resulting groups or clusters lead the end user to estimate great data values (Berson & Smith, 2002). One of the main advantages of clustering technique is that there is no need of basic knowledge regarding to data and we only have to choose a good index for measuring. Another advantage of this technique is to be appropriate for any kind of data by choosing a suitable criterion (Jain, Murty, & Flynn, 1999).

Clustering has been effectively applied in a variety of anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science (Celebi, Kingravi, & Vela, 2013). It is often included as first step in data mining process and the results of clustering are used in the other models as inputs (Hand, Mannila, & Smyth, 2005). It should be considered that due to different aspects, clustering of observations is a difficult problem to solve and categorized in class of NP-complete problems (Güngör & Ünler, 2007).

Clustering algorithms can be broadly classified into two groups: hierarchical and partitional (Jain, 2010). Hierarchical algorithms recursively find nested clusters either in a top-down (divisive) or bottom-up (agglomerative) fashion. In contrast, partitional algorithms find all the clusters simultaneously as a partition of the data and do not impose a hierarchical structure (Celebi, et al., 2013). In other words, partitional clustering algorithms divide the data set into non-overlapping groups (Jain, et al., 1999). Algorithms such as k-mean, bisecting k-mean, and k-modes belong to this category. Furthermore, some other bio-inspired algorithms like Artificial Neural Networks (ANNs), heuristics and Evolutionary Algorithms (EAs) are frequently applied for clustering problems.

K-means as one of the most famous clustering algorithm is a straightforward and effective way for finding clusters in data (Jain, 2010). The popularity of K-means can be attributed to several reasons. First, it is conceptually simple and easy to implement. Almost all data mining software includes an implementation of it. Second, it is versatile; meanwhile almost every aspect of the algorithm (initialization, distance function, termination criterion, etc.) can be modified. Third, it has a time complexity that is linear in N, D, and K (in general, D_N and K_N). For this reason, it can be used to initialize more expensive clustering algorithms. Fourth, it has a storage complexity that is linear in N, D, and K. Fifth, it is guaranteed to converge at a quadratic rate. Finally, it is invariant to data ordering, i.e. random shuffling of the data points (Celebi, et al., 2013).

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing