Hybrid Clustering using Elitist Teaching Learning-Based Optimization: An Improved Hybrid Approach of TLBO

D.P. Kanungo, Janmenjoy Nayak, Bighnaraj Naik and H.S. Behera (Department of Computer Science Engineering and Information Technology, Veer Surendra Sai University of Technology (VSSUT), Odisha, India)
Copyright: © 2016 | Pages: 19
DOI: 10.4018/IJRSDA.2016010101

Abstract

Data clustering is a key field of research in the pattern recognition arena. Although clustering is an unsupervised learning technique, numerous efforts have been made in both hard and soft clustering. In hard clustering, K-means is the most popular method and is used in diversified application areas. In this paper, an effort has been made with a recently developed population-based metaheuristic called Elitist Teaching Learning-Based Optimization (ETLBO) for data clustering. ETLBO has been hybridized with the K-means algorithm (ETLBO-K-means) to obtain optimal cluster centers and effective fitness values. The performance of the proposed method has been compared with other techniques on standard benchmark real-life datasets as well as some synthetic datasets. Simulation and comparison results demonstrate the effectiveness and efficiency of the proposed method.

Introduction

The main aim of any clustering technique is to divide n patterns into a number of clusters in a search space. Literature surveys (Jain et al., 1999; Hruschka et al., 2009; Hasan and Ramakrishnan, 2011) reveal that many clustering techniques have been developed in recent years. Some prominent types of clustering methods frequently used by researchers are: a) partition based, b) hierarchy based, c) density based, and d) grid based. Clustering algorithms are generally based on some criterion to assess the quality of a cluster partition. In particular, depending on the parameters, an algorithm receives inputs such as the number of clusters, cluster density, etc., and finds a good partition. K-means (MacQueen, 1967) is a partition-based hard clustering method. It is a center-based algorithm in which the Euclidean distance from each point to every center is computed, and the aim is to assign each point to its nearest cluster center. In recent years, several attempts (Arbelaitz et al., 2013; Roy et al., 2012; Vanfleteren et al., 2013; Michie et al., 2013; Arumugam et al., 2011; Zhang et al., 2013; Parmar and Pandi, 2015; Ward et al., 2015; Wang et al., 2015; Nguyen et al., 2015; Cosby et al., 2015; Boersma et al., 2014; Jose et al., 2014; Chen et al., 2014; Bose et al., 2013) have been made to apply this method and to hybridize it with various optimization techniques to solve diversified problems. K-means is more popular than many other clustering algorithms due to its simple execution steps, minimal parameter settings, and ability to run quickly on large datasets. After a number of successful applications of K-means, researchers identified some limitations (Arthur and Vassilvitskii, 2007; Singh et al., 2011; Komali et al., 2015; Acharya et al., 2015; Liang et al., 2015) of the algorithm that need to be addressed.
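The assign-then-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard K-means procedure, not the hybrid method proposed in the paper; the function name and defaults are our own:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means: repeatedly assign each point to its nearest
    center (Euclidean distance) and move each center to the mean of
    its assigned points, until the centers stop changing."""
    rng = np.random.default_rng(seed)
    # Initialize centers by sampling k distinct data points at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance from every point to every center, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Note the empty-cluster guard in the update step: as the text below discusses, a cluster to which no point is assigned has no mean, and the algorithm must handle that case explicitly.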
Since the algorithm starts from a random selection of initial cluster centers, different runs can produce different results. Although the algorithm minimizes the intra-cluster variance, it is not guaranteed to reach the global minimum variance. In a dense clustering environment, the data set is divided into k clusters, and each data point is assigned to the closest center according to the distance calculation; however, a cluster to which no data points are assigned becomes an empty cluster. Also, for a data set containing outliers, the algorithm cannot compute the cluster centroids as accurately as it otherwise would. Beyond these problems, the biggest challenge is finding a global solution: K-means is highly sensitive to local optima, because the cluster centers become fixed and unchangeable after some iterations of the algorithm. A number of improved versions (Cheung, 2003; Charalampidis, 2005; Şerban and Moldovan, 2006; Zhang et al., 2008; Celebi, 2011; Tarpey, 2012; Ma et al., 2015; Shao et al., 2015; Duwairi and Abu-Rahmeh, 2015) and hybridizations of K-means (Ahmadyfard and Modares, 2008; Wu, 2008; Kao and Lee, 2009; Schreiber, 1991; Gibou and Fedkiw, 2005; Li et al., 2015; Aparna and Nair, 2014; Pan et al., 2014) have therefore been developed. Although these improved algorithms performed well for particular applications, they still failed to obtain a global solution, which is our area of focus.
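The intra-cluster variance mentioned above is the sum of squared Euclidean distances from each point to its assigned center, and the local-optima problem is simply that different partitions of the same data can yield very different values of this objective. A small sketch (function name and example data are illustrative, not from the paper):

```python
import numpy as np

def intra_cluster_variance(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its
    assigned center -- the objective K-means reduces locally at
    every iteration but does not minimize globally."""
    return float(sum(
        np.sum((X[labels == j] - c) ** 2)
        for j, c in enumerate(centers)
    ))

# Two partitions of the same four points: the natural grouping
# achieves a strictly lower objective than a poorly chosen one,
# yet K-means can get stuck at either depending on initialization.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
good = intra_cluster_variance(X, np.array([0, 0, 1, 1]),
                              np.array([[0.0, 0.5], [5.0, 0.5]]))
bad = intra_cluster_variance(X, np.array([0, 1, 0, 1]),
                             np.array([[2.5, 0.0], [2.5, 1.0]]))
```

Here `good` evaluates to 1.0 and `bad` to 25.0, which is the gap between a global and a poor local solution that metaheuristics such as ETLBO aim to close.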