Global Artificial Bee Colony Search Algorithm for Data Clustering

Zeeshan Danish (University of Malakand, Charsadda, Pakistan), Habib Shah (King Khalid University, Abha, Saudi Arabia), Nasser Tairan (King Khalid University, Abha, Saudi Arabia), Rozaida Gazali (Universiti Tun Hussein Onn Malaysia, Malaysia) and Akhtar Badshah (Department of Software Engineering, University of Malakand, Pakistan)
Copyright: © 2019 |Pages: 12
DOI: 10.4018/IJSIR.2019040104

Abstract

Data clustering is a widely used technique in data compression, vector quantization, data analysis, and data mining. In this work, a modified form of the artificial bee colony (ABC) algorithm, the global artificial bee colony search algorithm (GABCS), is applied to data clustering. The modification in GABCS rests on the observation that experienced bees can use past information about food quantity and position to adjust their movements in the search space; accordingly, the solution search equations of the canonical ABC are modified. GABCS is applied to three well-known real datasets from the UCI repository (iris, thyroid, and wine), and the results are compared with several other reported algorithms, namely K-NM-PSO, TS, ACO, GA, SA, and ABC. The results show that on all three datasets the proposed GABCS algorithm achieves far better intra-cluster distances and computation times than the other algorithms, while in terms of computation counts it performs adequately compared with the canonical ABC.
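The exact GABCS search equations are given in the full article; as a rough, non-authoritative illustration of the kind of modification the abstract describes, the sketch below contrasts the canonical ABC neighbour search with a global-best-guided variant, in which a bee's move is also attracted toward the best food source found so far (`gbest`). The function name `gabc_candidate` and the weight `C` are assumptions for this sketch, not identifiers from the article.

```python
import numpy as np

def gabc_candidate(foods, i, gbest, C=1.5, rng=np.random.default_rng()):
    """Generate a candidate food source for bee i (illustrative sketch).

    Canonical ABC:       v_ij = x_ij + phi * (x_ij - x_kj)
    Global-best-guided:  v_ij = x_ij + phi * (x_ij - x_kj) + psi * (gbest_j - x_ij)
    where phi ~ U(-1, 1), psi ~ U(0, C), and k != i is a random neighbour.
    """
    n, d = foods.shape
    k = rng.choice([idx for idx in range(n) if idx != i])  # random neighbour
    j = rng.integers(d)                                    # modify one dimension
    phi = rng.uniform(-1, 1)
    psi = rng.uniform(0, C)
    v = foods[i].copy()
    v[j] = (foods[i, j]
            + phi * (foods[i, j] - foods[k, j])
            + psi * (gbest[j] - foods[i, j]))
    return v
```

In a clustering setting, each food source would encode a set of cluster centers and its fitness would be the intra-cluster distance being minimized.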
Introduction

Clustering is one of the most challenging tasks in pattern recognition (Kamel & Selim, 1994), image analysis (Omran, Salman, & Engelbrecht, 2002), and other complex applications (Bouveyron & Brunet-Saumard, 2014; Fu, Niu, Zhu, Wu, & Li, 2012; Lv et al., 2016). It is used in many fields, including image analysis, data mining, machine learning, bioinformatics, and pattern recognition, where the data may be dispersed in any shape and size, and it is a well-known method for statistical data analysis. In pattern recognition, data analysis can be carried out by two different learning methods: unsupervised learning, which concerns only unlabeled data, and supervised learning, which concerns labeled data (training patterns with known category labels) (Hart & Stork, 2001; Peters & Weber, 2012). A third method, a hybrid of the two, is semi-supervised learning (Chapelle, Scholkopf, & Zien, 2009), in which some of the available data is labeled (supervised) while the rest is unlabeled (unsupervised). Several approaches have been applied to unsupervised learning, for instance the ant colony clustering algorithm, the K-means algorithm, genetic algorithms, tabu search, simulated annealing, particle swarm optimization, the ABC algorithm, HABC, FABC, and the cuckoo search algorithm (CSA) (Karaboga & Ozturk, 2011; Shah, Herawan, Naseem, & Ghazali, 2014; Zhang, Liu, Yang, & Dai, 2016). They are discussed in detail in the next section.

The K-means algorithm is one of the most widely used clustering algorithms (Selim & Alsultan, 1991); it is a fast, simple, center-based algorithm. Its key idea is to find a partition that minimizes the squared error between the points in each cluster and the cluster's empirical mean. The algorithm has the shortcomings that it relies heavily on the initial conditions, converges to local minima from its very first search positions, and cannot find global solutions to large problems with a reasonable amount of computational effort (Fathian, Amiri, & Maroosi, 2007). To overcome the local-optima problem, researchers from various backgrounds have applied density-based clustering, artificial-intelligence-based clustering methods, partition-based clustering, and hierarchical clustering, drawing for instance on graph theory (Zahn, 1971), statistics (Forgy, 1965), expectation maximization, evolutionary algorithms, artificial neural networks, and swarm intelligence algorithms (Bakhta & Ghalem, 2014; Bouarara, Hamou, & Amine, 2015; Cheng, Shi, & Qin, 2011; Harish, Jagdish Chand, Arya, & Kusum, 2012; Tarun Kumar & Millie, 2011).
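The squared-error objective and the sensitivity to initial conditions described above can be seen in a minimal sketch of Lloyd's K-means iteration (a generic textbook version, not code from the article):

```python
import numpy as np

def kmeans(X, k, iters=100, rng=np.random.default_rng(0)):
    """Plain Lloyd's K-means: alternately assign points to the nearest
    centroid and recompute each centroid as the empirical mean of its
    cluster, reducing the squared error at every step."""
    centers = X[rng.choice(len(X), k, replace=False)]  # initial conditions matter
    for _ in range(iters):
        # assign each point to its nearest centroid
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # recompute centroids; keep old center if a cluster goes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged, often only to a local minimum
            break
        centers = new
    return centers, labels
```

Because the loop only ever decreases the squared error from its starting partition, a poor random initialization can leave it trapped in exactly the local minima the metaheuristics discussed here are designed to escape.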

A simulated annealing (SA) approach has been discussed by Selim and Al-Sultan, who proved theoretically that it can resolve the problem of getting stuck at local minima that K-means faces in clustering (Selim & Alsultan, 1991). The algorithm does not "stick" to a locally optimal solution; rather, it can obtain the optimum solution. A disadvantage of the simulated annealing approach is that no computationally usable characterization of a stopping point is available. Another disadvantage is that verifying that a set of data is standard is more difficult than solving the clustering problem itself. A new algorithm based on a tabu search (TS) technique has also been applied to this problem; on many test problems it achieved better results than the well-known K-means and SA algorithms (Al-Sultan, 1995).
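The mechanism that lets simulated annealing escape the local minima that trap K-means is its acceptance rule; a minimal sketch, assuming the standard Metropolis criterion (the function name `sa_accept` is an assumption for this illustration, not from the cited work):

```python
import math
import random

def sa_accept(delta, T, rng=random.Random(0)):
    """Metropolis acceptance rule for simulated annealing: always take an
    improving move (delta <= 0); take a worsening move with probability
    exp(-delta / T), so early (hot) phases explore freely while late (cold)
    phases behave almost greedily."""
    return delta <= 0 or rng.random() < math.exp(-delta / T)
```

In a clustering run, `delta` would be the change in intra-cluster distance caused by reassigning a point, and `T` would be lowered on a cooling schedule; the lack of a principled stopping point noted above corresponds to choosing when that schedule ends.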
