Development of Fractional Genetic PSO Algorithm for Multi Objective Data Clustering

Aparna K. (B. M. S. Institute of Technology, Bangalore, India) and Mydhili K. Nair (Department of Information Science and Engineering, M. S. Ramaiah Institute of Technology, Bangalore, India)
Copyright: © 2016 | Pages: 16
DOI: 10.4018/IJAEC.2016070101


Clustering is the task of finding a natural partitioning of a data set such that data items within the same group are more similar to one another than to items in different groups. The performance of the traditional K-Means and Bisecting K-Means algorithms degrades as the dimensionality of the data increases. To obtain better clustering results, it is important to enhance the traditional algorithms by incorporating additional constraints. Hence, it is planned to develop a Multi-Objective Optimization (MOO) technique that includes five objectives: MSE, a stability measure, the DB index, the XB index and the sym-index. These five objectives will be used as the fitness function for the proposed Fractional Genetic PSO (FGPSO) algorithm, a hybrid optimization algorithm that performs the clustering. The performance of the proposed multi-objective FGPSO algorithm will be evaluated in terms of clustering accuracy. Finally, the applicability of the proposed algorithm will be checked on benchmark data sets available in the UCI machine learning repository.

1. Introduction

Data clustering is the procedure of grouping together similar multi-dimensional data vectors. A comprehensive study and analysis of the different partitional clustering algorithms is given in (Aparna & Nair, 2015a). Clustering algorithms have been applied to a broad range of problems, including exploratory data analysis, data mining (Evangelou et al., 2001), image segmentation (Lillesand et al., 1994) and mathematical programming (Andrews, 1972), (Rao, 1971). Clustering techniques have also been used effectively to address the scalability problem of machine learning and data mining algorithms and to improve their performance (Jain et al., 1999), (Quinlan, 1993), (Potgieter, 2002). Clustering reflects the statistical structure of the overall collection of input patterns in the data, so the resulting subsets of patterns carry definite meaning (Roy & Sharma, 2010). Each pattern can be represented mathematically by a vector in a multi-dimensional space.

Classification algorithms can be divided into two main classes, namely supervised and unsupervised. The absence of category labels distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). Clustering is the process of discovering the different structures in data and is analytical in nature (Yip et al., 2004). In unsupervised classification, which is also called clustering, no labelled data are available (Everitt et al., 2001), (Jain & Dubes, 1988). The objective of clustering is to divide a finite unlabeled data set into a finite and discrete set of “natural”, hidden data structures (Baraldi & Alpaydin, 2002), (Cherkassky & Mulier, 1988). In many learning domains, the features that are potentially useful are defined manually. However, not all of these features may be relevant. In such cases, selecting a subset of the original features will frequently lead to improved performance. For supervised learning, feature selection algorithms maximize some function of predictive accuracy (Dy & Brodley, 2004).

Many clustering algorithms have been proposed. One of the best-known hard clustering algorithms is K-Means, which divides data objects into k clusters (Kanungo et al., 2002). Fuzzy algorithms, by contrast, can allocate a data object to multiple clusters. Fuzzy C-Means clustering is an efficient algorithm; however, the arbitrary choice of initial centre points means that the iterative process can easily converge to a locally optimal solution. To improve the solution, many evolutionary algorithms such as the Genetic Algorithm (GA) (Maulik & Bandyopadhyay, 2000), Simulated Annealing (SA) (Bandyopadhyay et al., 2001), Ant Colony Optimization (ACO) (Dai et al., 2009), and Particle Swarm Optimization (PSO) (Ghorpade & Metre, 2014) have been used effectively for clustering. In addition, multi-objective clustering is used to decompose a dataset into related groups while optimizing multiple objectives. Multi-objective clustering can be viewed as a special case of multi-objective optimization, which aims to optimize multiple objectives simultaneously under given constraints.
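
To make the multi-objective view concrete, the following Python sketch scores two candidate sets of cluster centres on two objectives and compares them by Pareto dominance. It is only an illustration under stated assumptions and is not the FGPSO algorithm: the function names (`assign`, `mse`, `davies_bouldin`, `dominates`) are hypothetical, "DB index" is assumed to mean the Davies-Bouldin index with Euclidean distance, both objectives are treated as quantities to minimize, and the toy data are made up for the example.

```python
import math

def assign(points, centroids):
    """Hard assignment: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def mse(points, centroids):
    """Mean squared distance of each point to its nearest centroid."""
    labels = assign(points, centroids)
    total = sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))
    return total / len(points)

def davies_bouldin(points, centroids):
    """Davies-Bouldin index (assumed definition; lower is better).

    Assumes at least two centroids and that no two centroids coincide.
    """
    labels = assign(points, centroids)
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        # Average distance of members to their centroid; 0.0 for an empty cluster.
        s = sum(math.dist(p, centroids[c]) for p in members) / len(members) if members else 0.0
        scatter.append(s)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Toy data: two well-separated groups of three points each (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [(1 / 3, 1 / 3), (31 / 3, 31 / 3)]   # centres at the two group means
poor = [(0, 0), (1, 0)]                      # both centres inside one group

good_obj = (mse(points, good), davies_bouldin(points, good))
poor_obj = (mse(points, poor), davies_bouldin(points, poor))
print(dominates(good_obj, poor_obj))
```

In a multi-objective optimizer, a dominance test of this kind is one common way to compare candidate solutions without collapsing the objectives into a single weighted sum; whether the article's method uses dominance or an aggregated fitness is not stated in the text above.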
