1. Introduction
Data clustering is the process of grouping similar multi-dimensional data vectors together. A comprehensive study and analysis of the different partitional clustering algorithms is given in (Aparna & Nair, 2015a). Clustering algorithms have been applied to a broad range of problems, including exploratory data analysis, data mining (Evangelou et al, 2001), image segmentation (Lillesand et al, 1994) and mathematical programming (Andrews H.C, 1972), (Rao, 1971). Clustering techniques have been used effectively to address the scalability problem of machine learning and data mining algorithms and to improve their performance (Jain et al, 1999), (Quinlan, 1993), (Potgieter, 2002). Clustering reflects the statistical structure of the overall collection of input patterns in the data, so each resulting subset of patterns carries a definite meaning (Roy & Sharma, 2010). Each pattern can be represented mathematically as a vector in a multi-dimensional space.
Learning algorithms can be divided into two main classes, namely supervised and unsupervised. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). Clustering is an analytical process of discovering distinct structures in data (Yip et al, 2004). In unsupervised classification, also called clustering, no labelled data are available (Everitt et al, 2001), (Jain & Dubes, 1988). The objective of clustering is to partition a finite, unlabelled data set into a finite and discrete set of “natural”, hidden data structures (Baraldi & Alpaydin, 2002), (Cherkassky & Mulier, 1988). In many learning domains, the potentially useful features are defined manually. However, not all of these features may be relevant, and in such cases selecting a subset of the original features often leads to improved performance. For supervised learning, feature selection algorithms rely on some function of predictive accuracy (Dy & Brodley, 2004).
Many clustering algorithms have been proposed. One of the best-known hard clustering algorithms is K-Means, which partitions data objects into k clusters so that each object belongs to exactly one cluster (Kanungo et al, 2002). Fuzzy algorithms, by contrast, can assign each data object to multiple clusters with varying degrees of membership. Fuzzy C-Means clustering is an efficient algorithm; however, because the initial centre points are chosen arbitrarily, the iterative process can easily become trapped in a locally optimal solution. To improve the solution, many metaheuristic optimization algorithms, such as the Genetic Algorithm (GA) (Maulik & Bandyopadhyay, 2000), Simulated Annealing (SA) (Bandyopadhyay et al, 2001), Ant Colony Optimization (ACO) (Dai et al, 2009) and Particle Swarm Optimization (PSO) (Ghorpade & Metre, 2014), have been applied effectively to clustering. In addition, multi-objective clustering is used to decompose a dataset into related groups by optimizing several objectives at once. Multi-objective clustering can be viewed as a special case of multi-objective optimization, which aims to optimize multiple objectives simultaneously under given constraints.
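To make the contrast between hard (K-Means-style) and fuzzy (Fuzzy C-Means) membership concrete, the following is a minimal pure-Python sketch for illustration only, not the method of any of the cited works. It assumes Euclidean distance, the standard Fuzzy C-Means update rules with fuzzifier m = 2, and random selection of the initial centres from the data points; the function names `hard_assign`, `fuzzy_memberships` and `fuzzy_c_means` and the toy data set are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_assign(points, centers):
    """K-Means-style hard assignment: each point gets the index of its single nearest centre."""
    return [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]


def fuzzy_memberships(points, centers, m=2.0, eps=1e-12):
    """Fuzzy C-Means membership matrix u[j][i] of point j in cluster i.

    u[j][i] = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1)); each row sums to 1.
    The fuzzifier m must be > 1; eps avoids division by zero when a point
    coincides with a centre.
    """
    u = []
    for p in points:
        d = [max(dist(p, c), eps) for c in centers]
        u.append([
            1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(centers)))
            for i in range(len(centers))
        ])
    return u


def fuzzy_c_means(points, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Alternate membership and centre updates until the centres stop moving."""
    rng = random.Random(seed)
    # Arbitrary (random) choice of initial centres, as discussed in the text.
    centers = [list(p) for p in rng.sample(points, c)]
    dim = len(points[0])
    for _ in range(iters):
        u = fuzzy_memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            # Each centre is the mean of all points weighted by membership ** m.
            w = [u[j][i] ** m for j in range(len(points))]
            total = sum(w)
            new_centers.append(
                [sum(w[j] * points[j][k] for j in range(len(points))) / total for k in range(dim)]
            )
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fuzzy_memberships(points, centers, m)


# Hypothetical toy data: two tight groups plus one point roughly midway between them.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (2.5, 2.5)]
centers, u = fuzzy_c_means(points, c=2)
labels = hard_assign(points, centers)
print("hard labels:", labels)
print("fuzzy memberships of midpoint:", [round(x, 2) for x in u[4]])
```

On this toy data the two tight groups receive different hard labels, while the point lying between them receives a membership of roughly one half in each cluster, something a hard partition cannot express. Because the initial centres are sampled at random, a different `seed` may lead to a different local optimum on harder data; this sensitivity to initialization is the weakness that the metaheuristic approaches mentioned above try to address.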