Data Clustering Using Sine Cosine Algorithm: Data Clustering Using SCA

Vijay Kumar (Thapar University, India) and Dinesh Kumar (GJUS&T, India)
Copyright: © 2017 | Pages: 12
DOI: 10.4018/978-1-5225-2229-4.ch031


Clustering techniques are sensitive to the initialization of cluster centers and prone to becoming trapped in local optima. In this chapter, a recent metaheuristic, the Sine Cosine Algorithm (SCA), is used as a search method to address these problems. SCA explores the search space of a given dataset to find near-optimal cluster centers. A center-based encoding scheme is used to evolve the cluster centers. The proposed SCA-based clustering technique is evaluated on four real-life datasets, and its performance is compared with that of recently developed clustering techniques. The experimental results indicate that SCA-based clustering gives better values of the cluster quality measures.
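The idea of evolving center-encoded solutions with SCA can be illustrated with a minimal Python sketch. It is not the chapter's implementation: it assumes the commonly published SCA position update (sine or cosine step toward the best solution, with an amplitude that shrinks linearly over the iterations), a sum-of-squared-errors fitness function, and hypothetical names and parameter values (`sca_clustering`, `n_agents`, `n_iter`, `a`).

```python
import numpy as np


def sse(centers, data):
    """Sum of squared distances from each point to its nearest center."""
    # dists[i, k] = squared Euclidean distance from point i to center k
    dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()


def sca_clustering(data, n_clusters, n_agents=20, n_iter=100, a=2.0, seed=0):
    """Search for cluster centers with a basic SCA loop (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    dim = n_clusters * n_features
    # Bounds of the search space, repeated once per encoded center.
    lo = np.tile(data.min(axis=0), n_clusters)
    hi = np.tile(data.max(axis=0), n_clusters)

    def fitness_of(agent):
        return sse(agent.reshape(n_clusters, n_features), data)

    # Center-based encoding: each agent is K centers flattened into one vector.
    agents = rng.uniform(lo, hi, size=(n_agents, dim))
    fitness = np.array([fitness_of(x) for x in agents])
    best = agents[fitness.argmin()].copy()
    best_fit = fitness.min()

    for t in range(n_iter):
        r1 = a - t * a / n_iter  # amplitude decays from a toward 0
        r2 = rng.uniform(0.0, 2.0 * np.pi, size=agents.shape)
        r3 = rng.uniform(0.0, 2.0, size=agents.shape)
        r4 = rng.uniform(0.0, 1.0, size=agents.shape)
        step = np.abs(r3 * best - agents)
        # Per dimension, take a sine step or a cosine step with equal probability.
        agents = np.where(r4 < 0.5,
                          agents + r1 * np.sin(r2) * step,
                          agents + r1 * np.cos(r2) * step)
        agents = np.clip(agents, lo, hi)  # keep centers inside the data range
        fitness = np.array([fitness_of(x) for x in agents])
        i = fitness.argmin()
        if fitness[i] < best_fit:
            best, best_fit = agents[i].copy(), fitness[i]

    return best.reshape(n_clusters, n_features), best_fit


# Toy usage: two synthetic blobs (made-up data, not one of the chapter's datasets).
toy_rng = np.random.default_rng(1)
toy = np.vstack([toy_rng.normal(0.0, 0.5, size=(50, 2)),
                 toy_rng.normal(10.0, 0.5, size=(50, 2))])
centers, best_fit = sca_clustering(toy, n_clusters=2)
```

Because every agent encodes all centers at once, no separate center-initialization step is needed; the random population plays that role, and the large early amplitude is what gives the search a chance to leave poor local optima.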
Chapter Preview


This section describes the basic concepts of cluster analysis and related work on metaheuristic-based data clustering techniques.

Cluster Analysis

The partitional clustering technique is defined as follows. Let a dataset $X$ consist of $n$ data points, $X = \{x_1, x_2, \ldots, x_n\}$. Each data point is described by $d$ features, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ is a vector representing the $i$-th data point and $x_{ij}$ represents the $j$-th feature of $x_i$. The main aim of a clustering technique is to partition the dataset into a number of clusters (say $K$), $C = \{C_1, C_2, \ldots, C_K\}$, based on some similarity/dissimilarity measure. The value of $K$ may or may not be known a priori. The partition matrix is represented as $U = [u_{kj}]$, $k = 1, 2, \ldots, K$ and $j = 1, 2, \ldots, n$, where $u_{kj}$ is the membership of data point $x_j$ to cluster $C_k$ (Abraham et al., 2008). For the hard partitioning of the dataset, the following condition must be satisfied (Xu and Wunsch, 2009).
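A standard textbook statement of the hard-partition condition is sketched below. The notation is an assumption: $u_{kj}$ denotes the membership of the $j$-th data point in the $k$-th cluster, with $n$ data points and $K$ clusters, and the chapter's own formulation may differ in detail.

```latex
u_{kj} \in \{0, 1\}, \qquad
\sum_{k=1}^{K} u_{kj} = 1 \quad \forall j \in \{1, \ldots, n\}, \qquad
\sum_{j=1}^{n} u_{kj} \geq 1 \quad \forall k \in \{1, \ldots, K\}
```

Read left to right: memberships are binary, every data point belongs to exactly one cluster, and no cluster is empty.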


For the fuzzy partitioning of the dataset, the following condition must be satisfied.
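In its usual textbook form, the fuzzy-partition condition relaxes memberships to degrees in the unit interval. The sketch below assumes that $u_{kj}$ denotes the membership degree of the $j$-th data point in the $k$-th cluster, with $n$ data points and $K$ clusters; the chapter's own formulation may differ in detail.

```latex
u_{kj} \in [0, 1], \qquad
\sum_{k=1}^{K} u_{kj} = 1 \quad \forall j \in \{1, \ldots, n\}, \qquad
0 < \sum_{j=1}^{n} u_{kj} < n \quad \forall k \in \{1, \ldots, K\}
```

Each data point distributes a total membership of one across the clusters, and no cluster is empty or absorbs the full membership of every point.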


Key Terms in this Chapter

Clustering: An unsupervised technique for grouping a dataset into classes of similar data.

Exploitation: The ability to find the optimal solution in the neighborhood of a good solution.

Validity Index: A measure of the goodness of a clustering result relative to results produced by other clustering algorithms.

Optimization: The act or process of finding the most cost-effective or highest-performing alternative under the given constraints.

Exploration: The act of searching in order to discover unknown information.

Cluster: A collection of data points that are similar to one another within the same cluster and are dissimilar to data points in other clusters.

Metaheuristic: A general algorithmic framework that can be applied to different optimization problems with relatively few modifications to adapt it to a specific problem.
