Data Clustering Using Sine Cosine Algorithm: Data Clustering Using SCA

Vijay Kumar (Thapar University, India) and Dinesh Kumar (GJUS&T, India)
Copyright: © 2017 | Pages: 12
DOI: 10.4018/978-1-5225-2229-4.ch031


Clustering techniques suffer from two problems: sensitivity to the initialization of the cluster centers, and convergence to local optima. In this chapter, a recently proposed metaheuristic, the Sine Cosine Algorithm (SCA), is used as a search method to address these problems. SCA explores the search space of the given dataset to find near-optimal cluster centers, and a center-based encoding scheme is used to evolve those centers. The proposed SCA-based clustering technique is evaluated on four real-life datasets and compared with recently developed clustering techniques. The experimental results reveal that SCA-based clustering gives better values of the cluster quality measures.
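The search procedure described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the fitness function (sum of squared Euclidean distances from each point to its nearest center), the population size, the iteration count, the constant a = 2, and clipping agents to the data range are all assumptions, since this preview does not specify them. The position update follows the standard SCA rule, in which each agent moves toward or around the best solution found so far (the destination point) by a sine or cosine step whose amplitude r1 shrinks linearly over the iterations. The names `sca_cluster`, `decode`, and `sse` are hypothetical.

```python
import math
import random


def decode(position, k, d):
    """Center-based encoding: a flat vector of length k*d holds k centers of d features each."""
    return [position[c * d:(c + 1) * d] for c in range(k)]


def sse(position, data, k):
    """Assumed fitness: sum of squared Euclidean distances from each point to its nearest center."""
    centers = decode(position, k, len(data[0]))
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(point, center)) for center in centers)
        for point in data
    )


def sca_cluster(data, k, agents=20, iterations=200, a=2.0, seed=0):
    """Search for k cluster centers with the Sine Cosine Algorithm (minimal sketch)."""
    rng = random.Random(seed)
    d = len(data[0])
    # Per-dimension bounds: the feature ranges of the dataset, repeated once per center.
    lo = [min(p[f] for p in data) for f in range(d)] * k
    hi = [max(p[f] for p in data) for f in range(d)] * k
    dim = k * d
    pop = [[rng.uniform(lo[i], hi[i]) for i in range(dim)] for _ in range(agents)]
    best = min(pop, key=lambda p: sse(p, data, k))[:]  # destination point P
    best_fit = sse(best, data, k)
    for t in range(iterations):
        # r1 shrinks linearly toward 0: large steps explore, small steps exploit.
        r1 = a - t * a / iterations
        for agent in pop:
            for i in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                step = abs(r3 * best[i] - agent[i])
                if r4 < 0.5:
                    agent[i] += r1 * math.sin(r2) * step
                else:
                    agent[i] += r1 * math.cos(r2) * step
                agent[i] = min(max(agent[i], lo[i]), hi[i])  # keep inside the data range
            fit = sse(agent, data, k)
            if fit < best_fit:  # update the destination as soon as an agent beats it
                best, best_fit = agent[:], fit
    return decode(best, k, d), best_fit


data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
centers, fit = sca_cluster(data, k=2)
print(centers, fit)
```

With `k=2` and two features, each agent is a flat vector of four values that `decode` reads as two 2-D centers; this is the center-based encoding. Updating the destination as soon as any agent improves on it is a simplification chosen here; updating it once per iteration, after all agents have moved, is an equally valid variant.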
Chapter Preview


This section describes the basic concepts of cluster analysis and reviews related work on metaheuristic-based data clustering techniques.

Cluster Analysis

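The partitional clustering technique is defined as follows. Let a dataset $X = \{x_1, x_2, \ldots, x_n\}$ consist of $n$ data points. Each data point is described by $d$ features, i.e., $x_j = (x_{j1}, x_{j2}, \ldots, x_{jd})$, where $x_j$ is a vector representing the $j$-th data point and $x_{jl}$ represents the $l$-th feature of $x_j$. The main aim of a clustering technique is to partition the dataset into a number of clusters (say $K$) based on some similarity/dissimilarity measure. The value of $K$ may or may not be known a priori. The partition matrix is represented as $U = [u_{kj}]_{K \times n}$, $k = 1, \ldots, K$ and $j = 1, \ldots, n$, where $u_{kj}$ is the membership of data point $x_j$ to cluster $C_k$ (Abraham et al., 2008). For the hard partitioning of the dataset, the following conditions must be satisfied (Xu and Wunsch, 2009):

$$u_{kj} \in \{0, 1\}, \qquad \sum_{k=1}^{K} u_{kj} = 1 \;\; \forall j, \qquad 0 < \sum_{j=1}^{n} u_{kj} < n \;\; \forall k$$

For the fuzzy partitioning of the dataset, the following conditions must be satisfied:

$$u_{kj} \in [0, 1], \qquad \sum_{k=1}^{K} u_{kj} = 1 \;\; \forall j, \qquad 0 < \sum_{j=1}^{n} u_{kj} < n \;\; \forall k$$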
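The membership constraints on the partition matrix U can be checked mechanically. As a minimal sketch, the following Python code builds a hard partition matrix by assigning each data point to its nearest center (the nearest-center rule and the Euclidean distance are assumptions; the preview does not name a distance measure) and tests any matrix against the standard constraints: every membership is 0 or 1 (hard) or lies in [0, 1] (fuzzy), each data point's memberships sum to 1, and every cluster's total membership lies strictly between 0 and n. The function names `hard_partition` and `is_valid_partition` are hypothetical.

```python
def hard_partition(data, centers):
    """Crisp K x n matrix U: u[k][j] = 1 if x_j is nearest (Euclidean) to center k, else 0."""
    U = [[0] * len(data) for _ in centers]
    for j, x in enumerate(data):
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        U[dists.index(min(dists))][j] = 1
    return U


def is_valid_partition(U, fuzzy=False, tol=1e-9):
    """Check the hard (u in {0, 1}) or fuzzy (u in [0, 1]) partition constraints on U."""
    K, n = len(U), len(U[0])
    for row in U:
        for u in row:
            if fuzzy and not 0.0 <= u <= 1.0:
                return False
            if not fuzzy and u not in (0, 1):
                return False
    # Each data point's memberships over all K clusters must sum to 1.
    for j in range(n):
        if abs(sum(U[k][j] for k in range(K)) - 1.0) > tol:
            return False
    # No cluster may be empty, and no cluster may absorb all n points.
    return all(0 < sum(row) < n for row in U)


points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0)]
U_hard = hard_partition(points, [(1.0, 1.0), (5.0, 5.0)])
U_fuzzy = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]]
print(U_hard)                                    # [[1, 1, 0], [0, 0, 1]]
print(is_valid_partition(U_hard))                # True
print(is_valid_partition(U_fuzzy, fuzzy=True))   # True
print(is_valid_partition(U_fuzzy))               # False: memberships are not 0/1
```

The same column-sum and row-sum checks serve both cases; only the allowed range of each membership value differs between hard and fuzzy partitions.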
Key Terms in this Chapter

Clustering: An unsupervised technique for grouping the dataset into classes of similar data.

Exploitation: The ability to find the optimal solution in the neighborhood of a good solution.

Validity Index: A measure used to assess the goodness of a clustering result in comparison with the results produced by other clustering algorithms.

Optimization: The act or process of finding the most cost-effective or highest-performing alternative under the given constraints.

Exploration: The act of searching with the purpose of discovering unknown information.

Cluster: A collection of data points that are similar to one another within the same cluster and are dissimilar to data points in other clusters.

Metaheuristic: A general algorithmic framework that can be applied to different optimization problems, with relatively few modifications needed to adapt it to a specific problem.
