Data Clustering Using Sine Cosine Algorithm: Data Clustering Using SCA

Vijay Kumar (Thapar University, India) and Dinesh Kumar (GJUS&T, India)
Copyright: © 2017 | Pages: 12
DOI: 10.4018/978-1-5225-2229-4.ch031

Abstract

Clustering techniques are sensitive to the initialization of cluster centers and prone to getting trapped in local optima. In this chapter, a recent metaheuristic, the Sine Cosine Algorithm (SCA), is used as a search method to address these problems. SCA explores the search space of a given dataset to find near-optimal cluster centers. A center-based encoding scheme is used to evolve the cluster centers. The proposed SCA-based clustering technique is evaluated on four real-life datasets, and its performance is compared with recently developed clustering techniques. The experimental results show that SCA-based clustering yields better values of the cluster quality measures.
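
The search described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the fitness function (sum of squared distances to the nearest center), the population size, the iteration count, and the names `sse` and `sca_cluster` are all hypothetical choices made for this example. The position update follows the commonly published SCA rule, with the conventional constant a = 2. Each search agent encodes K cluster centers concatenated into one vector, which is one plausible reading of "center-based encoding".

```python
import numpy as np

def sse(centers_flat, data, k):
    """Assumed fitness: sum of squared distances to the nearest center."""
    centers = centers_flat.reshape(k, data.shape[1])
    # Pairwise squared Euclidean distances, shape (n_points, k)
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def sca_cluster(data, k, n_agents=20, n_iter=100, a=2.0, seed=0):
    """Hypothetical SCA-based clustering sketch (parameters are assumptions)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Per-dimension bounds, repeated once per encoded center
    lo = np.tile(data.min(axis=0), k)
    hi = np.tile(data.max(axis=0), k)
    # Center-based encoding: one agent = k centers flattened to length k*d
    agents = rng.uniform(lo, hi, size=(n_agents, k * d))
    fitness = np.array([sse(x, data, k) for x in agents])
    best = agents[fitness.argmin()].copy()
    best_fit = fitness.min()
    for t in range(n_iter):
        # r1 shrinks linearly from a toward 0: exploration first, exploitation later
        r1 = a - t * (a / n_iter)
        r2 = rng.uniform(0.0, 2.0 * np.pi, size=agents.shape)
        r3 = rng.uniform(0.0, 2.0, size=agents.shape)
        r4 = rng.uniform(0.0, 1.0, size=agents.shape)
        step = np.abs(r3 * best - agents)
        # Sine branch when r4 < 0.5, cosine branch otherwise
        agents = np.where(r4 < 0.5,
                          agents + r1 * np.sin(r2) * step,
                          agents + r1 * np.cos(r2) * step)
        agents = np.clip(agents, lo, hi)  # keep centers inside the data range
        fitness = np.array([sse(x, data, k) for x in agents])
        if fitness.min() < best_fit:
            best_fit = fitness.min()
            best = agents[fitness.argmin()].copy()
    centers = best.reshape(k, d)
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)  # hard assignment to the nearest center
    return centers, labels, best_fit
```

Calling `sca_cluster(data, k=3)` on an (N, d) array returns the best centers found, a hard label per data point, and the corresponding fitness value. Because the search is stochastic, results depend on the `seed` argument.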

Background

This section describes the basic concepts of cluster analysis and related work on metaheuristic-based data clustering techniques.

Cluster Analysis

The partitional clustering technique is defined as follows. Let a dataset $X$ consist of $N$ data points, $X = \{x_1, x_2, \ldots, x_N\}$. Each data point is described by $d$ features, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ is a vector representing the $i$-th data point and $x_{ij}$ represents the $j$-th feature of $x_i$. The main aim of a clustering technique is to partition the dataset into a number of clusters (say $K$), $C = \{C_1, C_2, \ldots, C_K\}$, based on some similarity/dissimilarity measure. The value of $K$ may or may not be known a priori. The partition matrix is represented as $U = [u_{kj}]$, $k = 1, \ldots, K$ and $j = 1, \ldots, N$, where $u_{kj}$ is the membership of data point $x_j$ to cluster $C_k$ (Abraham et al., 2008). For the hard partitioning of the dataset, the following condition must be satisfied (Xu and Wunsch, 2009).

$$u_{kj} \in \{0, 1\}, \qquad \sum_{k=1}^{K} u_{kj} = 1 \;\; \forall j, \qquad \sum_{j=1}^{N} u_{kj} \geq 1 \;\; \forall k$$
(1)

For the fuzzy partitioning of the dataset, the following condition must be satisfied.

$$u_{kj} \in [0, 1], \qquad \sum_{k=1}^{K} u_{kj} = 1 \;\; \forall j, \qquad 0 < \sum_{j=1}^{N} u_{kj} < N \;\; \forall k$$
(2)
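
The two partitioning conditions can be checked numerically. The sketch below assumes the usual textbook form of the conditions and a membership matrix stored with one row per cluster and one column per data point. Both the layout and the function names `is_hard_partition` and `is_fuzzy_partition` are assumptions introduced for this illustration.

```python
import numpy as np

def is_hard_partition(u):
    """True if u (clusters x points) is a valid hard partition matrix."""
    u = np.asarray(u, dtype=float)
    binary = np.all((u == 0) | (u == 1))            # memberships are 0 or 1
    one_each = np.allclose(u.sum(axis=0), 1.0)      # each point in exactly one cluster
    non_empty = np.all(u.sum(axis=1) >= 1)          # no cluster is empty
    return bool(binary and one_each and non_empty)

def is_fuzzy_partition(u):
    """True if u (clusters x points) is a valid fuzzy partition matrix."""
    u = np.asarray(u, dtype=float)
    n_points = u.shape[1]
    in_range = np.all((u >= 0) & (u <= 1))          # memberships lie in [0, 1]
    sums_to_one = np.allclose(u.sum(axis=0), 1.0)   # memberships of a point sum to 1
    row_sums = u.sum(axis=1)
    # No cluster is empty and no cluster absorbs every point completely
    non_trivial = np.all((row_sums > 0) & (row_sums < n_points))
    return bool(in_range and sums_to_one and non_trivial)
```

Under these assumptions, a hard partition with at least two non-empty clusters is a special case of a fuzzy partition, so a matrix such as `[[1, 0, 1], [0, 1, 0]]` passes both checks, while `[[0.7, 0.2], [0.3, 0.8]]` passes only the fuzzy check.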

Key Terms in this Chapter

Clustering: An unsupervised technique for grouping a dataset into classes of similar data points.

Exploitation: The ability to search the neighborhood of a good solution for a better one.

Validity Index: A measure of the quality of a clustering result, used to compare it with results produced by other clustering algorithms.

Optimization: The act or process of finding the most cost-effective or highest-performing alternative under the given constraints.

Exploration: The act of searching in order to discover unknown information.

Cluster: A collection of data points that are similar to one another within the same cluster and are dissimilar to data points in other clusters.

Metaheuristic: A general algorithmic framework that can be applied to different optimization problems, with relatively few modifications needed to adapt it to a specific problem.
