Determination of Optimal Clusters Using a Genetic Algorithm

Tushar (Indian Institute of Technology, Kharagpur, India), Shibendu Shekhar Roy (Indian Institute of Technology, Kharagpur, India) and Dilip Kumar Pratihar (Indian Institute of Technology, Kharagpur, India)
Copyright: © 2008 | Pages: 20
DOI: 10.4018/978-1-59904-960-1.ch005

Clustering is a useful tool for data mining. A clustering method analyzes the patterns in a data set and groups the data points into clusters based on their similarity to one another. Clusters may be either crisp or fuzzy. This chapter deals with the clustering of several data sets using the Fuzzy C-Means (FCM) algorithm and the Entropy-based Fuzzy Clustering (EFC) algorithm. In the FCM algorithm, the nature and quality of the clusters depend on the pre-defined number of clusters, the level of cluster fuzziness, and a threshold value used to determine the number of outliers (if any). The quality of the clusters obtained by the EFC algorithm, on the other hand, depends on a constant that relates the distance between two data points to their similarity, a similarity threshold, and another threshold used to determine the number of outliers. Ideally, the clusters should be both distinct and compact, and the number of outliers should be as small as possible. Choosing these parameters can therefore be posed as an optimization problem, which is solved using a Genetic Algorithm (GA). The best set of multi-dimensional clusters is then mapped into 2-D for visualization using a Self-Organizing Map (SOM).
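
To make the role of the FCM parameters named in the abstract more concrete, the following is a minimal sketch of the textbook Fuzzy C-Means update loop in plain Python. It is not taken from the chapter: the function name `fuzzy_c_means`, the arguments `n_clusters`, `fuzziness`, `tol`, `max_iter` and `seed`, and the toy data are all assumptions made for illustration. The outlier threshold, the EFC algorithm, the GA-based parameter search and the SOM mapping are not shown.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, fuzziness=2.0, tol=1e-5,
                  max_iter=200, seed=0):
    """Textbook Fuzzy C-Means (illustrative sketch, not the chapter's code).

    points     : list of equal-length coordinate tuples
    n_clusters : pre-defined number of clusters
    fuzziness  : level of cluster fuzziness (must be > 1)
    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    exponent = 2.0 / (fuzziness - 1.0)
    centers = []
    for _ in range(max_iter):
        # Cluster centers: means weighted by membership ** fuzziness.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** fuzziness for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])

        # Memberships: inverse relative distance to every center.
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with one or more centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [h / sum(hits) for h in hits]
            else:
                new_row = [
                    1.0 / sum((dk / dj) ** exponent for dj in dists)
                    for dk in dists
                ]
            change = max(abs(a - b) for a, b in zip(new_row, u[i]))
            max_change = max(max_change, change)
            u[i] = new_row

        if max_change < tol:  # memberships have stabilised
            break
    return centers, u


# Hypothetical toy data: two well-separated groups in 2-D.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(data, n_clusters=2)
# Crisp labels, if needed, are the clusters with the highest membership.
labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
```

In this sketch, a larger `fuzziness` spreads each point's membership more evenly across clusters, while values close to 1 approach a crisp partition. This is the kind of parameter sensitivity that the abstract proposes to handle by posing parameter selection as an optimization problem.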

Complete Chapter List

Table of Contents
David Taniar
Chapter 1
Riadh Ben Messaoud, Sabine Loudcher Rabaséda, Rokia Missaoui, Omar Boussaid
Data warehouses and OLAP (online analytical processing) provide tools to explore and navigate through data cubes in order to extract interesting...
OLEMAR: An Online Environment for Mining Association Rules in Multidimensional Data
Chapter 2
Yun Sing Koh, Richard O’Keefe, Nathan Rountree
Association rules are patterns that offer useful information on dependencies that exist between the sets of items. Current association rule mining...
Interestingness Measures for Association Rules: What Do They Really Measure?
Chapter 3
Qin Ding, Gnanasekaran Sundarraj
With the growing usage of XML in the World Wide Web and elsewhere as a standard for the exchange of data and to represent semi-structured data...
Mining Association Rules from XML Data
Chapter 4
Yue-Shi Lee, Show-Jane Yen
Web mining is one of the mining technologies, which applies data mining techniques in large amount of web data to improve the web services. Web...
A Lattice-Based Framework for Interactively and Incrementally Mining Web Traversal Patterns
Chapter 5
Tushar, Shibendu Shekhar Roy, Dilip Kumar Pratihar
Clustering is a potential tool of data mining. A clustering method analyzes the pattern of a data set and groups the data into several clusters...
Determination of Optimal Clusters Using a Genetic Algorithm
Chapter 6
ABM Shawkat Ali
Clustering technique in data mining has received a significant amount of attention from machine learning community in the last few years as one of...
K-means Clustering Adopting rbf-Kernel
Chapter 7
Pradeep Kumar, P. Radha Krishna, Raju S. Bapi, T. M. Padmaja
In recent years, advanced information systems have enabled collection of increasingly large amounts of data that are sequential in nature. To...
Advances in Classification of Sequence Data
Chapter 8
Justin Zhan
To conduct data mining, we often need to collect data from various parties. Privacy concerns may prevent the parties from directly sharing the data...
Using Cryptography For Privacy-Preserving Data Mining
Chapter 9
Longbing Cao, Chengqi Zhang
Quantitative intelligence based traditional data mining is facing grand challenges from real-world enterprise and cross-organization applications....
Domain Driven Data Mining
Chapter 10
Can Yang, Jun Meng, Shanan Zhu, Mingwei Dai
Input selection is a crucial step for nonlinear regression modeling problem, which contributes to build an interpretable model with less...
Model Free Data Mining
Chapter 11
John Wang, Xiaohua Hu, Dan Zhu
This research explores the effectiveness of data mining in a commercial perspective. Statistical issues are specified first. Data accuracy and...
Minimizing the Minus Sides of Mining Data
Chapter 12
Tu Bao Ho, Thanh Phuong Nguyen, Tuan Nam Tran
The objective of this paper is twofold. First is to provide a survey of computational methods for protein-protein interaction (PPI) study. Second is...
Study of Protein-Protein Interactions from Multiple Data Sources
Chapter 13
Anthony Scime, Gregg R. Murray, Wan Huang, Carol Brownstein-Evans
Immense public resources are expended to collect large stores of social data, but often these data are under-examined thereby missing potential...
Data Mining in the Social Sciences and Iterative Attribute Elimination
Chapter 14
Marco A. Alvarez, SeungJin Lim
Current search engines impose an overhead to motivated students and Internet users who employ the Web as a valuable resource for education. The...
A Machine Learning Approach for One-Stop Learning
About the Contributors