Clustering Mixed Datasets Using K-Prototype Algorithm Based on Crow-Search Optimization

Lakshmi K., Karthikeyani Visalakshi N., Shanthi S., Parvathavarthini S.

Source Title: Developments and Trends in Intelligent Technologies and Smart Systems

DOI: 10.4018/978-1-5225-3686-4.ch010

Abstract

Data mining techniques are used to discover interesting knowledge from large collections of data objects. Clustering is a data mining technique for knowledge discovery; it is an unsupervised learning method that analyses data objects without knowing their class labels. The k-prototype algorithm is the most widely used partitional clustering algorithm for data objects with mixed numeric and categorical attributes. Because it selects its initial prototypes randomly, this algorithm yields a local optimum solution. Recently, a number of optimization algorithms have been introduced to obtain the global optimum solution. The Crow Search algorithm is one of the recently developed population-based meta-heuristic optimization algorithms, and it is based on the intelligent behavior of crows. In this paper, the k-prototype clustering algorithm is integrated with the Crow Search optimization algorithm to produce the global optimum solution.

Chapter Preview

Introduction

Knowledge Discovery in Databases (KDD) is the automatic, exploratory analysis and modelling of large data repositories. It is the organized process of identifying valid, novel, useful, and understandable patterns in large and complex data sets. Data mining is the heart of the KDD process; it involves a large number of algorithms that explore the data, develop models, and discover previously unknown patterns.

Data clustering is the process of grouping heterogeneous data objects into homogeneous clusters such that the objects within a cluster are similar to each other and dissimilar to the objects in other clusters.

Clustering is used in a variety of fields, such as data mining and knowledge discovery, market research, machine learning, biology, pattern recognition, and weather prediction. An early example of the use of cluster analysis in market research is given in (Green, Frank & Robinson, 1967). A large number of cities were available as test markets, and cluster analysis was used to classify them into a small number of groups on the basis of variables including city size, newspaper circulation, and per capita income. Because the cities within a group are very similar to each other, the test markets were selected by choosing one city from each group.

As another example, Littmann (2000) applies cluster analysis to the daily occurrences of several surface pressures for weather in the Mediterranean basin and finds the groups that explain rainfall variance in the core Mediterranean regions. Liu and George (2005) use fuzzy k-means clustering to account for the spatiotemporal nature of weather data in the South-Central USA. Kerr and Churchill (2001) investigate the problem of clustering tools applied to gene expression data.

A number of clustering algorithms are available for grouping instances of the same type. They are categorized into partitional, hierarchical, density-based, and grid-based clustering algorithms. Partitional clustering algorithms form clusters by partitioning the data objects into groups. Hierarchical clustering algorithms form clusters by a hierarchical decomposition of the data objects.

The partitional clustering algorithms include k-means, k-modes, k-medoids and k-medians. The hierarchical clustering algorithms include agglomerative algorithms such as single linkage and complete linkage. The density-based clustering algorithms include DBSCAN, DENCLUE and OPTICS. The grid-based clustering algorithms include GRIDCLUS, BANG and STING.

The k-means algorithm handles large numbers of data objects, but only objects with numeric attributes. Huang introduced two extensions of the k-means clustering algorithm. The first extension is the k-modes clustering algorithm (Huang, 1997a) and the second is the k-prototype clustering algorithm (Huang, 1997b). The k-modes algorithm efficiently handles large numbers of categorical data objects. The k-prototype algorithm efficiently handles large numbers of data objects with both numeric and categorical attributes; it integrates the k-means and k-modes clustering algorithms. For mixed numeric and categorical datasets, the Euclidean distance is calculated for the numeric attributes and the matching similarity measure is calculated for the categorical attributes.
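The combined measure for a mixed object can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function name `mixed_dissimilarity`, the use of the squared Euclidean distance, and the weight `gamma` that balances the categorical part against the numeric part are assumptions made for this example.

```python
def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """Dissimilarity between a mixed object and a prototype.

    Numeric part: squared Euclidean distance (assumed here).
    Categorical part: simple matching, i.e. the count of attributes whose
    values differ, scaled by the assumed weight `gamma`.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical_part = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric_part + gamma * categorical_part


# Object (1.0, 2.0, "red", "small") against prototype (2.0, 4.0, "red", "large"):
# numeric part = 1 + 4 = 5, one categorical mismatch, gamma = 0.5.
d = mixed_dissimilarity([1.0, 2.0], ["red", "small"],
                        [2.0, 4.0], ["red", "large"], gamma=0.5)
print(d)  # 5.5
```

A larger `gamma` gives the categorical attributes more influence on cluster membership; with `gamma=0` the measure reduces to the numeric distance alone.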

The k-prototype clustering algorithm selects its initial prototypes randomly from the data objects, which leads to a local optimum solution. To overcome this problem, an optimization algorithm is integrated with the k-prototype clustering algorithm.
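The dependence on the initial prototypes can be seen in a small sketch of the basic procedure. It is an illustration under stated assumptions rather than the chapter's method: the names `k_prototypes`, `n_num`, `gamma` and `seed` are hypothetical, objects are tuples whose first `n_num` entries are numeric, the numeric part uses the squared Euclidean distance, and an empty cluster simply keeps its previous prototype.

```python
import random
from collections import Counter


def dissimilarity(obj, proto, n_num, gamma):
    # Squared Euclidean distance on numeric attributes plus weighted mismatches.
    num = sum((a - b) ** 2 for a, b in zip(obj[:n_num], proto[:n_num]))
    cat = sum(a != b for a, b in zip(obj[n_num:], proto[n_num:]))
    return num + gamma * cat


def k_prototypes(data, k, n_num, gamma=1.0, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random initialization: k distinct objects become the initial prototypes.
    prototypes = [tuple(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest prototype.
        new_labels = [
            min(range(k), key=lambda j: dissimilarity(obj, prototypes[j], n_num, gamma))
            for obj in data
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the procedure has converged
        labels = new_labels
        # Update step: means for numeric attributes, modes for categorical ones.
        for j in range(k):
            members = [obj for obj, lab in zip(data, labels) if lab == j]
            if not members:
                continue  # assumption: an empty cluster keeps its old prototype
            means = [sum(m[i] for m in members) / len(members) for i in range(n_num)]
            modes = [
                Counter(m[i] for m in members).most_common(1)[0][0]
                for i in range(n_num, len(members[0]))
            ]
            prototypes[j] = tuple(means + modes)
    cost = sum(dissimilarity(obj, prototypes[lab], n_num, gamma)
               for obj, lab in zip(data, labels))
    return labels, prototypes, cost


data = [(1.0, 1.0, "a"), (1.2, 0.9, "a"), (8.0, 8.1, "b"), (7.9, 8.3, "b")]
for seed in range(3):
    labels, prototypes, cost = k_prototypes(data, k=2, n_num=2, seed=seed)
    print(seed, labels, round(cost, 3))
```

Running the loop with different seeds and comparing the final cost is a simple way to observe that different starting prototypes can end in different solutions. An optimizer integrated with the algorithm would search over prototype sets using a cost of this kind as its objective.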

Recently, a number of optimization algorithms have been introduced to obtain the global optimum solution. Some of the nature-inspired metaheuristic optimization algorithms are Genetic Algorithm (GA) (Holland, 1975; Goldberg, 1989), Ant Colony Optimization (ACO) (Dorigo, 1992), Simulated Annealing (SA) (Brooks & Morgan, 1995), Particle Swarm Optimization (PSO) (Eberhart & Kennedy, 1995), Tabu Search (TS) (Glover & Laguna, 1997), Cat Swarm Optimization (CSO) (Chu, Tsai & Pan, 2006), Artificial Bee Colony (ABC) (Basturk & Karaboga, 2006), Cuckoo Search (CS) (Yang & Deb, 2009, 2010), Gravitational Search (GS) (Rashedi, Nezamabadi-Pour & Saryazdi, 2009), Firefly Algorithm (FA) (Yang, 2010), Bat Algorithm (BA) (Yang, 2010), Wolf Search Algorithm (WSA) (Tang, Fong, Yang & Deb, 2012), and Krill Herd (KH) (Gandomi & Alavi, 2012).
