Genetically-Modified K-Medoid Clustering Algorithm for Heterogeneous Data Set

Dhayanithi Jaganathan, Akilandeswari Jeyapal

Source Title: Handbook of Research on Applications and Implementations of Machine Learning Techniques

DOI: 10.4018/978-1-5225-9902-9.ch004

Abstract

Researchers have recently turned to clustering data that are heterogeneous in nature. Data generated in many real-world applications, such as data from IoT environments and big data domains, are heterogeneous. Most available clustering algorithms handle homogeneous data, and few algorithms discussed in the literature handle data with both numeric and categorical attributes. Applying a clustering algorithm designed for homogeneous data to heterogeneous data leads to information loss. This chapter proposes a new genetically-modified k-medoid clustering algorithm (GMODKMD) that takes as input a fused distance matrix, built by applying an individual distance measure to each attribute based on its characteristics. GMODKMD is a modified algorithm in which the Davies-Bouldin index is applied in the iteration phase. The proposed algorithm is compared with existing techniques based on accuracy. The experimental results show that the modified algorithm with the fused distance matrix outperforms the existing clustering technique.

Introduction

Data with high fluctuation and differing characteristics are called heterogeneous data. In general, integrating heterogeneous data to meet business information requirements is very difficult. Data generated from IoT are often heterogeneous in nature. Heterogeneous data are further classified into four types: numeric, binary, nominal, and ordinal. Data in measurable or numeric form are called numerical data. Data that take one of two states, 0 or 1, are called binary data. Data that simply name or label something without any order are called nominal data. Ordinal data extend nominal data by following an order. Apart from the characteristics of the data, it is also important to know more about the data in the form of metadata. Detailed metadata are required for better interpretation of heterogeneous data, but in many cases those metadata are difficult to collect.

Grouping similar objects into clusters is the prime motive of any clustering technique. Similarity or dissimilarity is measured by how close the objects are to one another. Many similarity measures have been studied and tested in the literature. They fall into two categories: measures for numerical data and measures for categorical data. The similarity measures available for one type of data are not suitable for the other type. The challenges of clustering heterogeneous data centre on tackling the difficulties raised by complex and dynamic characteristics and by the volume of data, and on defining a good similarity measure between objects in order to group them together. Many researchers have already carried out focused research on similarity and dissimilarity measures for heterogeneous datasets, but further study is still needed to define ideal measures for heterogeneous types.
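
To make the idea of type-specific measures concrete, the following is a minimal sketch of one way to combine per-attribute dissimilarities into a single matrix. It is not the chapter's fused distance matrix, whose construction is not given in this preview. It assumes a Gower-style scheme: range-normalised absolute difference for numeric and rank-encoded ordinal attributes, simple mismatch for binary and nominal attributes, and an unweighted average across attributes. The names `attribute_distance` and `fused_distance_matrix` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical attribute kinds; the labels are illustrative, not from the chapter.
NUMERIC, BINARY, NOMINAL, ORDINAL = "numeric", "binary", "nominal", "ordinal"


def attribute_distance(kind: str, a, b, lo: float = 0.0, hi: float = 1.0) -> float:
    """Dissimilarity in [0, 1] between two values of one attribute."""
    if kind in (NUMERIC, ORDINAL):
        # Assumption: ordinal values are already encoded as integer ranks.
        span = hi - lo
        return abs(a - b) / span if span else 0.0
    # Binary and nominal attributes: 0 if equal, 1 otherwise (simple mismatch).
    return 0.0 if a == b else 1.0


def fused_distance_matrix(rows: Sequence[Sequence[object]],
                          kinds: Sequence[str]) -> List[List[float]]:
    """Average the per-attribute dissimilarities into one symmetric matrix."""
    n, m = len(rows), len(kinds)
    # Observed range of each numeric/ordinal column, used for normalisation.
    ranges = []
    for j, kind in enumerate(kinds):
        if kind in (NUMERIC, ORDINAL):
            column = [row[j] for row in rows]
            ranges.append((min(column), max(column)))
        else:
            ranges.append((0.0, 1.0))
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            total = sum(
                attribute_distance(kinds[j], rows[i][j], rows[k][j], *ranges[j])
                for j in range(m)
            )
            dist[i][k] = dist[k][i] = total / m
    return dist


# Toy records: (numeric, binary, nominal, ordinal rank)
rows = [(1.0, 1, "red", 1), (3.0, 0, "red", 3), (5.0, 1, "blue", 2)]
kinds = [NUMERIC, BINARY, NOMINAL, ORDINAL]
D = fused_distance_matrix(rows, kinds)
print(D[0][1], D[1][2])  # 0.625 0.75
```

Averaging with equal weights is only one possible choice; a weighted combination would fit the same structure.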

Machine learning is the design of algorithms that permit machines to develop behaviors based on empirical data. Most research in machine learning aims to make computers learn automatically by themselves. A machine learning algorithm is one that learns from the experience (E) obtained on a certain task (T), such that its performance measure (P) on task T keeps improving with experience E. Based on the outcome of the algorithm, machine learning can be classified into two types: supervised and unsupervised. In supervised learning, a function is generated that maps inputs to desired outputs; in unsupervised learning, a set of inputs is modeled, as in clustering. Machine learning is performed through various strategies and techniques, namely Inductive Logic Programming, Simulated Annealing, Neural Nets, and Evolutionary Strategies. The first three techniques are beyond the scope of this chapter; only evolutionary strategies are considered here.

A Genetic Algorithm is a heuristic search, based on natural evolution, that is widely used in search optimization to find an optimal solution. It is a subset of evolutionary algorithms in which the offspring of the next generation are produced from the fittest individuals of the current generation. A Genetic Algorithm comprises five phases: population initialization, fitness function, selection, crossover, and mutation. It is necessary to incorporate the Genetic Algorithm into clustering techniques because clustering is a key task in the process of acquiring knowledge. Cluster analysis is usually assessed by measuring the natural association of members in the clusters, i.e., the association among members within a group is high compared to that among members of different groups. Even a small set of elements (25) to be clustered into a small number of groups (5) gives rise to a very large number of possibilities (2,436,684,974,110,751). Incorporating a Genetic Algorithm into the clustering task helps minimize the within-cluster variance.
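
The count quoted for 25 elements and 5 groups can be checked directly. It is the Stirling number of the second kind S(25, 5), the number of ways to split 25 labelled items into 5 non-empty, unlabelled groups. The short sketch below evaluates the standard inclusion-exclusion formula; the function name `stirling2` is our own, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb, factorial


def stirling2(n: int, k: int) -> int:
    """Ways to partition n labelled items into k non-empty, unlabelled groups."""
    # Inclusion-exclusion: S(n, k) = (1/k!) * sum_j (-1)^(k-j) * C(k, j) * j^n
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)


print(stirling2(25, 5))  # 2436684974110751
```

Exhaustive enumeration is clearly infeasible at this scale, which is the motivation for a heuristic search.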

This chapter focuses on two aspects: 1) formulating the fused distance matrix for heterogeneous data types, and 2) a genetically modified K-Medoid clustering algorithm with a modified Davies-Bouldin index as the fitness function.
