A Study on Efficient Clustering Techniques Involved in Dealing With Diverse Attribute Data

Pragathi Penikalapati, A. Nagaraja Rao

Source Title: Pattern Recognition Applications in Engineering

DOI: 10.4018/978-1-7998-1839-7.ch006

Abstract

Compatibility issues between the characteristics of numerical and categorical attributes in mixed data pose many challenges in the field of pattern recognition. Clustering is often used to group similar elements and to discover structure in data. However, clustering categorical data poses some notable challenges, and clustering diverse (mixed) data is even harder because of the range of attribute types involved. Computations on such data are complex because numerical and categorical values differ in scale and range and must be converted before they can be compared. This chapter reviews the literature on clustering algorithms for unlabelled data with mixed attributes. It also covers the types of clustering and the state-of-the-art methodologies that separate data by satisfying inter- and intra-cluster similarity criteria. Finally, it identifies challenges, notable research gaps, and future research directions for state-of-the-art clustering algorithms.
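To make the scale problem concrete, the following sketch computes Gower's general dissimilarity coefficient, one widely used way of combining numerical and categorical attributes into a single distance. It is offered only as an illustration, not as a method the chapter itself proposes; the function names `attribute_ranges` and `gower_dissimilarity` and the sample records are hypothetical. Numeric attributes are divided by their observed range so that each contributes a value between 0 and 1, and categorical attributes contribute 0 on a match and 1 on a mismatch.

```python
from typing import List, Sequence, Union

Value = Union[float, int, str]


def attribute_ranges(data: Sequence[Sequence[Value]], is_numeric: Sequence[bool]) -> List[float]:
    """Range (max - min) of every numeric column; 0.0 for categorical columns (unused)."""
    ranges = []
    for j, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[j]) for row in data]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(0.0)
    return ranges


def gower_dissimilarity(
    a: Sequence[Value],
    b: Sequence[Value],
    is_numeric: Sequence[bool],
    ranges: Sequence[float],
) -> float:
    """Gower-style dissimilarity between two mixed-attribute records, in [0, 1]."""
    total = 0.0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            # Range normalization puts every numeric attribute on a 0..1 scale,
            # so income (tens of thousands) cannot swamp age (tens).
            total += abs(float(x) - float(y)) / r if r > 0 else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    # Average over attributes so that every attribute carries equal weight.
    return total / len(a)


if __name__ == "__main__":
    # Hypothetical records: (age, income, region, occupation).
    data = [
        (25, 50000, "north", "teacher"),
        (40, 82000, "south", "engineer"),
        (31, 61000, "south", "teacher"),
    ]
    is_numeric = [True, True, False, False]
    ranges = attribute_ranges(data, is_numeric)
    print(gower_dissimilarity(data[0], data[2], is_numeric, ranges))  # roughly 0.436
```

Equal weighting of attributes is itself a design choice; weighted variants of the coefficient let an analyst emphasize some attributes over others.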

Chapter Preview

Introduction

In the Web 3.0 era, information is growing rapidly, driven by big data and the widespread use of IoT devices. Every day, information is generated by sources ranging from social networking sites to wearable and IoT devices: millions of Google searches, thousands of YouTube video uploads, data from service-based applications (such as medical, transport, logistics, education, and shopping sites), tweets, and comments. Researchers are trying to extract patterns from this information that help in analyzing and understanding the data. However, such raw data cannot be analyzed directly by just any algorithm, because in real-world situations it is often not available with appropriate classifications or predefined labels. Consequently, there is a need for machine learning models that can precisely group new data based on similarities in their features.

This process can be accomplished by clustering, an unsupervised learning technique. Cluster analysis is a very important technique in machine learning (ML) as well as in data mining. Its objective is to partition a set of unlabeled objects into clusters in such a way that the data objects within a cluster are similar to one another and different from the data objects of other clusters. Cluster analysis has numerous applications, including customer categorization, market targeting, social network analysis, bioinformatics, and the analysis of scientific data (Han & Kamber, 2000). Segmenting a dataset into homogeneous groups is performed by a partitioning optimization model expressed as a cost function, so that observations inside a cluster are similar to each other while being dissimilar to the observations of other clusters.

Input: an unlabeled training set $D = \{x_i \mid i = 1, \dots, N\}$ of $N$ objects, each described by $d$ attributes, $x_i = (\text{Attribute}_1, \text{Attribute}_2, \dots, \text{Attribute}_d) \in \mathbb{R}^d$, and $K$, the total number of initial clusters.

Output: a set of $K$ clusters $C_1, C_2, \dots, C_K$.

The size, shape, and density of the resulting clusters depend largely on the number of clusters $K$ and on the clustering process adopted. A good clustering is characterized by compactness, meaning that observations within a cluster should be as close to one another as possible, and isolation, meaning that observations in different clusters should be as far apart as possible.
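The input/output statement above leaves the clustering procedure open. As one concrete and widely known instance for purely numerical attributes, the sketch below implements Lloyd's k-means algorithm: it takes a data set D and the number of clusters K and returns K clusters together with the within-cluster sum of squared distances, the cost function that k-means minimizes. k-means is used here only as an illustration and handles numerical attributes only; the random initialization (`seed`), the iteration limit (`iters`), and the helper names are assumptions of this sketch.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid_of(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))


def k_means(data: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means: partition `data` into k clusters C_1..C_k.

    Returns (clusters, centroids, cost), where cost is the within-cluster
    sum of squared distances that the algorithm tries to minimize.
    """
    rng = random.Random(seed)
    centroids = rng.sample(data, k)  # K initial centres drawn from D
    clusters: List[List[Point]] = []
    for _ in range(iters):
        # Assignment step: each observation joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: sq_dist(x, centroids[c]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [centroid_of(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    cost = sum(sq_dist(x, centroids[i]) for i, c in enumerate(clusters) for x in c)
    return clusters, centroids, cost


if __name__ == "__main__":
    # Two well-separated hypothetical groups of 2-D observations.
    D = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    clusters, centroids, cost = k_means(D, k=2)
    for i, c in enumerate(clusters, start=1):
        print(f"C{i}: {c}")
```

Because k-means relies on means and Euclidean distance, it cannot be applied directly to categorical attributes, which is one reason mixed data calls for the specialized methods this chapter surveys.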

As illustrated in Figure 1, clusters C1 and C3 differ in shape but are very compact, while C2 and C3 are comparatively less compact. Some observations lie far from their cluster's core. Such isolated observations may represent outliers or noise in the resulting clusters and need not negatively affect the interpretation of the result at the end of the process (Ben Salem, Naouali, & Chtourou, 2018).
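Observations lying far from a cluster's core can be screened with a simple distance rule. The sketch below flags a point as a candidate outlier when its Euclidean distance to the cluster centroid exceeds the mean distance plus `z` population standard deviations. This threshold rule, the default `z = 2.0`, and the function names are illustrative assumptions, not the procedure of Ben Salem, Naouali, and Chtourou (2018).

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points (the cluster core)."""
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))


def far_from_core(cluster: List[Point], z: float = 2.0) -> List[Point]:
    """Return observations whose distance to the centroid is unusually large.

    A point is flagged when its Euclidean distance to the centroid exceeds
    mean(distances) + z * population_std(distances) over the cluster.
    """
    core = centroid(cluster)
    dists = [math.dist(p, core) for p in cluster]
    threshold = mean(dists) + z * pstdev(dists)
    return [p for p, dist in zip(cluster, dists) if dist > threshold]


if __name__ == "__main__":
    # Hypothetical cluster: a dense 3x3 grid near the origin plus one distant point.
    grid = [(x / 2, y / 2) for x in range(3) for y in range(3)]
    print(far_from_core(grid + [(10.0, 10.0)]))  # flags only (10.0, 10.0)
```

A single distant point also pulls the centroid toward itself and inflates the standard deviation, so this rule is only a rough screen rather than a definitive outlier test.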

Figure 1. Segmenting observations into groups based on inter- and intra-cluster distance

In Figure 1, d1 represents the inter-cluster distance between observations belonging to different clusters; it should be maximized to obtain well-isolated clusters. Similarly, d2 represents the intra-cluster distance between observations belonging to the same cluster; it should be minimized to obtain compact clusters.
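The trade-off between d1 and d2 can be summarized in a single score. One standard choice, used here purely as an illustration, is the Dunn index: the smallest distance between observations in different clusters (the d1 side) divided by the largest distance between observations in the same cluster (the d2 side). Higher values indicate compact, well-isolated clusters. Euclidean distance is assumed, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, ...]


def max_intra_distance(clusters: List[List[Point]]) -> float:
    """d2 side: largest distance between two observations of the same cluster."""
    return max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )


def min_inter_distance(clusters: List[List[Point]]) -> float:
    """d1 side: smallest distance between observations of different clusters."""
    return min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )


def dunn_index(clusters: List[List[Point]]) -> float:
    """Dunn index = min inter-cluster distance / max intra-cluster diameter.

    Requires at least two non-empty clusters. Larger is better.
    """
    diameter = max_intra_distance(clusters)
    if diameter == 0.0:
        return math.inf  # every cluster collapses to a single location
    return min_inter_distance(clusters) / diameter


if __name__ == "__main__":
    good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
    bad = [[(0, 0), (0, 1), (10, 10)], [(1, 0), (10, 11), (11, 10)]]
    # The compact, well-separated split scores far higher than the mixed one.
    print(dunn_index(good), dunn_index(bad))
```

The index rewards both goals at once: shrinking d2 lowers the denominator, and growing d1 raises the numerator.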
