Clustering Techniques: A Review on Some Clustering Algorithms


Harendra Kumar
Copyright: © 2019 | Pages: 26
DOI: 10.4018/978-1-5225-5793-7.ch009

Abstract

Clustering is a process of grouping a set of data points in such a way that data points in the same group (called a cluster) are more similar to each other than to data points lying in other groups (clusters). Clustering is a main task of exploratory data mining, and it has been widely used in many areas such as pattern recognition, image analysis, machine learning, bioinformatics, information retrieval, and so on. Clusters are always identified by similarity measures; these measures include intensity, distance, and connectivity, and different similarity measures may be chosen depending on the application and the data. The purpose of this chapter is to provide an overview of many (certainly not all) clustering algorithms. The chapter covers valuable surveys, the types of clusters, and the methods used for constructing clusters.
Chapter Preview

Introduction

Clustering is the task of dividing a set of data points (a population) into a number of groups (clusters) such that data points in the same group are more similar to one another than to data points in other groups. Put differently, clustering is a process of organizing data points into groups whose members are similar in some way. A cluster is therefore a collection of objects that are "similar" to each other and "dissimilar" to the objects belonging to other clusters. Clustering can be considered the most important unsupervised learning problem, as it finds a structure in a collection of unlabeled data. Such grouping is pervasive in the way humans process information, and one motivation for using clustering algorithms is to provide automated tools that help in constructing categories or taxonomies. The formed clusters are used as the basis for further data analysis (or data processing techniques); they may correspond to dense areas of the data space, groups with small distances among their members, intervals, or particular statistical distributions. Data clustering can therefore be formulated as a multi-objective optimization problem, and the selection of an appropriate clustering algorithm and parameter settings depends on the type of data set and the intended use of the results.
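
As a concrete illustration of grouping by similarity, the sketch below assigns hypothetical two-dimensional points to whichever of two illustrative representatives they are closest to, using Euclidean distance as one possible similarity measure. The point coordinates, representatives, and function names are assumptions made for illustration, not taken from the chapter.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_to_nearest(points, representatives):
    """Group each point with the representative it is most similar to,
    where 'more similar' means a smaller Euclidean distance."""
    clusters = {i: [] for i in range(len(representatives))}
    for p in points:
        best = min(range(len(representatives)),
                   key=lambda i: euclidean(p, representatives[i]))
        clusters[best].append(p)
    return clusters

# Illustrative data: two tight groups of points and one representative per group.
points = [(1.0, 1.2), (0.8, 1.1), (5.0, 5.1), (5.2, 4.9)]
reps = [(1.0, 1.0), (5.0, 5.0)]
print(assign_to_nearest(points, reps))
# {0: [(1.0, 1.2), (0.8, 1.1)], 1: [(5.0, 5.1), (5.2, 4.9)]}
```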

The clustering problem is NP-hard for almost all clustering objective functions. Distance-based and density-based methods are two of the many categories of clustering algorithms. Distance-based clusters are formed by adding points so as to minimize intra-cluster distances and maximize inter-cluster distances; the radius and diameter of a cluster can be used as intra-cluster characteristics. Density-based clustering helps search for clusters of unknown shape. In density-based clustering, a cluster is a connected dense component that can grow in any direction in which the density is sufficient. A density-based algorithm looks for data points that have at least a given number of neighbouring points within a given distance and forms clusters of data points that can be related through their common neighbours.
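
To make the density-based idea concrete, the following is a minimal sketch in the spirit of DBSCAN-style algorithms, not the specific method of any work cited in this chapter. Points with at least `min_pts` neighbours within distance `eps` seed a cluster, and the cluster grows through the neighbourhoods of further dense points; the parameter values, data, and function names are illustrative assumptions.

```python
import math

def neighbours(points, idx, eps):
    """Indices of points within distance eps of points[idx] (including itself)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def density_clusters(points, eps=1.0, min_pts=3):
    """Simplified density-based clustering: points with at least min_pts
    neighbours within eps seed a cluster, which then grows through the
    neighbourhoods of further dense points; remaining points stay noise (-1)."""
    labels = [-1] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        seed_nbrs = neighbours(points, i, eps)
        if len(seed_nbrs) < min_pts:
            continue                        # not dense enough to start a cluster
        labels[i] = cluster_id
        frontier = list(seed_nbrs)          # expand the cluster breadth-first
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster_id
                j_nbrs = neighbours(points, j, eps)
                if len(j_nbrs) >= min_pts:  # j is dense, so its neighbours join too
                    frontier.extend(j_nbrs)
        cluster_id += 1
    return labels

# Two dense groups plus an isolated point that remains labelled -1 (noise).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9), (4, 15)]
print(density_clusters(pts, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```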

Clustering algorithms work very differently, so it is difficult to conclude which algorithm is best without examining the clusters it forms. Besides choosing the right clustering algorithm, choosing the right features also plays a critical role in clustering. Moreover, there are no universally accepted and effective criteria for selecting clustering schemes and valid features. Validation criteria can provide some insight into the quality of clustering solutions, but even how to choose an appropriate criterion is still a problem requiring further research. Han and Kamber (2001) give a very good introduction to contemporary data mining clustering techniques in their textbook. Genther et al. (1994) presented a modified fuzzy clustering algorithm for parametric defuzzification in fuzzy rule-based systems. Some recent defuzzification methods have been discussed by Kumar (2017). A general discussion of hierarchical clustering is available in most clustering books. Zahn (1971) discussed a divisive hierarchical clustering technique that uses the minimum spanning tree of a graph. Leung et al. (2000) derived an interesting hierarchical clustering technique based on scale-space theory using a blurring process, in which each datum is regarded as a light point in an image and a cluster is represented as a blob. Morzy et al. (1999) introduced a hierarchical algorithm that uses sequential patterns as the basic elements found in the database to efficiently generate data clusters, and defined a co-occurrence measure as the criterion for merging smaller clusters. Selim and Ismail (1984) gave a rigorous proof of the finite convergence of K-means-type algorithms.
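
Since K-means-type algorithms are mentioned above, the following is a minimal Lloyd-style K-means sketch, an illustrative implementation rather than the formulation analysed by Selim and Ismail (1984). It alternates between assigning each point to the nearest centre and moving each centre to the mean of its cluster until the assignments stop changing; the data, parameter values, and function names are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd-style K-means: alternate between assigning each point to
    its nearest centre and recomputing every centre as its cluster's mean."""
    random.seed(seed)
    centres = random.sample(points, k)      # pick k initial centres from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                    # assignment step
            i = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[i].append(p)
        new_centres = [                     # update step: mean of each cluster
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:          # assignments (and means) have stabilised
            break
        centres = new_centres
    return centres, clusters

pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (6.0, 6.0), (5.8, 6.2), (6.1, 5.9)]
centres, clusters = kmeans(pts, k=2)
print(centres, clusters)  # the learned centres and the corresponding point groups
```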

Purposes of Clustering

The quality of a clustering result depends on the method's implementation and on the similarity measure it uses. The quality of a good clustering technique is also measured by its ability to find some or all of the hidden patterns. Some purposes of a good clustering are:

  1. To analyze the structure of the data;
  2. To assist in designing classifications;
  3. To relate different aspects of the data to each other;
  4. To shape and keep knowledge.
