Clustering Algorithm for Arbitrary Data Sets

Yu-Chen Song; Hai-Dong Meng

doi:10.4018/978-1-59904-849-9.ch046

Special Offers
- IGI Global’s New Emerging Topic e-Book Collections
  Acquire highly focused and affordable Cutting-Edge Peer-Reviewed Research Content through a selection of 17 topic-focused e-Book Collections discounted up to 90%, compared to list prices. Collection topics include Artificial Intelligence, Data Science, Language Learning, Marketing and Customer Relations, Sustainability, and many more. Hosted on the InfoSci^® platform, these collections feature no DRM, no additional cost for multi-user licensing, no embargo of content, full-text PDF & HTML format, and more.
  Learn More
- Open Access Book (Free Access) - Encyclopedia of Information Science and Technology, Sixth Edition (ISBN: 9781668473665)
  The Encyclopedia of Information Science and Technology, Sixth Edition) continues the legacy set forth by the first five editions by providing comprehensive coverage and up-to-date definitions of the most important issues, concepts, and trends pertaining to technological advancements and information management within a variety of settings and industries. The entire book is being published under open access.
  Read Now
- Open Access Book (Free Access) - Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries (ISBN: 9781668456293)
  Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries provides information on the recent technology, mitigation, and environmental protection that must be applied for food sustainability in developing countries. This book is being published under Platinum Open Access through funding from Diponegoro University, Indonesia.
  Read Now
- Open Access Book (Free Access) - New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY (ISBN: 9781668438091)
  The Walmart Corporation and the Lumina Foundation have provided funding to make New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY fully open access, completely removing any paywall between scholars in education and the latest research on new models for the future of higher education.
  Read Now
- Open Access Book (Free Access) - Handbook of Research on the Global View of Open Access and Scholarly Communications (ISBN: 9781799898054)
  Through a collaboration between IGI Global and the University of North Texas, the Handbook of Research on the Global View of Open Access and Scholarly Communications has been published as fully open access, completely removing any paywall between researchers of any field, and the latest research on the equitable and inclusive nature of Open Access and all of its complications.
  Read Now
Books
- - Books by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Books by Field
Journals
- - Journals
  - OnDemand Journal Articles
  - Journals by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Journals by Field
e-Collections
OnDemand
Open Access
- View All Open Access Opportunities
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Find an Open Access Journal for Your Next Manuscript
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Submit an Open Access Book Proposal
  Learn more about open access book publishing and how it can propel your research forward in the field.
  Convert Your Work to Open Access
  Already published? You can convert your work to open access to increase its impact through IGI Global’s Restrospective Open Access Program.
  Utilize Open Access Collection Database
  Open up your research potential by utilizing our open access content or integrating the open access collection into your library
  Consider Open Access Agreements
  For Libraries: consider no-cost or investment-level open access agreements with IGI Global to support your faculty's research endeavors.
  Search Funding Resources
  Looking for additional funding resources to support your open accesss endeavors? View industry resources compiled by our open access team.
  Review Open Access Policies & Ethical Guidelines
  Considering IGI Global to publish your work under open access? Review IGI Global’s open access policies and ethical guidelines
Publish with Us
Resources
- - Instructors
  - Course Adoption
  - Teaching Cases
  - K-12 Online Learning Collection
  - Authors and Editors
  - eEditorial Discovery^® System
  - Peer Review Process
  - Ethics and Malpractice
  - COPE Membership
  - Fair Use Policy
  - Open Access Publishing
  - FAQ
Catalogs
About Us

Clustering Algorithm for Arbitrary Data Sets

Yu-Chen Song, Hai-Dong Meng

Source Title: Encyclopedia of Artificial Intelligence

DOI: 10.4018/978-1-59904-849-9.ch046

OnDemand:

(Individual Chapters)

Available

$37.50

Current Special Offers

No Current Special Offers

Abstract

Clustering analysis is an intrinsic component of numerous applications, including pattern recognition, life sciences, image processing, web data analysis, earth sciences, and climate research. As an example, consider the biology domain. In any living cell that undergoes a biological process, different subsets of its genes are expressed in different stages of the process. To facilitate a deeper understanding of these processes, a clustering algorithm was developed (Ben- Dor, Shamir, & Yakhini, 1999) that enabled detailed analysis of gene expression data. Recent advances in proteomics technologies, such as two-hybrid, phage display and mass spectrometry, have enabled the creation of detailed maps of biomolecular interaction networks. To further understanding in this area, a clustering mechanism that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes was constructed (Bader & Hogue, 2003). In the interpretation of remote sensing images, clustering algorithms (Sander, Ester, Kriegel, & Xu, 1998) have been employed to recognize and understand the content of such images. In the management of web directories, document annotation is an important task. Given a predefined taxonomy, the objective is to identify a category related to the content of an unclassified document. Self-Organizing Maps have been harnessed to influence the learning process with knowledge encoded within a taxonomy (Adami, Avesani, & Sona, 2005). Earth scientists are interested in discovering areas of the ocean that have a demonstrable effect on climatic events on land, and the SNN clustering technique (Ertöz, Steinbach, & Kumar, 2002) is one example of a technique that has been adopted in this domain. Also, scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth’s oceans and atmosphere. Clustering techniques have proved crucial in the production of climate indices (Steinbach, Tan, Kumar, Klooster, & Potter, 2003). In many application domains, clusters of data are of arbitrary shape, size and density, and the number of clusters is unknown. In such scenarios, traditional clustering algorithms, including partitioning methods, hierarchical methods, density-based methods and gridbased methods, cannot identify clusters efficiently or accurately. Obviously, this is a critical limitation. In the following sections, a number of clustering methods are presented and discussed, after which the design of an algorithm based on Density and Density-reachable (CADD) is presented. CADD seeks to remedy some of the deficiencies of classical clustering approaches by robustly clustering data that is of arbitrary shape, size, and density in an effective and efficient manner.

Chapter Preview

Top

Background

Clustering aims to identify groups of objects (clusters) that satisfy some specific criteria, or share some common attribute. Clustering is a rich and diverse domain, and many concepts have been developed as the understanding of clustering develops and matures (Tan, Steinbach, & Kumar, 2006). As an example, consider spatial distribution. A typology of clusters based on this includes: Well-separated clusters, Center-based clusters, Contiguity-based clusters, and Density-based clusters. Given the diversity of domains in which clustering can be applied, and the diverse characteristic and requirements of each, it is not surprising that numerous clustering algorithms have been developed. The interested reader is referred to the academic literature (Qiu, Zhang, & Shen, 2005), (Ertöz, Steinbach, & Kumar, 2003), (Zhao, Song, Xie, & Song, 2003), (Ayad & Kamel, 2003), (Karypis, Han, & Kumar, 1999) for further information.

Though the range of clustering algorithms that have been developed is broad, it is posisble to classify them according the broad approach or method adopted by each:

Key Terms in this Chapter

Contiguity-Based Clusters: Each object in a contiguity-based cluster is closer to some other object in the cluster than to any point in a different cluster.

Centre-Based Clusters: Each object in a centre-based cluster is closer to the centre of the cluster than to the centres of any other clusters.

Density-Based Clusters: Each object in a density-based cluster is closer to some other object within its Eps neighbourhood than to any object not in the cluster, resulting in dense regions of objects being surrounded by regions of lower density.

CADD: A clustering algorithm based on the concepts of Density and Density-reachable.

Cluster Analysis: Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships.

Well-Separated Cluster: A cluster is a set of objects in which each object is significantly closer (or more similar) to every other object in the cluster than to any object not in the cluster.

Eps: Maximum radius of the neighbourhood.

Complete Chapter List

Search this Book:

Reset

MLA

APA

Chicago

Export Reference

Clustering Algorithm for Arbitrary Data Sets

Abstract

Background

Key Terms in this Chapter

Complete Chapter List