Data Clustering Algorithms Using Rough Sets

B.K. Tripathy, Adhir Ghosh

Source Title: Handbook of Research on Computational Intelligence for Engineering, Science, and Business

DOI: 10.4018/978-1-4666-2518-1.ch012

Abstract

The development of data clustering algorithms has been pursued by researchers since the introduction of the k-means algorithm (MacQueen 1967; Lloyd 1982). These algorithms were subsequently modified to handle categorical data. To handle situations where objects can have memberships in multiple clusters, fuzzy clustering and rough clustering methods were introduced (Lingras et al 2003, 2004a). There are many extensions of these initial algorithms (Lingras et al 2004b; Lingras 2007; Mitra 2004; Peters 2006, 2007). The MMR algorithm (Parmar et al 2007), its extensions (Tripathy et al 2009, 2011a, 2011b) and the MADE algorithm (Herawan et al 2010) use rough set techniques for clustering. In this chapter, the authors focus on rough set based clustering algorithms and provide a comparative study of the fuzzy set based and rough set based clustering algorithms in terms of their efficiency. They also present problems for future study related to the topics covered.

Introduction

A cluster is a collection of data objects that are similar to one another. The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. Clustering has largely been used as a data analysis tool to characterize data sets. It has been used in data mining tasks such as unsupervised classification and data summation, as well as segmentation of large heterogeneous data sets into smaller homogeneous subsets that can be easily managed, separately modeled and analyzed (Huang 1998). Cluster analysis has been widely used in numerous applications, including market research, pattern recognition, image processing, research and development, nuclear science, medicine and business. In business, for example, clustering can help marketers discover distinct groups in their customer bases and characterize customer groups based on their purchasing patterns.
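As an illustration of grouping similar objects, the following is a minimal sketch of a k-means style procedure of the kind named in the abstract. It is not taken from the chapter: the function name `kmeans`, the Euclidean distance, the sample points and the starting centroids are all assumptions made for this example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Illustrative k-means loop on tuples of floats (not the chapter's code).

    Returns (labels, centroids, iterations_used).
    """
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop as soon as no point changes cluster.
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return labels, centroids, max_iter


# Hypothetical data: two visibly separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
labels, centroids, iterations = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids, iterations)
```

Each object ends up in exactly one cluster here, which is the conventional (crisp) setting that fuzzy and rough clustering relax.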

There are several applications of cluster analysis; we name only a few of them here. Jiang et al (2004) analyze a variety of clustering techniques for complex gene expression data. Wu et al (2004) developed a clustering algorithm specifically designed to handle the complexities of gene data, which can estimate the correct number of clusters and find them. Wong et al (2002) present an approach for segmenting tissues in a nuclear medical imaging method known as positron emission tomography (PET). Mathieu and Gibson (2004) use cluster analysis as part of a decision support tool for large-scale research and development planning, to identify programs to participate in and to determine resource allocation. Haimov et al (1989) use cluster analysis to segment radar signals when scanning land and marine objects. Saglam et al expressed the clustering problem as a mixed-integer programming problem with the objective of minimizing the maximum cluster diameter among all clusters. This formulation was applied to the customer segmentation problem of a digital platform company, using demographic and transactional attributes of the customers. Fathian et al proposed a hybridization of a nature-inspired intelligent technique with the k-means algorithm. Chen and Liu (2009) proposed an effective clustering algorithm, which was used to resolve a classification problem in construction management.

Cluster analysis is a challenging field of research because it is applied in several diverse fields. Clustering is sometimes called data segmentation, as it divides a data set into several groups depending upon the similarity of individual elements. In data mining, a clustering method needs to possess certain characteristics: scalability, handling of hybrid data, generation of clusters of arbitrary shape, handling of missing values, parameter identification, handling of dynamic updates to data values, and the ability to deal with a large number of attributes.

In conventional clustering, data with similar characteristics are grouped together to form a single cluster. However, in practice it has been observed that this requirement is very stringent, and objects may show characteristics that place them in several clusters. In such cases an object may belong to more than one cluster, leading to overlapping clusters instead of distinct ones. To handle such situations, multiple memberships of objects became a necessity. This led to the development of clustering algorithms using fuzzy techniques. In later developments, rough set techniques were used in developing such algorithms, which also handle uncertainty in data in an efficient manner. Rough set based clustering provides a solution that is less restrictive than conventional clustering and less descriptive than fuzzy clustering (Lingras et al 2003).
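The idea of overlapping clusters can be sketched by giving each cluster a lower approximation (objects that certainly belong to it) and an upper approximation (objects that possibly belong to it). The code below is only an illustration of that idea, not the algorithm of Lingras et al or of the chapter: the function name `rough_assign`, the distance-ratio rule and the `threshold` value of 1.3 are assumptions chosen for this example.

```python
import math


def rough_assign(points, centroids, threshold=1.3):
    """Illustrative rough assignment (assumed rule, not the chapter's method).

    An object whose distance to another centroid is within `threshold` times
    its distance to the nearest centroid is treated as ambiguous: it is placed
    in the upper approximation of every such cluster and in no lower
    approximation. Otherwise it goes to the lower (and upper) approximation
    of its nearest cluster.
    """
    k = len(centroids)
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for i, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        # Other clusters that are almost as close as the nearest one.
        close = [
            j for j in range(k)
            if j != nearest and dists[j] <= threshold * dists[nearest]
        ]
        upper[nearest].add(i)
        if close:
            for j in close:
                upper[j].add(i)
        else:
            lower[nearest].add(i)
    return lower, upper


# Hypothetical data: the third point lies roughly midway between the centroids.
points = [(1.0, 1.0), (2.0, 1.0), (5.2, 1.0), (8.0, 1.0), (9.0, 1.0)]
centroids = [(1.5, 1.0), (8.5, 1.0)]
lower, upper = rough_assign(points, centroids)
print(lower, upper)
```

In this sketch the ambiguous object (index 2) appears in both upper approximations but in neither lower approximation, so the result records that it may belong to either cluster without assigning it a degree of membership as a fuzzy method would.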

In this chapter, we briefly discuss the chronological development of different clustering algorithms, from conventional algorithms to fuzzy based ones. The primary focus of the chapter is to discuss the rough set based algorithms in detail, provide a comparative analysis of these algorithms, and compare their efficiency with that of fuzzy based algorithms. We also provide some directions for further research.

Key Terms in this Chapter

Convergence of a Process: A process that terminates after a finite number of steps is said to be convergent, and this phenomenon is called the convergence of the process.

Purity Ratio: A ratio that measures how many elements are classified correctly (assigned to their original clusters) out of the total number of objects.

Equivalence Relation: A relation that is reflexive, symmetric and transitive. Such a relation decomposes the set on which it is defined into disjoint subsets called equivalence classes.

Hybrid Techniques: A composition of two or more primary techniques.

Artificial Objects: Objects which do not have physical presence in a database.

Missing Values: Attribute values that are absent from a database for some reason, such as not being entered, not being available, or having been deleted.

Dissimilarity Measure: A measure that computes the differences between the values of tuples in a database.

Stability of Clustering Solutions: The property that the solution clusters obtained from the process of clustering do not change when the process is repeated.

Clustering: The process of dividing a dataset into groups of similar data items.
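Three of the terms above lend themselves to a short worked sketch: equivalence classes, a dissimilarity measure and the purity ratio. The code is a minimal illustration under stated assumptions, not the chapter's definitions in executable form. The function names and sample data are hypothetical; the equivalence relation is assumed to be "agrees on the chosen attributes"; the dissimilarity is assumed to be a simple count of mismatched attributes; and "classified correctly" is assumed to mean belonging to the majority original class of the cluster an object was placed in.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, attributes):
    """Partition row indices; rows are related when they agree on `attributes`."""
    classes = defaultdict(list)
    for index, row in enumerate(rows):
        classes[tuple(row[a] for a in attributes)].append(index)
    return list(classes.values())


def simple_matching_dissimilarity(a, b):
    """Count the attribute positions on which two categorical tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def purity_ratio(found_labels, true_labels):
    """Fraction of objects in the majority original class of their found cluster."""
    by_cluster = defaultdict(list)
    for found, true in zip(found_labels, true_labels):
        by_cluster[found].append(true)
    correct = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return correct / len(true_labels)


# Hypothetical categorical data.
rows = [
    {"colour": "red", "size": "small"},
    {"colour": "red", "size": "large"},
    {"colour": "blue", "size": "small"},
    {"colour": "red", "size": "small"},
]
by_colour = equivalence_classes(rows, ["colour"])
by_both = equivalence_classes(rows, ["colour", "size"])
distance = simple_matching_dissimilarity(("red", "small"), ("blue", "small"))
purity = purity_ratio([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
print(by_colour, by_both, distance, purity)
```

Using more attributes in the relation gives a finer partition: the classes obtained from both attributes are subsets of the classes obtained from colour alone.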
