Cluster Analysis Using Rough Clustering and K-Means Clustering

Kevin E. Voges (University of Canterbury, New Zealand)
DOI: 10.4018/978-1-4666-5888-2.ch160

Introduction

Cluster analysis is a fundamental data reduction technique used in both the physical and social sciences. The extension of rough sets theory into cluster analysis, through the techniques of rough clustering, provides an important and potentially useful addition to the range of cluster analysis techniques available to managers and researchers.

Cluster analysis is defined as the grouping of “individuals or objects into clusters so that objects in the same cluster are more similar to one another than they are to objects in other clusters” (Hair, Black, Babin, & Anderson, 2009). There are a number of comprehensive introductions to cluster analysis (Abonyi & Feil, 2007; Arabie, Hubert & De Soete, 1996; Cramer, 2003; Everitt, Landau, Leese, & Stahl, 2011; Gan, Ma, & Wu, 2007). Techniques are often classified as hierarchical or nonhierarchical (Hair et al., 2009), and the most commonly used nonhierarchical technique is the k-means approach developed by MacQueen (1967). Over the past few decades, techniques based on developments in computational intelligence have been used as clustering algorithms. For example, the theory of fuzzy sets developed by Zadeh (1965), who introduced the concept of partial set membership, has been applied to clustering (Abonyi & Feil, 2007; Dumitrescu, Lazzerini, & Jain, 2000).

Fuzzy clustering has developed an extensive literature, too broad to be reviewed thoroughly here. However, two extensions are briefly considered to demonstrate the flexibility of the technique. Atanassov (1986) extended Zadeh's fuzzy set to a more general form called an intuitionistic fuzzy set (IFS), which has been found to be more useful than a standard fuzzy set in dealing with uncertainty. Xu, Chen, and Wu (2008) report an application of the IFS concept to clustering. In a second extension, Dunn (1973) and Bezdek (1981) proposed the Fuzzy C-Means (FCM) technique, which is one of the most commonly used objective function-based clustering techniques. Instead of assigning each object to a single cluster, FCM relaxes class membership by giving each object a membership grade in the unit interval [0, 1] for every cluster. As will be seen below, this has similarities to clustering using rough sets. Izakian and Pedrycz (2014) developed an extension to FCM in which the distance function is given adjustable weight parameters that quantify the influence of blocks of features rather than of individual features. Their work also illustrates the increasing use of hybridization techniques (explored later in this article): they use particle swarm optimization to optimize the weights. Genetic algorithms have also been applied to clustering tasks (Maulik, Bandyopadhyay, & Mukhopadhyay, 2011).
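To make the idea of graded membership concrete, the following Python sketch computes FCM membership grades for a single object. It assumes Euclidean distance and the commonly used fuzzifier value m = 2. The function name `fcm_memberships` is hypothetical, and the sketch covers only the membership step; a complete FCM run alternates this step with a membership-weighted update of the cluster centroids.

```python
import math
from typing import List, Sequence


def fcm_memberships(
    point: Sequence[float],
    centroids: List[Sequence[float]],
    m: float = 2.0,
) -> List[float]:
    """Membership grade of `point` in each cluster (grades lie in [0, 1] and sum to 1)."""
    dists = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on a centroid gets full membership in that cluster
    # (the first such cluster, if several centroids coincide).
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(dists))]
    exponent = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier; larger m gives softer grades
    # Standard FCM update: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))  # [0.8, 0.2]
```

For a point at (1, 0) with centroids at (0, 0) and (3, 0), the grades are 0.8 and 0.2: the object belongs mostly, but not exclusively, to the nearer cluster, whereas k-means would assign it wholly to that cluster.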

Another technique receiving considerable attention is the theory of rough sets (Pawlak, 1982), which has led to clustering algorithms referred to as rough clustering (do Prado, Engel, & Filho, 2002; Kumar, Krishna, Bapi, & De, 2007; Lingras & Peters, 2011; Parmar, Wu, & Blackhurst, 2007; Voges, Pope, & Brown, 2002).

This article provides brief introductions to k-means cluster analysis, rough sets theory, and rough clustering, and then compares k-means clustering with rough clustering. The article shows that rough clustering provides a more flexible solution to the clustering problem and can be conceptualized as extracting concepts from the data rather than as extracting strictly delineated subgroupings (Pawlak, 1991). Traditional clustering methods generate extensional descriptions of groups (i.e., which objects are members of each cluster), whereas clustering techniques based on rough sets theory generate intensional descriptions (i.e., what the main characteristics of each cluster are) (do Prado et al., 2002). These different goals suggest that both k-means clustering and rough clustering have a place in the toolbox of the data analyst and the information manager.
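The contrast between crisp and rough assignment can be sketched in a few lines. The Python example below is loosely modeled on the assignment step of the rough k-means approach associated with Lingras and colleagues; it is not presented as the procedure used in this article. The function name `rough_assign` and the absolute distance threshold `epsilon` are illustrative assumptions. An object whose distance to a second centroid is within `epsilon` of its distance to the nearest centroid is placed in the upper approximation of each such cluster and in no lower approximation. Every other object falls in the lower (and therefore also the upper) approximation of its nearest cluster.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def rough_assign(
    points: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    epsilon: float,
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Return (lower, upper): cluster index -> set of point indices.

    `epsilon` (hypothetical, >= 0) is the tolerance for treating a second
    centroid as "almost as close" as the nearest one.
    """
    k = len(centroids)
    lower: Dict[int, Set[int]] = {j: set() for j in range(k)}
    upper: Dict[int, Set[int]] = {j: set() for j in range(k)}
    for i, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=lambda j: dists[j])
        # Clusters whose centroid is almost as close as the nearest one
        # (the list always includes `nearest` itself).
        close = [j for j in range(k) if dists[j] - dists[nearest] <= epsilon]
        for j in close:
            upper[j].add(i)  # possibly a member of cluster j
        if len(close) == 1:
            lower[nearest].add(i)  # unambiguously a member of its nearest cluster
    return lower, upper


lower, upper = rough_assign([(0, 0), (1.5, 0), (3, 0)], [(0, 0), (3, 0)], epsilon=0.5)
print(lower)  # {0: {0}, 1: {2}}
print(upper)  # {0: {0, 1}, 1: {1, 2}}
```

With `epsilon = 0.5`, the midpoint (1.5, 0) between the centroids lands in both upper approximations, while the two outer points each fall in exactly one lower approximation. A crisp k-means assignment would instead have to force the midpoint into a single cluster.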

Key Terms in this Chapter

Market Segmentation: Market segmentation is a central concept in marketing theory and practice, and involves identifying homogeneous sub-groups of buyers within a heterogeneous market. It is most commonly conducted using cluster analysis of the measured demographic or psychographic characteristics of consumers. Forming groups that are homogeneous with respect to these measured characteristics segments the market.

Cluster Analysis: A data analysis technique involving the grouping of objects into sub-groups or clusters so that objects in the same cluster are more similar to one another than they are to objects in other clusters.

K-Means Clustering: A cluster analysis technique in which k data points are randomly selected as initial seeds or centroids, and the remaining data points are assigned to the closest cluster on the basis of the distance between each data point and the cluster centroid. The centroids are then recomputed from their assigned members and the points are reassigned, until cluster memberships stabilize.
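The procedure in this definition can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. It uses batch updates, recomputing every centroid after each full assignment pass, rather than MacQueen's original incremental update; the function name `k_means` and its `max_iter` and `seed` parameters are illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_means(
    points: Sequence[Tuple[float, ...]],
    k: int,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[Tuple[float, ...]]]:
    """Return (labels, centroids) for a batch k-means run on `points`."""
    rng = random.Random(seed)
    # Randomly select k data points as the initial seeds (centroids).
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # memberships unchanged, so the solution has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(data, k=2)
print(labels)  # first three points share one label, last three the other
```

Because the initial seeds are random, different seeds can lead to different local solutions on less well-separated data, which is one reason k-means is often run several times from different starting points.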

Rough Set: The concept of rough, or approximation, sets was introduced by Pawlak and is based on the single assumption that information is associated with every object in an information system. This information is expressed through attributes that describe the objects, and objects that cannot be distinguished on the basis of a selected set of attributes are referred to as indiscernible. A rough set is defined by two sets: the lower approximation, containing the objects that certainly belong to the set, and the upper approximation, containing the objects that possibly belong to it.
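The lower and upper approximations can be computed directly from the indiscernibility classes. In the Python sketch below, the objects, the attribute names `age` and `income`, and their values are hypothetical, and the function name `approximations` is also illustrative. Objects with identical values on the chosen attributes form one equivalence class; a class contributes to the lower approximation only if it lies wholly inside the target set, and to the upper approximation if it overlaps the target set at all.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def approximations(
    objects: Dict[str, Dict[str, Hashable]],
    attributes: Iterable[str],
    target: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return the (lower, upper) approximations of `target`."""
    attrs = list(attributes)
    # Group objects into indiscernibility (equivalence) classes:
    # objects with identical values on every selected attribute share a class.
    classes: Dict[tuple, Set[str]] = defaultdict(set)
    for name, row in objects.items():
        classes[tuple(row[a] for a in attrs)].add(name)
    lower: Set[str] = set()
    upper: Set[str] = set()
    for eq_class in classes.values():
        if eq_class <= target:  # class lies wholly inside the target set
            lower |= eq_class
        if eq_class & target:  # class overlaps the target set
            upper |= eq_class
    return lower, upper


# Hypothetical information system: five consumers described by two attributes.
consumers = {
    "o1": {"age": "young", "income": "low"},
    "o2": {"age": "young", "income": "low"},
    "o3": {"age": "old", "income": "high"},
    "o4": {"age": "old", "income": "low"},
    "o5": {"age": "old", "income": "high"},
}
lower, upper = approximations(consumers, ["age", "income"], {"o1", "o4"})
print(sorted(lower), sorted(upper))  # ['o4'] ['o1', 'o2', 'o4']
```

Here the target set {o1, o4} is rough: o1 cannot be distinguished from o2, so the lower approximation is {o4} and the upper approximation is {o1, o2, o4}.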
