Introduction
Cluster analysis is a fundamental data reduction technique used in both the physical and social sciences. Rough clustering, which extends rough sets theory into cluster analysis, is a potentially useful addition to the range of cluster analysis techniques available to managers and researchers.
Cluster analysis is defined as the grouping of “individuals or objects into clusters so that objects in the same cluster are more similar to one another than they are to objects in other clusters” (Hair, Black, Babin, & Anderson, 2009). There are a number of comprehensive introductions to cluster analysis (Abonyi & Feil, 2007; Arabie, Hubert, & De Soete, 1996; Cramer, 2003; Everitt, Landau, Leese, & Stahl, 2011; Gan, Ma, & Wu, 2007). Techniques are often classified as hierarchical or nonhierarchical (Hair et al., 2009), and the most commonly used nonhierarchical technique is the k-means approach developed by MacQueen (1967). Over the past few decades, techniques drawn from computational intelligence have also been applied to clustering. For example, the theory of fuzzy sets developed by Zadeh (1965), who introduced the concept of partial set membership, has been applied to clustering (Abonyi & Feil, 2007; Dumitrescu, Lazzerini, & Jain, 2000).
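To make the later contrast with fuzzy and rough clustering concrete, the following is a minimal sketch of the batch (Lloyd-style) k-means iteration, which may differ in detail from MacQueen's (1967) original incremental update. It assumes Euclidean distance and caller-supplied initial centroids; the function names and the sample data are illustrative only. The point to note is the assignment step: every object belongs to exactly one cluster.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(data: Sequence[Point], centroids: Sequence[Point], max_iter: int = 100):
    """Batch k-means: alternate hard assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each object joins exactly one cluster, the nearest one.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
            for x in data
        ]
        # Update step: move each centroid to the mean of its current members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [x for x, label in zip(data, labels) if label == j]
            new_centroids.append(mean(members) if members else old)  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return labels, centroids


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(data, [data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding with two of the data points keeps the example deterministic; in practice the initial centroids are usually chosen at random or by a seeding heuristic, and the result can depend on that choice.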
Fuzzy clustering has an extensive literature, too broad to review thoroughly here. However, two extensions are briefly considered to demonstrate the flexibility of the technique. Atanassov (1986) extended Zadeh’s fuzzy set to a more general form called an intuitionistic fuzzy set (IFS), which assigns each element a degree of non-membership as well as a degree of membership and has been found to be more useful than a standard fuzzy set in dealing with uncertainty. Xu, Chen, and Wu (2008) report an application of this IFS concept to clustering. In a second extension, Dunn (1973) and Bezdek (1981) proposed the fuzzy c-means (FCM) technique, one of the most commonly used objective-function-based clustering techniques. Instead of assigning each object to a single cluster, FCM relaxes class membership by giving each object a membership grade in every cluster, with grades taking values in the unit interval [0, 1]. As will be seen below, this has similarities to clustering using rough sets. Izakian and Pedrycz (2014) developed an extension to FCM in which the distance function is given adjustable weight parameters, quantifying the impact of blocks of features rather than of individual features. Their work also illustrates the increasing use of hybridization techniques (explored later in this article): particle swarm optimization is used to optimize the weights. Genetic algorithms have also been applied to clustering tasks (Maulik, Bandyopadhyay, & Mukhopadhyay, 2011).
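The relaxation of membership in FCM can be sketched as follows. This is a minimal illustration assuming Euclidean distance, the commonly used fuzzifier m = 2, a fixed number of iterations, and caller-supplied initial centroids; the names `memberships` and `fcm` are hypothetical. Each object receives a grade in [0, 1] for every cluster, the grades for one object sum to 1, and each centroid is recomputed as a mean of all objects weighted by their grades raised to the power m.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def memberships(x: Point, centroids: Sequence[Point], m: float = 2.0) -> List[float]:
    """Membership grade of x in every cluster: each grade is in [0, 1] and they sum to 1."""
    d = [math.dist(x, c) for c in centroids]
    zeros = [k for k, dk in enumerate(d) if dk == 0.0]
    if zeros:
        # x sits exactly on one or more centroids: share full membership among them.
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)  # exponent derived from the fuzzifier m (must be > 1)
    return [1.0 / sum((d[k] / dj) ** p for dj in d) for k in range(len(d))]


def fcm(data: Sequence[Point], centroids: Sequence[Point], m: float = 2.0, iters: int = 50):
    """Run a fixed number of FCM iterations; return the membership matrix and centroids."""
    centroids = [tuple(c) for c in centroids]
    dims = len(data[0])
    for _ in range(iters):
        U = [memberships(x, centroids, m) for x in data]
        new_centroids = []
        for k in range(len(centroids)):
            # Centroid = mean of all objects, weighted by membership grade raised to m.
            w = [u[k] ** m for u in U]
            total = sum(w)
            new_centroids.append(
                tuple(sum(wi * x[j] for wi, x in zip(w, data)) / total for j in range(dims))
            )
        centroids = new_centroids
    # Recompute memberships so they correspond to the final centroids.
    U = [memberships(x, centroids, m) for x in data]
    return U, centroids


print(memberships((2.0, 0.0), [(0.0, 0.0), (6.0, 0.0)]))  # [0.8, 0.2]
```

An object at distance 2 from one centroid and 4 from another thus receives grades of 0.8 and 0.2, whereas k-means would assign it wholly to the nearer cluster.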
Another technique receiving considerable attention is the theory of rough sets (Pawlak, 1982), which has led to clustering algorithms referred to as rough clustering (do Prado, Engel, & Filho, 2002; Kumar, Krishna, Bapi, & De, 2007; Lingras & Peters, 2011; Parmar, Wu, & Blackhurst, 2007; Voges, Pope, & Brown, 2002).
This article provides brief introductions to k-means cluster analysis, rough sets theory, and rough clustering, and compares k-means clustering with rough clustering. It shows that rough clustering provides a more flexible solution to the clustering problem and can be conceptualized as extracting concepts from the data rather than strictly delineated subgroupings (Pawlak, 1991). Traditional clustering methods generate extensional descriptions of groups (i.e., which objects are members of each cluster), whereas clustering techniques based on rough sets theory generate intensional descriptions (i.e., what the main characteristics of each cluster are) (do Prado et al., 2002). These different goals suggest that both k-means clustering and rough clustering have a place in the toolbox of the data analyst and the information manager.
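The difference from k-means can be illustrated with the assignment step of a rough k-means style algorithm. This is a sketch modelled loosely on the rough clustering literature cited above, not a reproduction of any one published algorithm. Each cluster is described by a lower approximation (objects that clearly belong to it) and an upper approximation (objects that possibly belong to it); an object that is almost equally close to several centroids falls in the boundary region and is placed only in upper approximations. The distance-ratio rule, the `threshold` value of 1.2, and the name `rough_assign` are assumptions for illustration.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def rough_assign(data: Sequence[Point], centroids: Sequence[Point], threshold: float = 1.2):
    """Return (lower, upper): per-cluster sets of object indices.

    Assumed rule (illustrative): cluster j is 'also close' to object x if
    dist(x, c_j) <= threshold * dist(x, c_nearest).
    """
    k = len(centroids)
    lower: List[Set[int]] = [set() for _ in range(k)]
    upper: List[Set[int]] = [set() for _ in range(k)]
    for i, x in enumerate(data):
        d = [math.dist(x, c) for c in centroids]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k) if j != nearest and d[j] <= threshold * d[nearest]]
        upper[nearest].add(i)
        if close:
            # Ambiguous object: boundary region, in the upper approximation of several clusters.
            for j in close:
                upper[j].add(i)
        else:
            # Unambiguous object: in the lower approximation (and hence the upper one) of one cluster.
            lower[nearest].add(i)
    return lower, upper


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
lower, upper = rough_assign(points, [(0.5, 0.0), (9.5, 0.0)])
print(lower)  # [{0, 1}, {3, 4}]
print(upper)  # [{0, 1, 2}, {2, 3, 4}]
```

Here the object at 5 is equally close to both centroids, so it appears in both upper approximations but in neither lower approximation. Every lower approximation is a subset of the corresponding upper approximation, and an object in a lower approximation belongs to no other cluster.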