MVClustViz: A Novice Yet Simple Multivariate Cluster Visualization Technique for Centroid-based Clusters

Sagar S. De (S. N. Bose National Center for Basic Sciences, Kolkata, West Bengal, India), Minati Mishra (Department of Information and Communication Technology, Fakir Mohan University, Balasore, Odisha, India) and Satchidananda Dehuri (Department of Systems Engineering, Ajou University, Suwon, South Korea)
Copyright: © 2013 | Pages: 14
DOI: 10.4018/ijsda.2013100102

Abstract

In visual data mining, visualizing clusters is a challenging task. Although many techniques have already been developed, it remains difficult to represent large volumes of multidimensional data with overlapping clusters. In this paper, a multivariate cluster visualization technique (MVClustViz) is presented for visualizing centroid-based clusters. This geographic projection technique supports multidimensional, large-volume data and the visualization of both crisp and fuzzy clusters. The technique is most suitable for range analysis of defense-related data.
Introduction

Centroid-based clustering has a long history in numerical taxonomy and is considered one of the most heavily used techniques in exploratory data mining. Clustering (Jain, 2010) has become a common technique for statistical data analysis and is applied in many fields, such as machine learning, pattern recognition, information retrieval, image analysis, bioinformatics, computational finance, systems engineering, and social networking. Cluster analysis involves finding a specific number (K) of subgroups, known as clusters, with high intra-cluster homogeneity and high inter-cluster dissimilarity within a set of N observations (data points / samples / objects), where each sample S is described by D features. In centroid-based clustering, each cluster Ck is represented by a center vector of size D, which is the center of the cluster and need not be a member of the observed data points. Each observation is assigned either to exactly one cluster (exclusive, crisp, or hard assignment) or in part to several clusters (partial, fuzzy, or soft assignment). Mathematically, a K-clustering of a data set X = {x1, ..., xN} is a partition of X into K sets (clusters) C1, ..., CK such that the following three conditions are met:

  1. Ci ≠ ϕ, i = 1, ..., K
  2. C1 ∪ C2 ∪ ... ∪ CK = X
  3. Ci ∩ Cj = ϕ, i ≠ j, i, j = 1, ..., K

This is known as hard clustering with K clusters. In fuzzy clustering, each data item is assigned a membership value in the interval [0, 1] for each cluster. Consequently, in fuzzy clustering with K clusters the clusters may overlap, Ci ∩ Cj ≠ ϕ for i ≠ j, i, j = 1, ..., K (i.e., a sample point may belong to every cluster with a certain degree of membership).
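As a rough illustration of the difference, the sketch below checks the usual partition requirements for a hard clustering (no empty cluster, the clusters jointly cover X, and no two clusters share a point) and shows that clusters derived from fuzzy memberships fail the disjointness requirement. The function names, the use of point indices in place of samples, and the membership values in `U` are assumptions made for this example; they are not taken from the paper.

```python
def is_hard_clustering(X, clusters):
    """Return True if `clusters` is a hard (crisp) K-clustering of X."""
    data = set(X)
    sets = [set(c) for c in clusters]
    union = set().union(*sets)
    non_empty = all(len(c) > 0 for c in sets)          # no cluster is empty
    covers = union == data                             # clusters jointly cover X
    disjoint = sum(len(c) for c in sets) == len(union) # no point is shared
    return non_empty and covers and disjoint


def fuzzy_supports(memberships):
    """memberships[n][k] is the degree in [0, 1] to which point n belongs
    to cluster k. Return, per cluster, the indices with non-zero membership."""
    k_count = len(memberships[0])
    return [
        {n for n, row in enumerate(memberships) if row[k] > 0.0}
        for k in range(k_count)
    ]


X = [0, 1, 2, 3]                 # point indices stand in for the samples
crisp = [{0, 1}, {2, 3}]         # a hard 2-clustering of X

# Hypothetical membership values, chosen only for illustration.
U = [[1.0, 0.0], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]]
soft = fuzzy_supports(U)         # points 1 and 2 belong to both clusters

print(is_hard_clustering(X, crisp))  # True
print(is_hard_clustering(X, soft))   # False: the clusters overlap
```

Points 1 and 2 carry non-zero membership in both clusters, so the derived sets intersect and the crisp conditions no longer hold.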

Centroid-based clusters can be generated by well-known techniques such as K-means, which represents each cluster by a single mean vector (Hartigan & Wong, 1979; Jain, 2010; Krishna & Narasimha Murty, 1999); K-medoids, which restricts the centroids to members of the data set (Park & Jun, 2009; Rousseeuw & Kaufman, 1990); K-medians, which uses medians instead of means (Anderson et al., 2008); K-means++, which chooses the initial centers less randomly (Arthur & Vassilvitskii, 2007); and fuzzy K-means, which allows a fuzzy cluster assignment (De Oliveira & Pedrycz, 2007; Kruse et al., 2007).
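To make two of these techniques concrete, here is a minimal sketch of a Lloyd-style K-means iteration and of the membership formula commonly used in fuzzy c-means. It is an illustration under stated assumptions, not the procedure used in the paper: the initial centroids are passed in by the caller, Euclidean distance is used, the fuzzifier defaults to `m = 2`, and the names `kmeans` and `fuzzy_memberships` are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style K-means sketch. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # assumption: keep an empty cluster's centroid
        if new_centroids == centroids:     # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy c-means style memberships of one point; the values sum to 1."""
    dists = [math.dist(point, c) for c in centroids]
    at_centroid = [d == 0.0 for d in dists]
    if any(at_centroid):
        # A point lying on a centroid belongs fully to that cluster.
        n = sum(at_centroid)
        return [1.0 / n if hit else 0.0 for hit in at_centroid]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, labels = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

In this toy data set the resulting centers lie between the data points, which illustrates that a mean-based centroid need not be a member of the observed data, whereas K-medoids would restrict the choice to one of the four points.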

Clustering techniques are considered effective knowledge exploration techniques, but understanding the relation between the generated clusters and their data points is equally important. Real-life data are most often represented in a high-dimensional space, and hence the inherent similarities are hard to recognize and illustrate. This makes it challenging to build tools that can visualize the similarities and relationships between features. By nature, visualization requires a mapping from a high-dimensional input space to a low-dimensional output space.

In this paper, existing cluster visualization and knowledge exploration techniques are presented first; then we introduce a new visualization technique (MVClustViz) built on the traditional bar visualization. MVClustViz is capable of visualizing large-scale, multidimensional datasets in a single view and of producing a quick overview of the dataset. The technique can also visualize the complete information of overlapping clusters. Further, fuzzy clusters can be visualized without a predefined fuzzy cut-off value, which increases the flexibility of analysis. We discuss different visualization possibilities for MVClustViz and, lastly, a few results of interpolation techniques.
