Introduction
Rapid advances in large-scale data acquisition, transmission, and storage have caused computerized datasets to grow at unprecedented rates. These datasets come from many sectors, e.g., business, education, government, the scientific community, and the Internet, as well as from the many readily available offline and online data sources, and they take the form of text, graphics, images, video, audio, animation, hyperlinks, markup, and so on (Li, Zhang, & Wang, 2006; Bhatnagar, Kaur, & Mignet, 2009). Moreover, they keep growing in both the number of attributes and the number of instances. Although many decisions are based on large datasets, the sheer volume of these datasets far exceeds the human ability to interpret them completely (Li et al., 2006). To understand these data repositories and make full use of them in decision making, it is necessary to develop techniques for uncovering the underlying nature of such huge datasets.
Clustering is one technique for discovering segmentation rules in these data repositories. It assigns a set of objects to clusters (subsets) on the basis of their observed attributes, so that objects within the same cluster are similar to one another and dissimilar to objects in other clusters (Murtagh, 1983; Grabmeier & Rudolph, 2002; Xu & Wunsch, 2005; Li, Wang, & Li, 2006; Malik et al., 2010). It is an unsupervised technique: it requires no prior knowledge of what causes the grouping or of how many groups exist (Song, Hu, & Yoo, 2009; Engle & Gangopadhyay, 2010; Silla & Freitas, 2011). Clustering of arbitrarily shaped groups has also been addressed (Wan, Wang, & Su, 2010). Clustering methods may be based on hierarchy, partitioning, density, grids, constraints, subspaces, and so on (Sander et al., 1998; Kwok et al., 2002; Grabmeier & Rudolph, 2002; Parsons, Haque, & Liu, 2004; Zhang et al., 2008; Horng et al., 2011).
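To make the idea of grouping similar objects concrete, the following is a minimal sketch of one partition-based method, k-means, in plain Python. It is an illustration only and not the method of any work cited above. The function name `kmeans`, the sample points, and the choice of seeding centroids with the first k points are all assumptions made for this example. Note also that k-means takes the number of clusters k as an input, which simplifies the general setting in which the number of groups is unknown.

```python
import math

def kmeans(points, k, iterations=20):
    """Partition `points` (tuples of floats) into k clusters.

    Returns (labels, centroids). Hypothetical helper for illustration.
    """
    # Assumption: seed centroids with the first k points so the sketch is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each object joins the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its current members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: assignments will no longer change.
        centroids = new_centroids
    return labels, centroids

# Made-up sample data: two visibly separated groups in the plane.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In this sketch, similarity is measured by Euclidean distance, so points near (1, 1) share one label and points near (9, 9) share the other. Density-based or hierarchical methods would replace the assignment and update steps with different grouping criteria.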