Research Background
Clustering is significant because it can group high-dimensional data, such as the data produced by microarray technology, which measures the expression levels of sets of genes across different conditions. In clustering, the data consists of gene expression values alone. If cluster analysis is used as a descriptive or exploratory tool, several algorithms can be tried on the same data to see what the data may disclose (Han & Kamber, 2001). Clustering of gene expression patterns falls into two broad categories: gene-based clustering and sample-based clustering. In gene-based clustering, genes are treated as objects and samples as attributes. It reveals similarity between genes, or sets of genes, that behave alike across conditions, which helps to identify differentially expressed genes and to generate a list of expression patterns. Sample-based clustering is performed to find the phenotype structure or substructure of the samples. In this case, samples are treated as objects and genes as attributes.
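The difference between the two categories comes down to which axis of the expression matrix supplies the objects. The following is a minimal sketch of that idea, not a clustering algorithm from the article: the toy matrix, the gene names, and the helper functions `transpose` and `nearest_neighbour` are all hypothetical, and a simple Euclidean nearest-neighbour lookup stands in for a real clustering method.

```python
from math import dist

# Hypothetical toy expression matrix: each row is a gene, each column a sample.
expression = {
    "geneA": [1.0, 1.1, 5.0, 5.2],
    "geneB": [0.9, 1.0, 5.1, 4.9],
    "geneC": [5.0, 5.1, 1.0, 0.8],
}

def transpose(rows):
    """Swap objects and attributes: a list of rows becomes a list of columns."""
    return [list(column) for column in zip(*rows)]

def nearest_neighbour(vectors):
    """For each object, return the index of its closest other object (Euclidean)."""
    result = []
    for i, v in enumerate(vectors):
        others = [(dist(v, w), j) for j, w in enumerate(vectors) if j != i]
        result.append(min(others)[1])
    return result

# Gene-based view: genes are the objects, samples are the attributes.
gene_vectors = list(expression.values())

# Sample-based view: samples are the objects, genes are the attributes.
sample_vectors = transpose(gene_vectors)

gene_nn = nearest_neighbour(gene_vectors)
sample_nn = nearest_neighbour(sample_vectors)
```

In this toy data, geneA and geneB are each other's nearest neighbours in the gene-based view, while the sample-based view pairs the first two samples and the last two samples. The same matrix therefore answers two different questions depending on its orientation.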
Many clustering algorithms have been proposed by researchers. Partitioning and hierarchical clustering are the two main approaches. Clustering techniques are applied in many areas, such as Pattern Recognition (Tan et al., 2005), Data Mining (Xiong & Tan, 2004), and Machine Learning (Alpaydin, 2004). In gene expression analysis, the aim is to group related genes into one cluster so that genes within the same cluster are similar to each other and different from genes in other clusters. Clustering algorithms can be broadly classified as Hard, Fuzzy, Possibilistic and Probabilistic (Hathway & Bezdek, 1995).
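To make the Hard versus Fuzzy distinction concrete, the sketch below compares the two kinds of membership for a single gene profile. It is an illustration under stated assumptions rather than the method of any cited work: the cluster centres and the gene profile are hypothetical, the centres are held fixed instead of being learned, and the fuzzy degrees use the standard fuzzy c-means membership formula with fuzzifier `m`. Possibilistic and Probabilistic memberships are not shown.

```python
from math import dist

def hard_membership(point, centres):
    """Hard clustering: the point belongs entirely to its nearest centre."""
    distances = [dist(point, c) for c in centres]
    winner = distances.index(min(distances))
    return [1.0 if k == winner else 0.0 for k in range(len(centres))]

def fuzzy_membership(point, centres, m=2.0):
    """Fuzzy c-means style membership: degrees in [0, 1] that sum to 1."""
    distances = [dist(point, c) for c in centres]
    # A point lying exactly on a centre belongs fully to that centre.
    if 0.0 in distances:
        zero = distances.index(0.0)
        return [1.0 if k == zero else 0.0 for k in range(len(centres))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

centres = [[1.0, 1.0], [5.0, 5.0]]  # hypothetical, fixed cluster centres
gene = [2.0, 2.0]                   # hypothetical two-condition expression profile

hard = hard_membership(gene, centres)
fuzzy = fuzzy_membership(gene, centres)
```

The hard result assigns the gene wholly to the first cluster, whereas the fuzzy result gives it a degree of about 0.9 in the first cluster and about 0.1 in the second, so a gene near a cluster boundary keeps a graded association with both clusters.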