Performance Analysis of Hard and Soft Clustering Approaches For Gene Expression Data


P. K. Nizar Banu (Department of Computer Applications, B.S. Abdur Rahman University, Chennai, India) and S. Andrews (Department of Information Technology, Mahendra Engineering College, Namakkal, India)
Copyright: © 2015 | Pages: 12
DOI: 10.4018/ijrsda.2015010104


The mining of gene expression data is growing rapidly as a means to predict gene expression patterns and to assist clinicians in the early diagnosis of tumor formation. Clustering gene expression data is the most important phase, as it helps in finding groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms, namely K-Means, Fuzzy C-Means, Self Organizing Maps (SOM) based clustering and Genetic Algorithm (GA) based clustering, on a brain tumor gene expression dataset. The clusters produced by the clustering algorithms are indications of the cellular processes. Clustering results are evaluated using clustering indices such as the Xie-Beni index (XB), Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Dunn's Index (DI), along with the time taken, to assess the compactness and separation of clusters. Experimental results show that soft clustering approaches work well for predicting clusters of highly expressed and suppressed genes.

Research Background

The significance of the clustering process lies in its ability to cluster high-dimensional data, such as the data produced by advances in microarray technology, which measures the expression levels of sets of genes across different conditions. In clustering, the data consists of gene expression values alone. If cluster analysis is used as a descriptive or exploratory tool, it is possible to try several algorithms on the same data to see what the data may disclose (Han & Kamber, 2001). Clustering of gene expression patterns can be classified into two broad categories: gene-based clustering and sample-based clustering. In gene-based clustering, genes are treated as objects and samples as attributes. Gene-based clustering is used to reveal the similarity between genes, or sets of genes that behave similarly across conditions, which helps to identify differentially expressed genes and to generate a list of expression patterns. Sample-based clustering can be performed to find the structure of the phenotype or the substructure of the samples. In this case, samples are treated as objects and genes as attributes.
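The distinction between gene-based and sample-based clustering comes down to the orientation of the expression matrix handed to the algorithm. The following is a minimal sketch of that idea, not the authors' implementation: the toy matrix `expression`, the helper `kmeans`, and its farthest-first initialisation are all assumptions made for illustration, and plain K-Means stands in for whichever algorithm is actually used.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means; returns one hard label per row of `points`."""
    points = np.asarray(points, dtype=float)
    # Deterministic farthest-first initialisation (an illustrative choice):
    # start from row 0, then repeatedly add the row farthest from the
    # centers chosen so far.
    chosen = [0]
    while len(chosen) < k:
        diffs = points[:, None, :] - points[chosen][None, :, :]
        nearest = (diffs ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(nearest.argmax()))
    centers = points[chosen].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

# Hypothetical toy matrix: 6 genes (rows) x 4 samples (columns).
expression = np.array([
    [9.0, 9.2, 1.0, 1.1],
    [8.8, 9.1, 0.9, 1.2],
    [9.1, 8.9, 1.1, 0.8],
    [1.0, 1.2, 9.0, 9.3],
    [0.9, 1.1, 8.7, 9.1],
    [1.2, 0.8, 9.2, 8.9],
])

# Gene-based clustering: genes are objects, samples are attributes.
gene_labels = kmeans(expression, k=2)
# Sample-based clustering: samples are objects, genes are attributes.
sample_labels = kmeans(expression.T, k=2)

print(gene_labels)    # [0 0 0 1 1 1]
print(sample_labels)  # [0 0 1 1]
```

The same routine serves both cases; only the transpose changes which axis plays the role of objects and which the role of attributes.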

Many clustering algorithms have been proposed by researchers. Partitioning and hierarchical clustering are the two main approaches to clustering. Clustering techniques are applied in many application areas, such as Pattern Recognition (Tan et al., 2005), Data Mining (Xiong & Tan, 2004) and Machine Learning (Alpaydin, 2004). Here, the aim is to group related genes in one cluster so that genes within the same cluster are similar to each other and different from genes in other clusters. Clustering algorithms can be broadly classified as Hard, Fuzzy, Possibilistic and Probabilistic (Hathway & Bezdek, 1995).
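To make the difference between hard and fuzzy (soft) clustering concrete, the sketch below compares the two kinds of assignment for a fixed set of cluster centers. It is an illustration under stated assumptions rather than the paper's code: the points and centers are invented, the function names are hypothetical, and the soft memberships use the standard Fuzzy C-Means membership update with fuzzifier `m = 2`.

```python
import numpy as np

def hard_assignments(points, centers):
    """Hard clustering: each point belongs to exactly one (nearest) center."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def fuzzy_memberships(points, centers, m=2.0, eps=1e-12):
    """Fuzzy C-Means membership update for fixed centers.

    u[i, j] is proportional to d[i, j] ** (-2 / (m - 1)), normalised so
    that each row sums to 1. `eps` avoids division by zero when a point
    coincides with a center.
    """
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Hypothetical 2-D points and two fixed centers.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])

labels = hard_assignments(points, centers)
memberships = fuzzy_memberships(points, centers)

print(labels)          # [0 0 0 1]
print(memberships[1])  # approximately [0.9 0.1]
print(memberships[2])  # [0.5 0.5]
```

The point at `[2.0, 0.0]` lies midway between the two centers: the hard assignment must still pick one cluster, whereas the fuzzy memberships record that it belongs equally to both.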
