Harmony Search PSO Clustering for Tumor and Cancer Gene Expression Dataset

P. K. Nizar Banu (Department of Computer Applications, B.S. Abdur Rahman University, Chennai, India) and S. Andrews (Department of Information Technology, Mahendra Engineering College, Namakkal, India)
Copyright: © 2014 | Pages: 21
DOI: 10.4018/ijsir.2014070101

Modern advances in microarray technology have led to the accumulation of enormous quantities of gene expression data from diverse sources, which poses major computational challenges. A first step towards addressing these challenges is to cluster genes, which reveals hidden expression patterns and natural structures in the underlying data and in turn helps in disease diagnosis and drug development. Particle Swarm Optimization (PSO) is used extensively in many practical applications, but it fails to find good initial seeds for generating clusters, which reduces clustering accuracy. Harmony Search, a meta-heuristic optimization algorithm, is free from divergence and helps find near-global optimal solutions by searching the entire solution space. This paper proposes a novel Harmony Search Particle Swarm Optimization (HSPSO) clustering algorithm and applies it to Brain Tumor, Colon Cancer, Leukemia Cancer and Lung Cancer gene expression datasets. Experimental results show that the proposed algorithm produces clusters with better compactness and accuracy than K-means clustering, PSO clustering (swarm clustering) and Fuzzy PSO clustering.

The rapid advent of DNA microarray technology has revolutionized gene expression analysis. Gene expression data are typically arranged in a data matrix, with thousands of rows corresponding to genes and hundreds of columns representing samples such as tissues or experimental conditions, which can be mined for functional and class information. Conditions can be different environmental conditions or different time points corresponding to one or more environmental conditions. The (m, n)th entry of the gene expression matrix represents the expression level of the gene corresponding to row m under the specific condition corresponding to column n. One of the promising methods used to investigate the underlying structure of a gene expression dataset is cluster analysis (Eisen et al., 1998; Tavazoie et al., 1999; Dhillon et al., 2003).
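The matrix layout described above can be illustrated with a minimal Python sketch. The gene names, condition names and expression values are hypothetical toy data, not taken from the datasets used in the paper, and the helper functions are introduced here only for illustration.

```python
# Hypothetical toy data: 3 genes (rows) x 4 conditions (columns).
genes = ["gene_A", "gene_B", "gene_C"]
conditions = ["cond_1", "cond_2", "cond_3", "cond_4"]
expression = [
    [2.1, 0.5, 1.8, 0.4],  # expression pattern of gene_A
    [0.3, 1.9, 0.2, 2.2],  # expression pattern of gene_B
    [2.0, 0.6, 1.7, 0.5],  # expression pattern of gene_C
]


def expression_level(m, n):
    """Return the (m, n)th entry, using 1-based indices as in the text."""
    return expression[m - 1][n - 1]


def gene_profile(m):
    """Return row m: the expression of one gene across all conditions."""
    return list(expression[m - 1])


def condition_profile(n):
    """Return column n: the expression of all genes under one condition."""
    return [row[n - 1] for row in expression]
```

Clustering genes operates on the rows returned by `gene_profile`, while clustering conditions or samples operates on the columns returned by `condition_profile`.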

Clustering finds similarities in data and places similar data into groups. An initial step in the analysis of gene expression data is the detection of clusters of genes showing similar expression activity over the set of conditions. In gene expression clustering, the elements are usually genes, and the vector of each gene is its expression pattern. Patterns that are similar are grouped in the same cluster, while patterns that are different are placed in different clusters. Conditions may also be clustered, enabling disease types such as cancers to be defined in terms of their unique expression profiles (Pomeroy et al., 2002). Clustering microarray gene expression data helps to understand gene functions, gene regulation and cellular processes (Daxin Jiang et al., 2004). Clustering tissue samples collected from various people, including healthy persons and those affected by cancer, based on the similarity of their expression patterns can help in the effective classification of unknown samples, which in turn can lead to the early diagnosis of diseases (Marcilio et al., 2008).
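As a rough illustration of what "similar expression patterns" can mean, the sketch below scores two expression vectors with the Pearson correlation coefficient. This choice of measure is an assumption made for the example; the text above does not state which similarity measure the paper uses, and the function name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression patterns.

    Assumes neither pattern is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical patterns: the first two rise and fall together, the third is opposed.
pattern_1 = [1.0, 2.0, 3.0, 2.0]
pattern_2 = [2.0, 4.0, 6.0, 4.0]
pattern_3 = [3.0, 2.0, 1.0, 2.0]

similar = pearson(pattern_1, pattern_2)     # close to +1
dissimilar = pearson(pattern_1, pattern_3)  # close to -1
```

Under this measure, the first two patterns would be candidates for the same cluster, while the third would be placed in a different one.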

The K-means algorithm (Tou & Gonzalez, 1974) is the most widely used clustering algorithm and is effective in producing clusters for many practical applications. The main disadvantages of the original K-means are its convergence to local minima and its computational complexity. To overcome these limitations, several evolutionary-algorithm-based variants of K-means (Du et al., 2008; Sun, 2012) have been proposed, such as PK-Means (Du et al., 2008), which integrates the Particle Pair Optimizer (Du et al., 2008) with the K-means clustering algorithm. Particle Swarm Optimization (PSO) is an evolutionary computation technique that finds optimum solutions in many applications. A quantum-behaved PSO integrated with a one-step K-means operation is proposed in (Sun, 2012). In the literature, meta-heuristic optimization algorithms are used effectively for clustering problems because they can converge to global minima. These algorithms use two basic strategies while searching for the global optimum: exploration and exploitation (Rashedi et al., 2009). While the exploration process enables the algorithm to reach the best local solutions within the search space, the exploitation process expresses the ability to reach the global optimum solution, which is likely to exist around the local solutions obtained. An enhanced cluster matching approach is proposed by Yau-King et al. (2013) to improve the PSO-based K-means algorithm.
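The sensitivity of K-means to its initial seeds, which motivates the hybrid approaches above, can be seen in a small sketch. The code below is a plain one-dimensional K-means on hypothetical toy points, not the HSPSO algorithm or any of the cited variants; the function name `kmeans_1d` and the seed values are assumptions chosen for illustration.

```python
def kmeans_1d(points, seeds, max_iter=100):
    """Plain K-means on 1-D points, started from the given seeds.

    Returns the final centroids and the sum of squared errors (SSE),
    a simple compactness measure where lower is better.
    """
    centroids = list(seeds)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    sse = sum((p - centroids[i]) ** 2
              for i, c in enumerate(clusters) for p in c)
    return centroids, sse


# Hypothetical data with three well-separated groups.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

good_centroids, good_sse = kmeans_1d(points, seeds=[1, 11, 21])
bad_centroids, bad_sse = kmeans_1d(points, seeds=[0, 1, 2])
```

With one seed per group, the run recovers the three natural clusters. With all three seeds placed in the first group, the run converges to a much less compact partition that merges the second and third groups, which is a local minimum of the SSE. Meta-heuristics such as PSO and Harmony Search are used to search for better starting points than such arbitrary seeds.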
