Introduction
DNA microarray technology is a fundamental tool for gene expression analysis. It measures the relative abundance of mRNA for thousands of genes across tens or hundreds of samples, and the accumulation of such datasets has underscored the need for quantitative analytical tools to examine them. Because of the large number of genes and the complexity of gene regulation networks, clustering is a useful exploratory technique for analyzing these data. Clustering divides the data of interest into a small number of relatively homogeneous groups, or clusters. There are two ways of applying cluster analysis to microarray data. One is to cluster genes according to their expression patterns across different conditions. The other is to cluster samples, for example samples from different tissues, cells at different time points of a biological process, or cells under different treatments (Chen et al., 2002).

Gene expression profiles can be built by measuring the transcription levels of genes in an organism under various conditions, at different developmental stages and in different tissues; these profiles characterize the dynamic functioning of each gene in the genome (Alvis & Vilo, 2000). Gene expression data from microarrays are presented as an M × N matrix, where M is the number of microarray experiments and N is the number of genes (Tuzhilin & Adomavicius, 2002). Certain analyses need to be performed on these data to retrieve useful biological information. Cluster analysis is one such technique: it discovers useful biological information by detecting genes that have identical expression profiles (Kotala et al., 2001). A wide variety of clustering algorithms are available for clustering gene expression data (Bezdek, 1981). Based on the characteristics of the clustering procedure, these algorithms are classified into two broad categories, namely partitional and hierarchical clustering.
Grid-based clustering (Liao et al., 2004), projection-based clustering (Bouguessa & Wang, 2009), subspace clustering (Agrawal et al., 1998), density-based clustering (Ester et al., 1996), model-based methods, graph-theoretic methods and soft computing methods are other clustering approaches presented in the literature.
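As an illustration of the M × N layout and the two clustering directions described above, the following minimal sketch clusters a toy expression matrix once by samples (rows) and once by genes (columns, via a transpose). The data values, the function names (`transpose`, `kmeans`) and the simplified k-means with first-k initialization are assumptions made for this example only; they stand in for a generic partitional method and are not the procedure used in this paper.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Simplified Lloyd-style k-means; returns one cluster label per point.

    Assumption for this sketch: the first k points serve as initial centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


# Hypothetical toy data: M = 4 experiments (rows) by N = 6 genes (columns).
expression = [
    [2.1, 2.0, 0.1, 0.2, 2.2, 0.0],
    [1.9, 2.2, 0.0, 0.1, 2.0, 0.2],
    [0.1, 0.2, 1.8, 2.1, 0.0, 2.0],
    [0.0, 0.1, 2.0, 1.9, 0.2, 2.2],
]

# Clustering samples: each row (experiment) is one point with N coordinates.
sample_labels = kmeans(expression, k=2)

# Clustering genes: each column (gene) is one point with M coordinates.
gene_labels = kmeans(transpose(expression), k=2)

print("sample clusters:", sample_labels)
print("gene clusters:  ", gene_labels)
```

The only difference between the two analyses is the orientation of the matrix that is handed to the clustering routine.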
In recent years, optimization algorithms have also been introduced for clustering. In optimization-based clustering, the minimum sum of squared error is taken as the objective, and each algorithm applies its own optimization procedure to solve this clustering objective (Binu et al., 2013). Following a similar procedure, the Genetic Algorithm (Mualik & Bandyopadhyay, 2002), Particle Swarm Optimization (Premalatha & Natrajan, 2008), bacterial foraging optimization (Wan et al., 2012), simulated annealing (Selim & Alsultan, 1991), artificial bee colony (Zhang et al., 2010; Karaboga & Ozturk, 2011), the firefly algorithm (Senthilnath et al., 2011) and cuckoo search (Goel et al., 2011) have been applied to clustering. This paper focuses on the application of cuckoo search based clustering to a brain tumor gene expression dataset.
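The sum of squared error objective mentioned above can be sketched as follows. The sketch assumes that a candidate solution is a list of centroids and that each point is charged the squared Euclidean distance to its nearest centroid. The function name `sum_squared_error` and the toy points are hypothetical, and no particular optimizer (cuckoo search or otherwise) is implemented here; the code only shows how two candidate solutions would be ranked by the objective.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_squared_error(points, centroids):
    """Sum, over all points, of the squared distance to the nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical toy data: two well-separated groups of two points each.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]

# Two candidate solutions that an optimizer might propose.
candidate_a = [[0.0, 1.0], [10.0, 1.0]]  # one centroid inside each group
candidate_b = [[5.0, 0.0], [5.0, 2.0]]   # centroids placed between the groups

sse_a = sum_squared_error(points, candidate_a)
sse_b = sum_squared_error(points, candidate_b)

# An optimization-based method keeps the candidate with the lower objective.
best = min([candidate_a, candidate_b], key=lambda c: sum_squared_error(points, c))

print("SSE of candidate A:", sse_a)
print("SSE of candidate B:", sse_b)
```

Under these assumptions, the listed optimization algorithms differ in how they generate new candidate centroid sets, while the objective used to compare candidates stays the same.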