Local Optima Avoidance in GA Biclustering using Map Reduce

Local Optima Avoidance in GA Biclustering using Map Reduce

Gowri R. (Periyar University, Salem, India) and Rathipriya R. (Department of Computer Science, Periyar University, Salem, India)
Copyright: © 2016 |Pages: 11
DOI: 10.4018/IJKDB.2016010104

Abstract

One of the prominent issues in Genetic Algorithm (GA) is premature convergence on local optima. This restricts the enhanced optimal solution searching in the entire search space. Population size is one of the influencing factors in Genetic Algorithm. Increasing the population size will improvise the randomized searching and maintains the diversity in the population. It also increases its computational complexity. Especially in GA Biclustering (GABiC), the search should be randomized to find more optimal patterns. In this paper, a novel approach for population setup in MapReduce framework is proposed. The maximal population is split into population sets, and these groups will proceed searching in parallel using MapReduce framework. This approach is attempted for biclustering the gene expression dataset in this paper. The performance of this proposed work seems promising on comparing its results with those obtained from previous hybridized optimization approaches. This approach will also handle data scalability issues and applicable to the big data biclustering problems.
Article Preview

Introduction

Nowadays, the data availability is beyond the capacity of existing infrastructure, computational capacity. Extracting optimal knowledge through biclustering of vast data is the upcoming research issue. Applying optimization algorithms to mine optimal patterns from these big data will be a great challenge. Higher degree of premature convergence will be seen when we mine optimal patterns from these big data. To overcome this challenge, various efficient big data programming paradigms are available in the literature. In which, MapReduce is a programming model used to compute large dataset and cope with scalability problems. Biclustering is the most popular data mining technique to uncover local patterns (called bicluster) for two dimensional data. Genetic Algorithm (Holland et al., 1975) is one of the efficient optimization techniques used to mine optimal local patterns, so GA is used for biclustering the big data in MapReduce framework. In this paper, the proposed approach is applied to the gene expression data of yeast (saccharomyces cerevisiae). It performs simultaneously the selection of rows (genes) and columns (conditions) of a data matrix D (G, C) leading to the discovery of biclusters B(G', C') where G' ⊆ G and C' ⊆ C. A bicluster is essentially a sub-matrix revealing certain homogeneity of specific genes at specific conditions. It has been studied recently and several reviews can be found in Madeira et al. (2004), Tanay et al. (2005), Busygin et al. (2008), and Fan et al. (2010).

Biclustering is the ideal technique for pattern discovery in the web usage mining and gene expression analysis because:

  • Only a small set of the genes express on the subset set of conditions in the entire gene expression matrix.

  • Some peculiar genes react only on minimal number of conditions, which may be ignored in clustering approaches.

The Genetic algorithm (GA) is an adaptive heuristic search method based on population genetics. Genetic algorithm was introduced by John Holland in the early 1970s (Holland et al., 1975). GA is a probabilistic search algorithm based on the mechanics of Darwin’s natural selection and natural genetics for solving the optimization problem. There are stochastic operators like selection, crossover and mutation are used in GA to implement the natural selection operation. Initially random population is generated, then stochastic operators are applied to generate the whole generation of new strings. But in the proposed approach Genetic Algorithm using Map-Reduce, GA population are treated as map inputs, the target data set is common to all the maps. A versatile population set with random individual performs optimal solution search in parallel.

This paper discusses about an Evolutionary Biclustering Approach (Rathipriya et al., 2011) using MapReduce and its application in gene expression data. The proposed approach identifies global optimal bicluster from the gene expression data. This optimal biclusters preserve locally defined degree of homogeneity between genes and conditions in gene expression data of a species.

The beneficial outcome of the proposed method is the high correlation between the genes and conditions of a species or interspecies is identified. It also outperforms the existing traditional evolutionary biclustering of gene expression data. It does not stick to the local optima anymore.

The rest of the paper is organized as follows. Section 2 describes some of the biclustering approaches using GA and other local optima avoidance methods in the literature. Methods and materials required for evolutionary biclustering approach and MapReduce framework are described in the section 3. Section 4 focuses on the proposed Evolutionary Biclustering using MapReduce and analysis of experimental results is discussed in the Section 5 concludes the research work with possible future enhancements.

Complete Article List

Search this Journal:
Reset
Open Access Articles
Volume 9: 2 Issues (2019): Forthcoming, Available for Pre-Order
Volume 8: 2 Issues (2018): 1 Released, 1 Forthcoming
Volume 7: 2 Issues (2017)
Volume 6: 2 Issues (2016)
Volume 5: 2 Issues (2015)
Volume 4: 2 Issues (2014)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing