A Novel Evolutionary Biclustering Approach using MapReduce(EBC-MR)

A Novel Evolutionary Biclustering Approach using MapReduce(EBC-MR)

Rathipriya R. (Department of Computer Science, Periyar University, Salem, India)
Copyright: © 2016 |Pages: 11
DOI: 10.4018/IJKDB.2016010103


A novel biclustering approach is proposed in this paper, which can be used to cluster data (like web data, gene expression data) into local pattern using MapReduce framework. The proposed biclustering approach extracts the highly coherent bicluster using a correlation measure called Average Correlation Value measure. Furthermore, MapReduce based genetic algorithm is firstly used to the biclustering of web data. This method can avoid local convergence in the optimization algorithms mostly. The MSWeb dataset and MSNBC web usage data set are used to test the performance of new MapReduce based Evolutionary biclustering algorithm. The experimental study is carried out for comparison of proposed algorithm with traditional genetic algorithm in biclustering. The results reveal that novel proposed approach preforms better than existing evolutionary biclustering approach.
Article Preview

1. Introduction

Nowadays, we are in the world of data deluge. The amount of web data available and collected is beyond the capacity of analyzing and extracting relevant information from it. To address this challenge, various efficient data mining techniques have been proposed in the literature. In which, Hadoop MapReduce is one of the software framework for solving large dataset problems. Biclustering (Madeira et al. 2004) is the most popular data mining technique to uncover local patterns (called bicluster) for two dimensional data. It performs simultaneously the selection of rows (users) and columns (pages) of a data matrix D (U, P) leading to the discovery of biclusters B(U', P') where U is the set of users, P is the set of pages, U' ⊆ U and P' ⊆ P. A bicluster is essentially a sub-matrix revealing certain homogeneity. It has been studied for the one decade and several reviews can be found in Madeira et al. (2004), Tanay et al. (2005), Busygin et al. (2008), Fan et al. (2010), and Pontes et al. (2015).

Biclustering is the ideal technique for pattern discovery in the web usage mining (Rathipriya et al. 2011) because:

  • Only a small set of the users visits the subset set of pages in the entire web site.

  • An interesting users’ browsing pattern is exhibited only in a subset of the pages.

The Genetic algorithm (GA) is an adaptive heuristic search method based on population genetics. Genetic algorithm was introduced by John Holland in the early 1970s (Holland et al., 1975).GA is an evolutionary search algorithm which mimics the mechanics of natural selection and natural genetics in order to optimize a given problem. The problem is formalized by defining a fitness function, which determines a given value for the set of parameters to be optimized in such a way that optimizing the value of this fitness function returns the optimal set of parameters.

Generally, a Sequential GA (SGA) (Guo et al., 2010) proceeds in an iterative manner by generating new populations of strings from the old ones. Every string is the encoded version of a possible solution. An evaluation (fitness) function links a fitness measure to every string indicating its relevance to the problem.

The GA algorithm applies stochastic operators such as selection, crossover and mutation on an initially random population in order to compute a whole generation of new strings. But in Genetic Algorithm using Map-Reduce, GA runs with different population for each map to finds optimal solution faster. Biclustering using GA is also called Evolutionary Biclustering (Chakraborty et al., 2005).

This paper discusses about Evolutionary Biclustering Approach using MapReduce (Rathipriya et al., 2011) (EBA-MR) and its application in web usage data. The proposed approach identifies global optimal bicluster from the preprocessed web usage data. This optimal biclusters preserve locally defined degree of homogeneity between users and pages of web site.

The beneficial outcome of the proposed method is the high correlation between the users and pages of a web site is identified. It also outperforms the existing traditional evolutionary biclustering of web usage data.

The rest of the paper is organized as follows. Section 2 describes some of the biclustering approaches using GA in the literature. Methods and materials required for evolutionary biclustering approach and MapReduce framework are described in the section 3. Section 4 focuses on the proposed Evolutionary Biclustering using MapReduce and analysis of experimental results is discussed in the Section 5 concludes the research work with possible future enhancements

This section lists the some of the popular evolutionary biclustering approach in the past decade.

Bleuler at al. (2004) introduced a general evolutionary framework for biclustering. They use a greedy algorithm as a local search; however, they suggest that it may be replaced with any other local search strategy. Aguilar-Ruiz et al. 2005 introduced biclustering algorithms based on Evolutionary Approach (EA), for finding biclusters on expression data. Chakraborty et al. (2005) used GA for biclustering of expression data.

Complete Article List

Search this Journal:
Open Access Articles
Volume 9: 2 Issues (2019): Forthcoming, Available for Pre-Order
Volume 8: 2 Issues (2018): 1 Released, 1 Forthcoming
Volume 7: 2 Issues (2017)
Volume 6: 2 Issues (2016)
Volume 5: 2 Issues (2015)
Volume 4: 2 Issues (2014)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing