Efficient Mining Frequent Closed Discriminative Biclusters by Sample-Growth: The FDCluster Approach

Miao Wang (Northwestern Polytechnical University, China), Xuequn Shang (Northwestern Polytechnical University, China), Shaohua Zhang (Northwestern Polytechnical University, China) and Zhanhuai Li (Northwestern Polytechnical University, China)
Copyright: © 2012 | Pages: 20
DOI: 10.4018/978-1-4666-1785-8.ch006

Abstract

DNA microarray technology has generated a large amount of gene expression data. Biclustering is a methodology that clusters the condition set and the gene set simultaneously. It finds clusters of genes with similar characteristics together with the biological conditions that give rise to these similarities. Almost all current biclustering algorithms find biclusters in a single microarray dataset. To reduce the influence of noise and find more biologically meaningful biclusters, the authors propose the FDCluster algorithm, which mines frequent closed discriminative biclusters in multiple microarray datasets. FDCluster uses the Apriori property and several novel pruning techniques to mine biclusters efficiently. To improve space efficiency, FDCluster also uses several techniques to generate frequent closed biclusters without maintaining candidates in memory. The experimental results show that FDCluster is more effective than traditional methods on both a single microarray dataset and multiple microarray datasets. This paper tests biological significance using the Gene Ontology (GO) to show that the proposed method is able to produce biologically relevant biclusters.

Introduction

Nowadays, in the post-genomic era, many bioinformatics data sets are available. Due to the lack of accurate machine learning or intelligent tools in the bioinformatics community, the information embedded in most of these data has not yet been completely exploited. Recently, DNA microarray technology has generated a large amount of gene expression data, typically represented as a matrix in which each cell holds the expression level of a gene under an experimental condition. How to use these data to reveal the function and biological processes of genes poses a great challenge for analysis algorithms. Various data mining techniques have been employed to infer useful biological information from the huge and rapidly growing body of microarray data.

One widely used method for inferring relationships among genes in a microarray data set is frequent pattern mining. Based on the characteristics of microarray data, Pan et al. (2004) and Cong et al. (2004) proposed using a condition enumeration method to discover gene patterns. However, both algorithms need to maintain candidate patterns in memory, which limits their scalability. Association rule mining is another way to analyze gene expression data (Becquet et al., 2003; Creighton & Hanash, 2003; McIntosh & Chawla, 2007; Cong et al., 2004), and it can discover relationships among genes. However, it can only identify genes whose expression levels are correlated across some conditions; it cannot reveal the regulatory relations among genes. Using association rules to discover regulatory modules therefore has its limitations (Yeung et al., 2004).

How can genes with similar behavior with respect to different samples be identified? Biclustering (Cheng & Church, 2000) is a methodology that clusters the condition set and the gene set simultaneously. It finds clusters of genes with similar characteristics together with the biological conditions that give rise to these similarities. The main advantage of biclustering is that it mines modules over genes and experimental conditions simultaneously; another advantage is that it can be applied to the original data instead of discretized data (Zhao & Zaki, 2005). However, mining microarray data for biclusters presents the following four challenges. First, the biclustering problem is NP-hard (Cheng & Church, 2000). Second, because biclustering works on the original data, it must cope with the noise to which microarray datasets are sensitive. Third, the method should allow overlapping biclusters that share some genes or conditions, which increases the complexity of the biclustering algorithm. Finally, the method should be flexible enough to handle different types of biclusters. Madeira and Oliveira (2004) classified biclusters into four categories: (i) constant value biclusters, (ii) constant row or column biclusters, (iii) biclusters with coherent values, where each row and column is obtained by addition or multiplication of the previous row and column by a constant value, and (iv) biclusters with coherent evolutions, where the direction of change of values is important rather than the coherence of the values (Pandey et al., 2009).
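The four categories can be illustrated with a small sketch. The code below is not part of FDCluster; it is a hypothetical, noise-free checker that takes a candidate submatrix (rows as genes, columns as conditions) and reports the most specific of the four categories it satisfies. All function names and the tolerance parameter `tol` are assumptions made for illustration, and real microarray data would need a noise-tolerant criterion rather than exact equality.

```python
from typing import List

Matrix = List[List[float]]  # rows = genes, columns = conditions


def is_constant(m: Matrix, tol: float = 1e-9) -> bool:
    """(i) Every cell has the same value."""
    first = m[0][0]
    return all(abs(v - first) <= tol for row in m for v in row)


def has_constant_rows(m: Matrix, tol: float = 1e-9) -> bool:
    """(ii) Each row is constant across its columns."""
    return all(abs(v - row[0]) <= tol for row in m for v in row)


def has_constant_columns(m: Matrix, tol: float = 1e-9) -> bool:
    """(ii) Each column is constant across its rows."""
    return all(abs(row[j] - m[0][j]) <= tol
               for row in m for j in range(len(row)))


def is_additive_coherent(m: Matrix, tol: float = 1e-9) -> bool:
    """(iii) Rows differ from one another by an additive constant."""
    return all(abs(m[i][j] - m[i][0] - m[0][j] + m[0][0]) <= tol
               for i in range(len(m)) for j in range(len(m[0])))


def is_multiplicative_coherent(m: Matrix, tol: float = 1e-9) -> bool:
    """(iii) Rows differ from one another by a multiplicative constant."""
    return all(abs(m[i][j] * m[0][0] - m[i][0] * m[0][j]) <= tol
               for i in range(len(m)) for j in range(len(m[0])))


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def has_coherent_evolution(m: Matrix) -> bool:
    """(iv) All rows rise and fall in the same direction between columns."""
    patterns = {tuple(_sign(row[j + 1] - row[j]) for j in range(len(row) - 1))
                for row in m}
    return len(patterns) == 1


def classify(m: Matrix) -> str:
    """Return the most specific category the submatrix satisfies."""
    if is_constant(m):
        return "constant values"
    if has_constant_rows(m) or has_constant_columns(m):
        return "constant rows or columns"
    if is_additive_coherent(m) or is_multiplicative_coherent(m):
        return "coherent values"
    if has_coherent_evolution(m):
        return "coherent evolution"
    return "none"
```

For example, `classify([[1, 2, 4], [3, 4, 6]])` falls into the coherent values category because the second row is the first row shifted by 2, whereas `[[1, 5, 2], [10, 11, 3]]` only shares the up-then-down direction of change and so counts as a coherent evolution.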
