Bicluster Analysis for Coherent Pattern Discovery

Alan Wee-Chung Liew (School of Information and Communication Technology, Griffith University, Australia), Xiangchao Gan (Max Planck Institute for Plant Breeding Research, Germany), Ngai Fong Law (Hong Kong Polytechnic University, Hong Kong) and Hong Yan (City University of Hong Kong, Hong Kong)
DOI: 10.4018/978-1-4666-5888-2.ch159

Background

The goal of biclustering is to find sub-matrices in the dataset, i.e. subsets of objects and subsets of attributes, where the subset of objects exhibits significant homogeneity within the subset of attributes. Figure 1 shows the fundamental difference between clustering and biclustering. Unlike clusters in row-wise or column-wise clustering, biclusters can overlap. In principle, the subsets of attributes for various biclusters can be different. Two biclusters can share some common objects and attributes, and some objects may not belong to any bicluster at all. Due to this flexibility, biclustering has attracted intense interest in the scientific community as a data exploration tool in fields ranging from bioinformatics to text mining and marketing.

Figure 1.

Conceptual difference between cluster analysis (left) and bicluster analysis (right). Different shades of grey denote different clusters/biclusters, except on the right, where a shade can also denote the overlapping region of two biclusters.

Model of Bicluster Patterns

Let a dataset of M objects and N attributes be represented by a rectangular matrix D of M rows and N columns. A bicluster is a subset of rows that exhibit similar behavior across a subset of columns, and vice versa. The bicluster B = (X, Y) therefore appears as a sub-matrix of D, where the set of row indices X is a subset of {1, ..., M} and the set of column indices Y is a subset of {1, ..., N}. Biclustering aims to discover a set of biclusters Bk = (Xk, Yk) such that each bicluster satisfies some notion of homogeneity.
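
The definition above can be illustrated with a minimal sketch. The helper names `submatrix` and `value_range`, the toy matrix, and the choice of "range of values" as the homogeneity measure are all assumptions made for illustration; the chapter itself leaves the notion of homogeneity open. Row and column indices are zero-based here, following Python convention.

```python
def submatrix(D, X, Y):
    """Return the sub-matrix of D selected by row indices X and column indices Y."""
    return [[D[i][j] for j in Y] for i in X]


def value_range(B):
    """Spread of the entries in B (max minus min).

    A spread of 0 means all entries are equal, i.e. a constant bicluster.
    This is only one possible (assumed) notion of homogeneity.
    """
    flat = [v for row in B for v in row]
    return max(flat) - min(flat)


# Toy dataset D with M = 4 objects (rows) and N = 4 attributes (columns).
D = [
    [5, 1, 5, 9],
    [2, 7, 3, 4],
    [5, 8, 5, 0],
    [6, 2, 1, 3],
]

# Candidate bicluster B = (X, Y): rows 0 and 2, columns 0 and 2.
X = [0, 2]
Y = [0, 2]
B = submatrix(D, X, Y)

print(B)               # the selected sub-matrix
print(value_range(B))  # homogeneity score of the bicluster
print(value_range(D))  # the same score on the full matrix, for comparison
```

Note that the selected rows and columns need not be adjacent in D, which is why the bicluster is described by index sets rather than by a contiguous block.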

Key Terms in this Chapter

Geometric Biclustering: A particular class of biclustering algorithms that treat linear bicluster patterns as hyper-planes in a high dimensional data space. Geometric biclustering detects such hyper-planes formed by aggregation of a subset of data points showing coherency in a subset of feature dimensions.

Data Mining: Data mining is the process that attempts to discover previously unknown structures in a large data set such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining).

Cluster Analysis: Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar to each other than to those in other clusters.

Biclustering: An unsupervised data mining technique which allows simultaneous clustering of the rows and columns of a data matrix. Biclustering considers only a subset of relevant features when grouping objects into clusters.

Linear Model of Bicluster Patterns: A class of bicluster patterns where the rows and/or columns of data within the bicluster are related linearly by addition and multiplication.
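
As a rough illustration of the linear model described in the last key term, the following sketch tests whether the rows of a candidate bicluster are related by a constant offset (an additive pattern) or by a constant factor (a multiplicative pattern). The function names, the tolerance parameter `tol`, and the example matrices are hypothetical and chosen only for this sketch; they are not taken from the chapter.

```python
def is_additive(B, tol=1e-9):
    """True if every row equals the first row plus a row-specific constant offset."""
    first = B[0]
    for row in B[1:]:
        offset = row[0] - first[0]
        if any(abs((r - f) - offset) > tol for r, f in zip(row, first)):
            return False
    return True


def is_multiplicative(B, tol=1e-9):
    """True if every row equals the first row times a row-specific constant factor.

    Assumes the first entry of the first row is non-zero.
    """
    first = B[0]
    for row in B[1:]:
        factor = row[0] / first[0]
        if any(abs(r - factor * f) > tol for r, f in zip(row, first)):
            return False
    return True


# Rows differ from the first row by offsets +2 and -1.
additive = [[1, 4, 2], [3, 6, 4], [0, 3, 1]]

# Rows are the first row scaled by factors 2 and 0.5.
multiplicative = [[1, 4, 2], [2, 8, 4], [0.5, 2, 1]]

print(is_additive(additive), is_multiplicative(additive))
print(is_additive(multiplicative), is_multiplicative(multiplicative))
```

Real data is noisy, so a practical check would use a larger tolerance or a statistical fit rather than the near-exact comparison used here.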
