A Latent Feature Model Approach to Biclustering

José Caldas, Samuel Kaski

Source Title: International Journal of Knowledge Discovery in Bioinformatics (IJKDB) 6(2)

DOI: 10.4018/IJKDB.2016070102

Abstract

Biclustering is the unsupervised learning task of mining a data matrix for useful submatrices, for instance groups of genes that are co-expressed under particular biological conditions. As these submatrices are expected to partly overlap, a significant challenge in biclustering is to develop methods that are able to detect overlapping biclusters. The authors propose a probabilistic mixture modelling framework for biclustering biological data that lends itself to various data types and allows biclusters to overlap. Their framework is akin to the latent feature and mixture-of-experts model families, with inference and parameter estimation being performed via a variational expectation-maximization algorithm. The model compares favorably with competing approaches, both in a binary DNA copy number variation data set and in a miRNA expression data set, indicating that it may be used as a general problem-solving tool in biclustering.

Introduction

Clustering methods have been useful for understanding global trends in high-throughput molecular biology data (D’Haeseleer, 2005). Due to the increasing number of phenotypes that are probed for in individual studies, as well as the intrinsic complexity and high dimensionality of genome-wide high-throughput data, it has become increasingly meaningful to detect local, rather than global, data trends. In practice, this amounts to grouping subsets of biological conditions and associating with each group the measurements that make those biological conditions similar. For instance, in a gene expression study where the messenger RNA (mRNA) level of each gene is measured in a number of samples corresponding to certain biological conditions (e.g., tumor samples from multiple disease stages), the analyst may be interested in detecting groups of samples that are similar and associating with each group of samples the genes whose measurements are similar across those samples. This unsupervised learning task is known as biclustering (Cheng & Church, 2000; Madeira & Oliveira, 2004).

Formally, biclustering is an unsupervised learning task that takes as input a data matrix D and learns a set of submatrices of D, which are designated as biclusters. Each submatrix/bicluster should exhibit certain desirable properties; the specific desiderata depend on the problem formulation and the data type of the input matrix D. For instance, in a sparse binary matrix, the analyst’s intention may be to detect biclusters that correspond to dense submatrices; alternatively, in a continuous data set with values spanning a given range, the modeller’s intention may be to detect biclusters that correspond to submatrices in which the values are typically close to each other according to a given metric (e.g., Euclidean). Throughout the present paper, we use the terms object and condition to designate the rows and columns of a data matrix, respectively. For instance, in a gene expression matrix such as the one described above, the objects are genes and the conditions are the biological samples.
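As a minimal illustration of this definition, the following Python sketch treats a bicluster as a pair of row (object) and column (condition) index sets and scores it by density, matching the sparse binary matrix example above. The helper names `submatrix` and `density` and the toy matrix `D` are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Return the submatrix of `data` restricted to the given objects and conditions."""
    return [[data[i][j] for j in cols] for i in rows]


def density(block: Matrix) -> float:
    """Fraction of entries equal to 1 in a binary matrix (mean of all cells)."""
    cells = [value for row in block for value in row]
    if not cells:
        raise ValueError("density is undefined for an empty matrix")
    return sum(cells) / len(cells)


# Toy binary data matrix: 3 objects (rows) by 4 conditions (columns).
D: Matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

# A candidate bicluster: objects {0, 1} under conditions {0, 1}.
bicluster = submatrix(D, rows=[0, 1], cols=[0, 1])

print(density(bicluster))  # density of the candidate bicluster
print(density(D))          # density of the full matrix, for comparison
```

In this toy case the candidate bicluster is fully dense while the full matrix is sparse, which is the kind of contrast a density-based formulation looks for.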

Biclustering methods may be classified according to the type of bicluster structures they can detect (Madeira & Oliveira, 2004). Crucially, some but not all methods allow biclusters to overlap, that is, they allow an object-condition pair to belong to more than one bicluster. Allowing membership in multiple biclusters is justified, for instance, by the multiple functional roles that a gene may undertake or by the various biological processes that are simultaneously active in a biological condition. However, it introduces additional technical challenges regarding how to properly specify a model that handles bicluster overlap. A typical approach in the context of expression data is to specify a linear model wherein each bicluster corresponds to a given set of parameters; in this model family, bicluster parameters combine additively, i.e., the parameters used for modelling a given data point are obtained by summing the parameters across all biclusters that include the object-condition pair. A well-known member of this model family is the plaid model (Lazzeroni & Owen, 2002). More general frameworks combine parameter additivity with link functions (e.g., the sigmoid function) in order to model discrete data types (Meeds et al., 2007). Such parameter interaction paradigms have two main drawbacks. First, the specific interaction assumptions are restrictive and often may not hold. Second, the need to introduce parameter interaction assumptions (e.g., additive combination of bicluster parameters), in addition to specifying how each bicluster models the data assigned to it, may lead the practitioner to postulate artificial assumptions solely for the purpose of maintaining model soundness. We propose an alternative mixture-modelling approach, leading to more straightforward solutions, in which each object-condition pair may belong to several biclusters, as long as those biclusters provide equally good models for the corresponding data points.
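The additive combination rule used by the linear model family described above can be sketched in a few lines of Python. This is a simplified illustration and not the plaid model itself or the approach proposed in this paper: each bicluster carries a single scalar effect, and the names `additive_parameter`, `row_member`, `col_member`, `effects`, and `background` are hypothetical. The `sigmoid` function shows how a link function could map the summed parameter to a probability for binary data.

```python
import math
from typing import List


def additive_parameter(
    i: int,
    j: int,
    row_member: List[List[int]],
    col_member: List[List[int]],
    effects: List[float],
    background: float = 0.0,
) -> float:
    """Sum the effects of every bicluster that contains object i and condition j."""
    total = background
    for k, effect in enumerate(effects):
        if row_member[i][k] and col_member[j][k]:
            total += effect
    return total


def sigmoid(x: float) -> float:
    """Logistic link function, mapping a real-valued parameter to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical memberships: 3 objects and 2 conditions over 2 biclusters.
row_member = [[1, 0], [1, 1], [0, 1]]  # row_member[i][k] = 1 if object i is in bicluster k
col_member = [[1, 0], [1, 1]]          # col_member[j][k] = 1 if condition j is in bicluster k
effects = [2.0, -0.5]                  # one scalar effect per bicluster (assumed values)

# The pair (1, 1) lies in both biclusters, so their effects are summed.
overlap = additive_parameter(1, 1, row_member, col_member, effects)
# The pair (0, 0) lies only in bicluster 0.
single = additive_parameter(0, 0, row_member, col_member, effects)
# The pair (2, 0) lies in no bicluster and receives only the background.
outside = additive_parameter(2, 0, row_member, col_member, effects)

print(overlap, single, outside)
print(sigmoid(overlap))  # probability of a 1 under a sigmoid link
```

The sketch makes the restrictive assumption visible: wherever two biclusters overlap, the model must commit to summing their effects, whether or not that reflects the underlying biology.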