Structural Learning of Genetic Regulatory Networks Based on Prior Biological Knowledge and Microarray Gene Expression Measurements

Structural Learning of Genetic Regulatory Networks Based on Prior Biological Knowledge and Microarray Gene Expression Measurements

Yang Dai (University of Illinois at Chicago, USA), Eyad Almasri (University of Illinois at Chicago, USA), Peter Larsen (University of Illinois at Chicago, USA) and Guanrao Chen (University of Illinois at Chicago, USA)
DOI: 10.4018/978-1-60566-685-3.ch012


The reconstruction of genetic regulatory networks from microarray gene expression measurements has been a challenging problem in bioinformatics. Various methods have been proposed for this problem including the Bayesian Network (BN) approach. In this chapter, we provide a comprehensive survey of the current development of using structure priors derived from high-throughput experimental results such as protein-protein interactions, transcription factor binding location data, evolutionary relationships, and literature database in learning regulatory networks.
Chapter Preview


The Bayesian Network (BN) has been proven to be a useful and important tool in biomedical applications such as clinical decision support systems (Beinlich, Suermondt, Chavez & Cooper, 1989), information retrieval (Baeza-Yates & Ribeiro, 1999), and discovery of gene regulatory networks (Friedman, Linial, Nachman & Pe'er, 2000). Automatic learning of BNs from observational data has been an area of intense research for more than a decade, yielding practical algorithms and tools (Spirtes, Glymour & Scheines, 1993). The ability of the BN approach to reconstruct genetic networks from microarray gene expression data has been extensively evaluated.

Consider a set of microarray experiments that measures the expression of a set of N genes over M different conditions. We denote the gene expression values by an M × N matrix D = (d1,…,dn). The BN method discovers a directed acyclic graph (DAG) S such that the posterior probability P(S | X = D) is maximized. Here X = (X1,…,XN) denotes a set of random variables representing gene expression for genes i = 1,…,N. Let πi be the set of parents of node i in an acyclic network S. Then, the probability P( X = D | S) can be decomposed into the product of local probabilities of nodes specified by the network structure S:

(1) where denotes the subset of variables corresponding to πi and the corresponding observations. For ease of notation, we will omit the symbol X but use D indicating that X takes an observation D. The nodes in the learned network correspond to genes or their products and the edges correspond to direct probabilistic dependencies, such as causality, mediation, activation, or inhibition between the genes. The posterior probability P(S | D) is proportional to the product of the likelihood P(D | S) and the prior probability P(S) of network structure S based on prior knowledge, i.e.,P(S | D) ∝ P(S)P(D | S) (2)

The main approach to learning BNs from data is based on the strategy of search-and-score, which attempts to identify the most probable network S given the data D. This network has the highest posterior probability. Depending on assumptions, maximizing this probability corresponds to maximizing a score function. There are several ways to define the score. A straightforward definition is the likelihood P(D | S). For discrete data and multinomial distribution, the K2 score (Cooper & Herskovits, 1992) is often used to evaluate the networks generated. For a given network S, this score is defined as the likelihood:, (3) where Nijk is the number of cases in D in which variable Xi has the kth value and the parent of i has the jth instantiation; qi is the number of parents for i, and ri the number of possible values of variable Xi. Thus,

. (4)

When the prior probability of S is considered, the K2 score for a network S can be modified by multiplying P(S) to P(D | S) in (3).

Key Terms in this Chapter

K2 Algorithm: K2 algorithm is a score-based algorithm in Bayesian network. It recovers the underlying graphic structure based on a predetermined order of nodes in a greedy fashion.

Gene Ontology: The Gene Ontology (GO) project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.

Structural Prior: Structure prior is a prior distribution over directed acyclic graphs.

Prior Biological Knowledge: Databases that include biological interactions between proteins, proteins and DNAs. These interactions are collected from high throughput experiments or curated from literature.

LOI-score: The likelihood of interaction (LOI) score is a measure of the likelihood that a gene or a gene product with a particular molecular function interacts with another gene or a gene product of a particular molecular function. These scores can be derived from databases of interactions and the Gene Ontology molecular function annotations.

KEGG Pathways: Kyoto Encyclopedia of Genes and Genomes (KEGG) includes a collection of manually drawn pathway maps representing our knowledge on the molecular interaction and reaction networks for metabolism, genetic information processing, environmental information processing, cellular processes, human diseases and drug development.

Complete Chapter List

Search this Book: