Inferring Genetic Regulatory Interactions with Bayesian Logic-Based Model

Svetlana Bulashevska (German Cancer Research Centre (DKFZ), Germany)
DOI: 10.4018/978-1-60566-685-3.ch005


This chapter describes a model of genetic regulatory interactions. The model has Boolean logic semantics that represent the cooperative influence of regulators (activators and inhibitors) on the expression of a gene. Because the model is probabilistic, statistical learning can be used to infer genetic interactions from microarray gene expression data. A Bayesian approach to model inference is employed, which enables flexible definitions of the a priori probability distributions of the model parameters. Gibbs sampling, a Markov Chain Monte Carlo (MCMC) simulation technique, is used to carry out Bayesian inference. The problem of identifying the actual regulators of a gene among a large number of potential regulators is treated as a Bayesian variable selection task. Strategies for defining parameters that reduce the parameter space, and efficient MCMC sampling methods, are the subject of current research.
Chapter Preview


The advent of microarray technology facilitated the monitoring of gene expression and posed the problem of reconstructing genetic regulatory relations from data. The concept of a gene regulatory network evolved as a graphical representation of interactions between genes. This is a simplification of the underlying molecular biological regulatory mechanism, since the expression levels of some genes affect the expression of other genes indirectly, via the synthesis of proteins, protein complex formation, DNA binding, etc. Mathematical models of genetic regulatory networks describe features of the regulation by means of mathematical functions and propose algorithms for inferring network models (i.e., connectivity, parameters, etc.) from experimental data.

Attempts to model genetic regulation were pioneered long before the appearance of high-throughput molecular genetics methods (Kauffman 1969, 1996). It was proposed that regulatory interactions between genes can be represented as logic gates, as exemplified in Figure 1, and the Boolean network model was introduced. In a Boolean network, genes take discrete states (active or inactive), and the state of each gene is functionally determined by the states of some other genes according to the rules of logic. Continuous gene expression measurements must be discretized before they can be used for Boolean network modeling.
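The discretization step mentioned above can be illustrated with a minimal Python sketch. The thresholding rule (a value strictly above the gene's own median counts as active), the function name `binarize`, and the expression values are all assumptions made for illustration; the chapter does not prescribe a particular discretization method.

```python
from statistics import median


def binarize(expression_levels):
    """Map continuous expression values of one gene to Boolean states.

    Assumption: a measurement strictly above the gene's median across
    all arrays is treated as 'active' (True), otherwise 'not active'.
    """
    threshold = median(expression_levels)
    return [level > threshold for level in expression_levels]


# Hypothetical expression values of a single gene across six arrays.
levels = [0.2, 1.9, 0.4, 2.3, 0.1, 1.7]
states = binarize(levels)
print(states)
```

Other thresholds (a fixed cut-off, or a mixture-model fit) would work equally well in this sketch; the choice affects which regulatory relations a Boolean model can recover.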

Figure 1. Examples of genetic regulatory functions presented as logic gates

The fundamental idea behind the Boolean network is that gene regulation is carried out by transcription factors, encoded by a number of genes, which cooperatively bind to the binding sites of a target gene. These binding sites constitute a so-called cis-regulatory element, whose working principles can be described by means of logic. Some genes are activated by any one of several possible transcription factors (“OR” logic). Other genes require that two or more transcription factors all be bound for activation (“AND” logic). The activation of some genes may be inhibited by any one of a few possible repressor proteins (“NOT OR” logic, in our notation “NOR”). Furthermore, in the case of “OR-NOR” logic, a gene is regulated by a set of possible activators and a set of possible inhibitors. The gene is transcribed if and only if at least one of its possible activators is active and it is not repressed by any of its possible repressors. The REVEAL algorithm was developed to reverse-engineer Boolean logic relations from expression data, based on the mutual information between input and output states (Somogyi and Sniegoski, 1996; Liang et al., 1998). The major limitation of the Boolean network model is its inherent determinism, which contradicts the stochastic nature of the underlying process of gene regulation and limits the reliability of relations inferred from real data.
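The four logic types described above can be written down directly as Boolean functions. The following Python sketch is only an illustration: the function names are hypothetical, and each argument is assumed to be a list of Boolean regulator states (True meaning the regulator is active).

```python
def or_gate(activators):
    """Gene is on if at least one activator is active."""
    return any(activators)


def and_gate(activators):
    """Gene is on only if every activator is active."""
    return all(activators)


def nor_gate(inhibitors):
    """Gene is on only if no inhibitor is active."""
    return not any(inhibitors)


def or_nor_gate(activators, inhibitors):
    """Gene is on iff some activator is active and no inhibitor is active."""
    return any(activators) and not any(inhibitors)


# Truth table for one activator and one inhibitor under OR-NOR logic.
for act in (False, True):
    for inh in (False, True):
        print(act, inh, or_nor_gate([act], [inh]))
```

With a single activator and a single inhibitor, the OR-NOR gate is on only in the row where the activator is active and the inhibitor is not.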

Later, extensions of Boolean networks were suggested to make them robust against noise. In the noisy Boolean networks of Akutsu (2000), a probability is defined with which a number of input/output patterns are not discarded by the inference algorithm even if they do not satisfy a Boolean function. In Probabilistic Boolean Networks (Shmulevich et al. 2002), more than one Boolean function is defined for each gene, and the particular function used to calculate the state of the gene is selected with a certain probability.
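The function-selection idea behind Probabilistic Boolean Networks can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the formulation of Shmulevich et al.: the gene names, the candidate functions, their selection probabilities (0.7 and 0.3), and the helper `pbn_step` are all hypothetical.

```python
import random


def pbn_step(state, candidates, rng):
    """One synchronous update: for each gene, draw one of its candidate
    Boolean functions according to the given probabilities and apply it
    to the current state."""
    new_state = {}
    for gene, options in candidates.items():
        weights = [prob for prob, _ in options]
        functions = [func for _, func in options]
        chosen = rng.choices(functions, weights=weights, k=1)[0]
        new_state[gene] = chosen(state)
    return new_state


# Hypothetical network: A and B keep their state; C follows "A OR B"
# with probability 0.7 and "A AND B" with probability 0.3.
candidates = {
    "A": [(1.0, lambda s: s["A"])],
    "B": [(1.0, lambda s: s["B"])],
    "C": [(0.7, lambda s: s["A"] or s["B"]),
          (0.3, lambda s: s["A"] and s["B"])],
}

rng = random.Random(0)  # fixed seed so the run is repeatable
start = {"A": True, "B": False, "C": False}
runs = 5000
on_count = sum(pbn_step(start, candidates, rng)["C"] for _ in range(runs))
freq_c_on = on_count / runs
print(freq_c_on)
```

From the chosen start state, only the OR function switches C on, so the observed frequency of C being on should be close to the selection probability of that function.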

Key Terms in this Chapter

Noisy-OR model: a special case of the specification of the conditional probability distribution (CPD) in a Bayesian network, in which the number of parameters is linear in the number of parents of a node. The idea is that each parent can exert its influence on the node independently of the other parents, and the individual effects are then combined with the Boolean function OR.

Probabilistic modeling: a kind of modeling in which a problem space is expressed in terms of random variables and their probability distributions. Properties of the underlying distributions are deduced from data in the process of probabilistic inference.

Boolean network: a set of Boolean variables connected in a network, where the state of each variable is determined by the states of its neighbors through Boolean functions.

Bayesian inference: a statistical inference method in which the degree of belief in a hypothesis is expressed as an a priori probability distribution, i.e., before evidence has been observed, and is updated in the light of evidence by means of Bayes’ theorem.

Bayesian variable selection: the problem of identifying a subset of predictors from a large set of potential predictors in regression-like models. The Bayesian approach is promising because of its efficient a priori parameter formulations.

Bayesian network: a probabilistic graphical model representing conditional independencies of random variables via a directed acyclic graph (DAG). A Bayesian network is specified by a graph structure and a conditional probability distribution (CPD) for each node, conditional on its parents in the graph. Algorithms exist that perform inference and learning in Bayesian networks.

Graphical models: graphs with nodes representing random variables, where arcs encode conditional independencies between the variables.

Genetic regulatory network: an abstract representation of the orchestrated regulation of expression of genes.

Gibbs sampling: a special case of MCMC sampling algorithms, named after the physicist J. W. Gibbs. The algorithm samples from the joint probability distribution of random variables by generating an instance from the distribution of each variable in turn, conditional on the current values of the other variables.

Markov Chain Monte Carlo (MCMC): a class of algorithms for sampling from probability distributions by constructing a Markov chain that has the desired distribution as its equilibrium distribution.
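Several of the terms above (noisy-OR model, Bayesian inference, Gibbs sampling) can be tied together in one small Python sketch. It is a toy example under stated assumptions and not the model developed in the chapter: a gene has two potential regulators, the prior probabilities, the noisy-OR strengths and the leak term are hypothetical numbers, and the names `p_gene_on`, `joint`, `gibbs_posterior` and `exact_posterior` are introduced only for illustration. Given that the gene is observed to be on, the Gibbs sampler estimates the posterior probability that each regulator is active, and exact enumeration serves as a check.

```python
import itertools
import random

PRIOR = [0.3, 0.3]     # hypothetical prior P(regulator i is active)
STRENGTH = [0.8, 0.6]  # hypothetical P(active regulator i alone turns gene on)
LEAK = 0.05            # hypothetical P(gene on with no active regulator)


def p_gene_on(regs):
    """Noisy-OR CPD: the gene stays off only if the leak and every
    active regulator independently fail to switch it on."""
    p_off = 1.0 - LEAK
    for active, q in zip(regs, STRENGTH):
        if active:
            p_off *= 1.0 - q
    return 1.0 - p_off


def joint(regs, gene_on):
    """Joint probability of a regulator configuration and the gene state."""
    p = 1.0
    for active, prior in zip(regs, PRIOR):
        p *= prior if active else 1.0 - prior
    p_on = p_gene_on(regs)
    return p * (p_on if gene_on else 1.0 - p_on)


def gibbs_posterior(gene_on, n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(regulator i active | gene state) by Gibbs sampling."""
    rng = random.Random(seed)
    regs = [0, 0]
    counts = [0, 0]
    for it in range(burn_in + n_samples):
        for i in range(len(regs)):
            # Full conditional of regulator i given the other one and the gene.
            regs[i] = 1
            w1 = joint(regs, gene_on)
            regs[i] = 0
            w0 = joint(regs, gene_on)
            regs[i] = 1 if rng.random() < w1 / (w0 + w1) else 0
        if it >= burn_in:
            for i, r in enumerate(regs):
                counts[i] += r
    return [c / n_samples for c in counts]


def exact_posterior(gene_on):
    """Same posterior marginals by enumerating all four configurations."""
    total = 0.0
    marginals = [0.0, 0.0]
    for regs in itertools.product([0, 1], repeat=2):
        p = joint(regs, gene_on)
        total += p
        for i, r in enumerate(regs):
            marginals[i] += p * r
    return [m / total for m in marginals]


print(gibbs_posterior(True))
print(exact_posterior(True))
```

Note that the noisy-OR CPD needs only one strength parameter per regulator plus the leak, rather than a full table over all regulator combinations, which is the linear growth in parameters mentioned in the noisy-OR entry. Enumeration is feasible here only because there are two regulators; with many potential regulators the number of configurations grows exponentially, which is where sampling becomes useful.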
