In this chapter, different methods and applications for reverse engineering of gene regulatory networks that have been developed in recent years are discussed and compared. Inferring gene networks from different kinds of experimental data is a challenging task that emerged especially with the development of high-throughput technologies. Various computational methods based on diverse principles have been introduced to identify new regulatory relationships among genes. Mathematical aspects of the models are highlighted, and applications for reverse engineering are mentioned.
Deciphering the structure of gene regulatory networks by means of computational methods is a challenging task that has emerged during recent decades. Large-scale experiments, not only gene expression measurements from microarrays but also promoter sequence searches for transcription factor binding sites and investigations of protein-DNA interactions, have spawned various computational algorithms for inferring the structure of the underlying gene regulatory networks. Identifying gene interactions yields an understanding of the topology of gene regulatory networks and of the functional role of each gene in a particular pathway. Of particular interest are genes with a strong impact on pathways, as these are putative drug targets. Once a network is obtained, in silico experiments can be performed to test hypotheses and generate predictions about disease states or about the behavior of the system under different conditions (Wierling et al., 2007).
Quantitative gene expression measurements using microarrays were first performed by Schena et al. (1995) for 45 Arabidopsis thaliana genes and, shortly after, for thousands of genes or even whole genomes (DeRisi et al., 1996; DeRisi et al., 1997). Since then, various methods for the analysis of such large-scale data have been developed. First, clustering algorithms were used to partition genes into subsets of co-regulated genes according to their expression profiles (Eisen et al., 1998). Genes belonging to the same cluster were found to have similar biological functions, but this does not provide information about directed regulatory interactions among these genes. For this purpose, more sophisticated methods were employed to reverse engineer gene networks and regulatory causality from such data. Reverse engineering thus constitutes an intermediate step from correlative to causative data analysis.
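As a minimal illustration of the clustering step mentioned above, the sketch below groups hypothetical expression profiles by single-linkage Euclidean distance. The gene names, profile values, and threshold are invented for the example; real analyses typically use dedicated libraries and more refined distance measures.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical expression profiles (one vector of measurements per gene).
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],   # similar profile to geneA
    "geneC": [3.0, 1.0, 0.5],
    "geneD": [2.9, 1.2, 0.4],   # similar profile to geneC
}

def single_linkage(profiles, threshold):
    """Greedy single-linkage clustering: repeatedly merge two clusters
    whose closest members lie within `threshold` of each other."""
    clusters = [{g} for g in profiles]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(dist(profiles[a], profiles[b]) < threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

print(single_linkage(profiles, threshold=0.5))
```

With these toy values, the co-expressed pairs (geneA, geneB) and (geneC, geneD) end up in the same clusters, while the two groups stay separate.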
Gardner and Faith (2005) classified reconstruction algorithms into two general strategies: “physical” approaches and “influence” approaches. Algorithms of the first group seek to identify interactions between transcription factors and DNA and to reveal the protein factors that physically control RNA synthesis. These methods, such as the promoter binding analysis performed by Lee et al. (2002), use genomic sequence information directly. The second strategy, the “influence” approach, aims to identify causal relationships between RNA transcripts by examining expression profiles. The transcription machinery can be regulated at multiple levels, for instance at the DNA, transcriptional, or translational level. Regulation at the DNA or transcriptional level is mainly due to the binding of transcription factors to specific parts of the DNA or to chemical or structural modifications of the DNA. Regulation at the translational level may be due to microRNAs that cause the decay of their mRNA targets (Ruvkun, 2001). Since the quality of currently measured protein concentrations is not sufficiently high for reconstruction purposes, mathematical models assume that changes in expression, as measured by mRNA concentrations, can explain changes in other gene transcripts. However, a study by Newman et al. (2006) showed that many changes in protein levels measured at single-cell resolution are not observable in DNA microarray experiments. In computational models, many regulatory effects are therefore neglected or included as hidden factors. In recent years, combinations of both approaches have been employed by integrating multiple data sources to construct priors on networks or parameters (Imoto et al., 2003; Bernard and Hartemink, 2005; Werhli and Husmeier, 2007).
In this chapter, different reverse engineering algorithms following the “influence” approach that have been proposed in recent decades are discussed. Moreover, relevant mathematical aspects are briefly described, and applications that reveal gene regulatory networks or parts of them are highlighted. Another crucial point discussed in this chapter is the validation of algorithms. Reverse engineering methods have to cope with noisy, high-dimensional, and incomplete data, but the quality and amount of measurements useful for reverse engineering are increasing.
Key Terms in this Chapter
Neural Network: This refers to a graphical structure with artificial neurons as nodes. The value of each node is determined by the input signals of the connected nodes, passed through a nonlinear transfer function.
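A single artificial neuron of this kind can be sketched as follows; the sigmoid transfer function, the weights, and the input values are illustrative choices for the example, not a prescribed model:

```python
from math import exp

def sigmoid(x):
    """A common nonlinear transfer function mapping reals to (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

def node_value(inputs, weights, bias=0.0):
    """Neuron value: weighted sum of the connected nodes' signals,
    passed through the nonlinear transfer function."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

print(node_value([1.0, 0.5], [0.8, -0.4]))  # sigmoid(0.6) ≈ 0.646
```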
Posterior Probability Distribution: It is the conditional probability distribution of a random variable after observing another variable. It is computed from the prior and the likelihood function.
Prior Probability Distribution: It is the probability distribution of a random variable before any data has been observed. It expresses information about a variable obtained beforehand. Often it is simply called the prior.
Markov Assumption: The conditional probability distribution of the current state is independent of all non-parents. For a dynamical system this means that, given the present state, all future states are independent of all past states.
Associated Network: Each arc of this network is associated with a similarity measure between the values of the nodes it connects. This measure can be the Pearson correlation, mutual information, or others. The node values are vectors of real numbers.
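A toy version of such an association network, using the Pearson correlation as the similarity measure, might look like the following; the gene names and value vectors are invented for illustration:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical node values (expression vectors); each arc carries the
# correlation between the two nodes it connects.
values = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with g1
}

edges = {(a, b): pearson(values[a], values[b])
         for a in values for b in values if a < b}
print(edges)
```

An arc would then typically be kept only when the absolute similarity exceeds a chosen threshold.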
Bayesian Network: This refers to a probabilistic graphical network model defined by a set of random variables and a set of conditional probability distributions. These can be multinomial for discrete variables, Gaussian for continuous variables, or others.
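For a two-node network A → B with binary variables, the joint distribution factorizes into a prior on A and a conditional table for B given A. The probabilities below are invented for illustration:

```python
# Hypothetical Bayesian network A -> B with binary variables and
# invented (multinomial) probability tables.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True:  {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    """The joint probability factorizes along the graph: P(A,B) = P(A) P(B|A)."""
    return p_a[a] * p_b_given_a[a][b]

# Marginal of B obtained by summing out A:
p_b_true = sum(joint(a, True) for a in (True, False))
print(p_b_true)  # 0.3 * 0.9 + 0.7 * 0.2 = 0.41
```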
MCMC: Short for Markov Chain Monte Carlo. It is a class of algorithms for sampling from a probability distribution. This distribution is simulated with a Markov Chain whose equilibrium distribution is the desired probability distribution.
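A minimal Metropolis sampler, one member of the MCMC family, can be sketched as follows; here the target is a standard normal density known only up to its normalizing constant, and the step size and sample count are arbitrary choices for the example:

```python
import random
from math import exp

def metropolis(log_target, n_samples, step=1.0, x0=0.0, seed=0):
    """Metropolis sampler: propose a symmetric random step and accept it
    with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-step, step)
        if rng.random() < exp(min(0.0, log_target(proposal) - log_target(x))):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, up to a constant (log density -x^2 / 2).
samples = metropolis(lambda x: -0.5 * x * x, 20000)
mean = sum(samples) / len(samples)
print(mean)  # should be close to 0
```

The chain's equilibrium distribution is the standard normal, so the sample mean and variance approach 0 and 1 respectively.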
Boolean Network: This refers to a graphical structure with nodes that can take two discrete states. The state of a node is determined by the states of the other connected nodes. The state of the network is determined by the states of all its nodes.
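A three-gene Boolean network with invented update rules illustrates the definition; repeated synchronous updates drive this particular network into a fixed-point attractor:

```python
# Hypothetical 3-gene Boolean network; the update rules are invented:
# gene a sustains itself, a activates b, and b represses c.
def update(state):
    a, b, c = state
    return (a, a, not b)

state = (True, False, True)
trajectory = [state]
for _ in range(4):
    state = update(state)
    trajectory.append(state)

print(trajectory)  # settles into the fixed point (True, True, False)
```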
Mutual Information: It is the amount of information one random variable contains about another. In other words, it is the reduction of uncertainty about one variable after observing the other. It is symmetric with respect to the two variables.
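Mutual information can be computed directly from a joint distribution over discrete variables; the two toy distributions below show the extreme cases of full dependence (one bit of shared information) and independence (zero):

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Perfectly dependent binary variables: observing one removes all
# uncertainty about the other (1 bit).
dependent = {(0, 0): 0.5, (1, 1): 0.5}
# Independent uniform binary variables: no shared information (0 bits).
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(mutual_information(dependent), mutual_information(independent))
```

Swapping the roles of the two variables leaves the value unchanged, reflecting the symmetry noted in the definition.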
Likelihood Function: It is the probability of the occurrence of a sample configuration, viewed as a function of the parameters. The conditional probability distribution of the random variable given the parameters of the distribution has to be known. L(θ | X = x) = P(X = x | θ) is a likelihood function, where X is a random variable, x is the observed value of X, and θ is a parameter.
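For independent Bernoulli observations the likelihood is a product of per-observation probabilities, and a coarse grid search recovers the maximum-likelihood estimate at the empirical frequency. The data below are invented for the example:

```python
def likelihood(theta, observations):
    """L(theta | x) = P(X = x | theta) for i.i.d. Bernoulli(theta) draws."""
    p = 1.0
    for x in observations:
        p *= theta if x == 1 else (1.0 - theta)
    return p

data = [1, 1, 0, 1]  # hypothetical binary observations

# The likelihood theta^3 * (1 - theta) peaks at the empirical frequency 3/4.
best = max((t / 100 for t in range(1, 100)),
           key=lambda t: likelihood(t, data))
print(best)  # 0.75
```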
Ordinary Differential Equation: It is an equation that relates a function of a single independent variable to its derivatives.
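In the gene-network context, a standard example is a single-gene kinetic model with synthesis and degradation terms, dx/dt = s - d*x. The forward-Euler sketch below (with invented rate constants) converges to the steady state s/d:

```python
def euler(f, x0, t_end, dt=0.001):
    """Forward-Euler integration of the ODE dx/dt = f(x) from t = 0 to t_end."""
    x, t = x0, 0.0
    while t < t_end:
        x += dt * f(x)
        t += dt
    return x

# Hypothetical single-gene kinetics: synthesis rate s, degradation rate d.
s, d = 2.0, 0.5
x_final = euler(lambda x: s - d * x, x0=0.0, t_end=20.0)
print(x_final)  # approaches the steady state s / d = 4.0
```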
Reverse Engineering: In general, it is the reconstruction of a system by analyzing its structure, functions, and operations. Reverse engineering of gene regulatory networks is the process of revealing the underlying structure of gene regulation from biological measurements, such as gene and protein expression, or others.