Inferring Gene Regulatory Networks from Genetical Genomics Data

Bing Liu (Monsanto Co., USA), Ina Hoeschele (Virginia Polytechnic Institute and State University, USA) and Alberto de la Fuente (CRS4 Bioinformatica, Italy)
DOI: 10.4018/978-1-60566-685-3.ch004

Abstract

In this chapter, we review the current state of Gene Regulatory Network inference based on ‘Genetical Genomics’ experiments (Brem & Kruglyak, 2005; Brem, Yvert, Clinton & Kruglyak, 2002; Jansen, 2003; Jansen & Nap, 2001; Schadt et al., 2003) as a special case of causal network inference in ‘Systems Genetics’ (Threadgill, 2006). In a Genetical Genomics experiment, a segregating or genetically randomized population is DNA marker genotyped and gene-expression profiled on a genomewide scale. The genotypes are regarded as natural, multifactorial perturbations resulting in different gene-expression ‘phenotypes’, and causal relationships can therefore be established between the measured genotypes and the gene-expression phenotypes. In this chapter, we review different computational approaches to Gene Regulatory Network inference based on the joint analysis of DNA marker and expression data and additionally of DNA sequence information if available. This includes different methods for expression QTL mapping, selection of regulator-target pairs, construction of an encompassing network, which strongly constrains the network search space, and pairwise and multivariate methods for Gene Regulatory Network inference, such as Bayesian Networks and Structural Equation Modeling.

Introduction

A fruitful abstraction of biochemical systems is that of ‘networks’ (Barabasi & Oltvai, 2004; Dorogovtsev & Mendes, 2003; Newman, 2003; Pieroni et al., 2008; Watts & Strogatz, 1998). Such networks include Transcription Regulatory Networks (TRNs) (Lee et al., 2002; Luscombe et al., 2004; Shen-Orr, Milo, Mangan & Alon, 2002), Protein Interaction Networks (Pieroni et al., 2008; Schwikowski, Uetz & Fields, 2000), Metabolic Networks (Jeong, Tombor, Albert, Oltvai & Barabasi, 2000; Wagner & Fell, 2001), Gene Regulatory Networks (GRNs) (Brazhnik, de la Fuente & Mendes, 2002; D'Haeseleer, Liang & Somogyi, 2000) (see also A. de la Fuente, this book), and Phenotype Networks (Nadeau et al., 2003). Inferring, or ‘reverse engineering’, such biological networks is therefore an area of research that currently receives considerable attention. It advances our knowledge about the integrated biochemical machinery of living cells (systems biology) and our understanding of general features of complex traits (complex trait biology). Constructing phenotype networks provides information about the functionality of complex systems (such as cardiovascular function) at the organismal level, and constructing GRNs furthers our understanding of the molecular basis of complex traits and diseases (Chen et al., 2008; Lum et al., 2006; Schadt et al., 2005). GRNs also have other applications (Brazhnik, de la Fuente & Mendes, 2002), including the discovery of direct drug targets (di Bernardo et al., 2005; Gardner, di Bernardo, Lorenz & Collins, 2003). It has been shown that classical concepts from genetics, such as dominance and epistasis, can be readily understood in terms of networks and their properties (Kacser & Burns, 1981; Omholt, Plahte, Oyehaug & Xiang, 2000).

Many different experimental and computational approaches to GRN inference have been proposed. Data from experiments without targeted perturbations, or data from observational studies, only allow for inference of undirected Co-Expression Networks that are based on a measure of association between the expression profiles of pairs of genes (e.g. de la Fuente, Bing, Hoeschele & Mendes, 2004; Ghazalpour et al., 2006; Schäfer & Strimmer, 2005a, 2005b; Wille & Buhlmann, 2006; Wille et al., 2004; Zhang & Horvath, 2005). In particular, one can construct an Undirected Dependency Graph (UDG), which contains edges only between those genes that interact directly, and which can be estimated based on partial correlations (de la Fuente, Bing, Hoeschele & Mendes, 2004; Shipley, 2002). The construction of a UDG can be a first step in a regulatory network analysis of a Genetical Genomics or Systems Genetics experiment.
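
To illustrate why partial correlations can separate direct from indirect associations, the following minimal Python sketch simulates a hypothetical three-gene chain (g1 affects g2, which affects g3) and compares the marginal correlation between g1 and g3 with their first-order partial correlation given g2. The gene names, noise levels and sample size are assumptions made for illustration only; this is not the procedure used by the cited authors, and a full UDG estimate would generally condition on more than one gene.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b, conditioning on c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Hypothetical expression data for a chain g1 -> g2 -> g3 (illustrative only).
rng = random.Random(0)
n = 2000
g1 = [rng.gauss(0, 1) for _ in range(n)]
g2 = [x + rng.gauss(0, 0.5) for x in g1]
g3 = [x + rng.gauss(0, 0.5) for x in g2]

marginal = pearson(g1, g3)           # strong, although g1 and g3 are not directly linked
partial = partial_corr(g1, g3, g2)   # close to zero once g2 is conditioned on

print(f"marginal r(g1, g3)      = {marginal:.3f}")
print(f"partial  r(g1, g3 | g2) = {partial:.3f}")
```

In this sketch a co-expression network built from marginal correlations would connect g1 and g3, whereas a graph based on partial correlations would be expected to drop that edge and keep only g1–g2 and g2–g3.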

Key Terms in this Chapter

Family-Wise-Error Rate: Family-Wise-Error Rate (FWER) (also referred to as the genome-wise error rate in the context of QTL mapping) is the probability of making one or more false discoveries in multiple hypothesis testing. FWER control is more conservative (and less powerful) than FDR control.

Bayesian Network: Bayesian networks are directed probabilistic graphical models that represent conditional independence relationships among variables of interest.

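False Discovery Rate: False Discovery Rate (FDR) is the expected proportion of incorrectly rejected null hypotheses among all rejected hypotheses in multiple hypothesis testing. FDR control therefore limits the expected fraction of false discoveries in the list of rejected hypotheses, rather than the probability of making any false discovery at all.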

Genetical Genomics: Genetical Genomics, also referred to as ‘the genetics of gene expression’, uses naturally occurring, multi-factorial perturbations in segregating or genetically randomized populations. Genetical Genomics approaches integratively analyze gene expression data and genotype data (measurable DNA sequence polymorphisms) and make use of DNA sequence information when available.

Quantitative Trait Locus: A quantitative trait locus (QTL) is a chromosomal region that causally affects a phenotypic trait under consideration. Statistically, a QTL is a confidence interval for the genomic location of a DNA polymorphism that is causal for the phenotype of interest.

etrait: In Genetical Genomics, the gene expression levels are considered as phenotypic traits. Therefore, we refer to gene expression levels as ‘expression traits’ or, in short, ‘etraits’.

Structural Equation Modeling: Structural Equation Modeling is a linear statistical modeling framework for testing and estimating causal relationships among variables. It has been widely used in econometrics, sociology and psychology, usually as a confirmatory procedure instead of an exploratory analysis for causal inference.

eQTL: In Genetical Genomics, the gene expression levels are considered as phenotypic traits. Therefore, the identified QTLs are referred to as ‘expression-QTLs’ or ‘eQTLs’.
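
The difference between the FWER and FDR criteria defined above can be sketched in a few lines of Python. The example applies the Bonferroni correction (one common way to control the FWER) and the Benjamini-Hochberg step-up procedure (one common way to control the FDR) to a small list of made-up p-values. The p-values, the function names and the choice of these two procedures are assumptions for illustration; the chapter preview does not state which procedures the authors use.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """FWER control: reject a hypothesis only if its p-value <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """FDR control: Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Made-up p-values, e.g. from testing one etrait against eight markers.
pvals = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.9]

bonf = bonferroni_reject(pvals)
bh = benjamini_hochberg_reject(pvals)

print("Bonferroni rejections:        ", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these illustrative p-values the Bonferroni rule rejects fewer hypotheses than the Benjamini-Hochberg procedure, which is consistent with the statement that FWER control is more conservative and less powerful than FDR control.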
