The use of systems biology to study complex biological questions is gaining ground because of the ever-increasing number of genetic tools and genome sequences available. As a result, systems biology concepts and approaches increasingly underpin our understanding of microbial physiology. Three tools used in functional genomics are gene expression profiling, proteomics, and metabolomics. However, these tools produce such large data sets that we sometimes become paralyzed when trying to merge the data and link them into a consistent biological interpretation. Functional groupings have relieved some of the difficulty of merging data for biological meaning. Statistical analysis and visualization of these multidimensional data sets are needed to aid the microbiologist, but they bring additional methods that are often unfamiliar. Progress is being made in bringing these diverse data types together to understand fundamental metabolic processes and pathways. These efforts are paying tremendous dividends in our understanding of how microbes live, grow, survive, and metabolize nutrients. These insights allow metabolic engineering to progress and allow scientists to further define the mechanisms of metabolism.
Systems biology brought a great challenge to the microbial world: produce genome-scale data that are integrated into a complete biological picture using specific genes. In spite of this challenge, systems biology increasingly underpins our concept of microbial physiology based on genome sequence. Initially, the effort needed to produce a genome sequence limited implementation of this paradigm. However, new genome content is now accumulating at a staggering rate and is the basis of new avenues for discovery.
Publicly available genome sequence data now include more than 700 finished genomes, with an additional 1,700 genomes in the process of being sequenced, providing access to at least 3,600 individual microbial genomes; an additional 116 metagenome projects are underway (GOLD, 2008; www.genomeonline.org). These projects are challenging scientists’ ability to collect and process the data and to overlay a biologically meaningful interpretation on them. Applying that information to make biologically informed decisions is also a daunting challenge, one that requires a fresh perspective and new skill sets that leverage genome-based tools to answer specific biological questions.
The heart of systems biology discovery lies in the new fields of comparative and functional genomics, along with proteomic and metabolite profiling (Fields, 2000). Access to genome content makes it possible to compare genomes and assess the link between structural similarity and functional expression. Comparative analyses of bacterial genomes provide new information about the dynamic interchange of DNA between microbes (Hughes, 2000). Comparing sequenced genomes is an excellent approach for exploring genome plasticity and how it affects the metabolism of microbes.
Approaching microbial metabolism from a systems biology perspective is a new position that requires scientists to think in terms of systems of activities, each carried out by multiple individual components. Gene Ontology (GO) terms and clusters of orthologous groups (COGs) are very helpful for this kind of comparison because they use specific, common terms in a hierarchical classification system, and they are essential when using functional genomic tools to drill down to individual conversations (i.e., activities). These tools include gene expression profiling, protein expression profiling, metabolite profiling, and new statistical methods.
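As a rough illustration of how functional groupings condense gene-level data, the sketch below averages per-gene expression ratios within COG functional categories. The gene names, category assignments, and ratio values are invented for illustration, and the helper `summarize_by_category` is hypothetical rather than part of any published tool; a real analysis would take its annotations from a curated COG or GO resource.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical gene -> COG functional category letter (illustrative only).
gene_to_cog = {
    "geneA": "C",  # C: energy production and conversion
    "geneB": "C",
    "geneC": "E",  # E: amino acid transport and metabolism
    "geneD": "E",
    "geneE": "G",  # G: carbohydrate transport and metabolism
}

# Hypothetical log2 expression ratios (treatment vs. control) per gene.
log2_ratio = {
    "geneA": 1.5,
    "geneB": 0.5,
    "geneC": -2.0,
    "geneD": -1.0,
    "geneE": 0.25,
}


def summarize_by_category(ratios, categories):
    """Average per-gene values within each functional category.

    Genes without an annotation are collected under "unassigned".
    """
    grouped = defaultdict(list)
    for gene, value in ratios.items():
        grouped[categories.get(gene, "unassigned")].append(value)
    return {category: mean(values) for category, values in grouped.items()}


category_means = summarize_by_category(log2_ratio, gene_to_cog)
for category in sorted(category_means):
    print(category, category_means[category])
```

Summarizing at the category level reduces thousands of gene-level values to a few dozen functional signals, which is one way that groupings of this kind can ease interpretation. A simple mean is used here only for brevity.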
Functional genomics must incorporate data from each set of tools to examine how organisms utilize their genome potential (initially via gene expression). Since most genomes contain thousands of genes to be monitored, the task is like trying to follow all the individual conversations in a crowded stadium at one moment in time. In essence, functional genomics documents a cell’s many “conversations” as they occur simultaneously. The conversations are defined by gene expression (gene arrays), protein interactions (proteomics), and small-molecule biochemistry (metabolomics), which together create a multidimensional view of the cell. Deciphering these conversations in turn outlines the web of actors in the metabolic networks, providing an unprecedented view of how a living cell carries out the many functions of growth, survival, pathogenicity, and metabolism.
These techniques produce multidimensional data of very different types, which need a common thread to link them. Such large data sets can lead to an interpretational paralysis that limits the linkage between the functional genomics data and biological relevance or meaning. Therefore, the analytical phase must seek to fully integrate all of these tools using new statistical methods. These methods are unfamiliar to most microbiologists, which often exacerbates interpretation difficulties when the genotype and the phenotype are at odds. Taken together, it is clear that bioinformatics is essential not only for mining genome content but also for providing new scientific abilities that bring biologically meaningful insights via systems biology.
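One simple form such a common thread can take is a shared gene or locus identifier on which different data types are joined. The sketch below links hypothetical transcript and protein fold changes by locus and labels whether the two agree. The identifiers, values, the function `link_measurements`, and the fold-change threshold are all assumptions made for illustration; the threshold is an arbitrary cutoff, not a statistical test.

```python
# Hypothetical log2 fold changes keyed by a shared locus identifier.
transcript = {"locus1": 2.0, "locus2": -1.5, "locus3": 0.1, "locus4": 1.2}
protein = {"locus1": 1.4, "locus2": 0.9, "locus3": 0.0}


def link_measurements(transcript, protein, threshold=1.0):
    """Join two data types on shared identifiers and label their agreement.

    The threshold is an assumed absolute log2 fold-change cutoff.
    Loci measured in only one data set are left out of the result.
    """
    linked = {}
    for locus in sorted(transcript.keys() & protein.keys()):
        t = transcript[locus]
        p = protein[locus]
        t_changed = abs(t) >= threshold
        p_changed = abs(p) >= threshold
        if not t_changed and not p_changed:
            status = "unchanged"
        elif t_changed and p_changed and (t > 0) == (p > 0):
            status = "concordant"
        else:
            status = "discordant"
        linked[locus] = (t, p, status)
    return linked


linked = link_measurements(transcript, protein)
for locus, (t, p, status) in linked.items():
    print(locus, t, p, status)
```

Loci flagged as discordant are the cases where the data types disagree and where interpretation usually needs the most care.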
Key Terms in this Chapter
Bioconductor: ( www.bioconductor.org ): An open source and open development software project for the analysis and comprehension of genomic data.
Significance Analysis of Microarrays: (SAM; www-stat.stanford.edu/~tibs/SAM/index.html): An Excel plug-in used to analyze microarray data.
R: ( www.r-project.org ): A language and environment for statistical computing and graphics. R provides a wide variety of statistical techniques (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible.