Complex diseases such as cancer have multiple origins and are therefore difficult to understand and cure. Highly parallel technologies such as DNA microarrays are now available. They produce a deluge of data that must be mined for relevant information and integrated with existing knowledge at different scales. Systems Biology is a recent field that aims to overcome these challenges by combining different disciplines and providing an analytical framework. Some of these challenges are discussed in this chapter.
Systems Biology is emerging as a promising answer to the increasing need for analytical approaches in molecular medicine. Its goals include modeling interactions, understanding the behavior of a system from the interplay of its components, inferring models from data, integrating data, confronting model predictions with data, and proposing the most promising experiments. Solutions to these challenges are often interdisciplinary, and Systems Biology intends to provide a framework that transcends the ‘dialects’ and differing approaches of scientific communities (Lazebnik, 2004). Cancer is too complex a disease to be solely and completely described by the clinical variables currently used in practice (e.g., age of the patient, size of the tumor, histological grade). It is therefore necessary to identify new biomarkers that provide additional information about, for instance, the cancer type, origin, or aggressiveness. Within a decade, high-throughput assays have revolutionized biology and are now being introduced in the clinic. Among these techniques, we focus on DNA microarrays, which can monitor the expression of tens of thousands of genes in parallel and offer a means to individualize treatment. This should help guide clinicians toward tailored therapies, reducing overtreatment and costs through improved prognosis, the design of targeted drugs, and more accurate application of drugs. Since this is quite a recent field in which each analysis requires a large number of steps, consensus has not yet been reached. Furthermore, the researchers involved come from various backgrounds (e.g., Statistics, Engineering, Biology). Applying tools from all these fields results in a wide spectrum of approaches that may be confusing at first. Nevertheless, there are some trends in the biomedical research community, which we review in this chapter in the context of cancer.
The outline of the chapter is meant to follow a practical analysis pipeline, and URLs for accessing resources (i.e., software and data) are provided in Table 1.
Key Terms in this Chapter
Complex System: Such systems (e.g., a cell, an organ, a whole human body) are complex because of the large number of players involved and/or because of their time- and context-dependent interactions. The nature of these interactions or regulatory motifs (e.g., positive or negative feedback loop, feed-forward loop) increases the complexity of even a simple system with only a handful of variables.
Biomarker: By definition, any bio(logical) marker, such as a gene or a protein. In molecular oncology, due to the complexity of cancer, highly parallel techniques are needed to identify a set of biomarkers rather than a unique biomarker. Sets of biomarkers are believed to be stronger predictors since they would more reliably reflect the multidimensionality of cancer.
Systems Biology: A field which studies complex biological systems at different levels to decipher the interactions of their key components and provide a mathematical model integrating this heterogeneous information.
Cancer: A genetic disease emerging when cells have acquired at least six important factors contributing to pathogenesis. These hallmarks include evading apoptosis, self-sufficiency in growth signals, insensitivity to anti-growth signals, limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis.
Omics: Due to the recent advent of highly parallel assays, it is now possible to monitor the behavior of not just one or a couple of variables but rather tens of thousands of variables at once. A growing number of disciplines with the ‘-omics’ suffix, such as genomics, transcriptomics, and metabolomics, aim to describe and understand a given level completely.
Microarray: The DNA microarray is a technique to monitor the abundance of tens of thousands of RNA transcripts at once (by extension, gene expression). Molecular reporters corresponding to complementary sequences of genes of interest are deposited in an ordered arrangement on a glass surface. This is today the most mature of the highly parallel techniques.
Model: A commonly used but very misleading term whose meaning depends heavily on the background of the investigator. In the broad sense, a model is used in any attempt to describe and explain a system of interest which cannot be directly observed. A set of hypotheses is required to represent a simplification of it (i.e., a model).
Network Inference: The attempt to discover the relationships between the components (or nodes, such as genes, proteins, metabolites) of the network. It is a form of inverse problem where one starts from the observations (gene expression levels from DNA microarrays, for instance) and intends to identify the causes that led to such observations. Due to the very large number of potential players, this is a non-trivial problem which requires a massive amount of data.
Normalization: The correction for known systematic biases in the data to allow a fair comparison.
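As an illustration of the Normalization entry, the sketch below applies quantile normalization, one commonly used scheme for microarray data. The chapter does not name a specific method at this point, so the choice of technique, the function name quantile_normalize, and the toy values are assumptions made only for illustration. Each array is forced onto the same reference distribution (the mean of the sorted arrays), so that differences between arrays reflect gene ranks rather than array-wide intensity shifts.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (hypothetical helper).

    arrays: list of equal-length lists, one per microarray, each holding
    one (log-)intensity per gene in the same gene order.
    Returns new lists in which every array shares the same distribution.
    """
    n_genes = len(arrays[0])
    if any(len(a) != n_genes for a in arrays):
        raise ValueError("all arrays must measure the same number of genes")

    # Reference distribution: mean across arrays of the values at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(values) for values in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Gene indices ordered from lowest to highest intensity.
        # Ties are broken by gene order (a simplification; ties are not averaged).
        order = sorted(range(n_genes), key=lambda i: a[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: two arrays, three genes (made-up values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both toy arrays contain the same set of values, and only the ranking of genes within each array differs. Whether such a strong correction is appropriate depends on the experiment, since it assumes that the overall intensity distribution should be identical across arrays.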