Here the authors review the state of the art in the use of protein-protein interactions (ppis) for the interpretation of genomic experiments. They report the available resources and the methodologies used to create a curated compilation of ppis, and introduce a novel approach to filter interactions. Special attention is paid to the complexity of the topology of the networks formed by proteins (nodes) and pairwise interactions (edges). These networks can be studied using graph theory, so a brief introduction to the characterization of biological networks is given, together with definitions of the most commonly used network parameters. The authors also report on the available resources for performing different modes of functional profiling using ppi data and discuss the approaches that have typically been applied in this context. Finally, they introduce a novel methodology for the evaluation of networks and give some examples of its application.
The amount of available data on protein-protein interactions (ppis) has increased enormously in the last few years with the emergence of high-throughput techniques that can report thousands of ppis in a short time span. The most widely used techniques in this field are yeast two-hybrid (y2h), tandem affinity purification (TAP) and high-throughput mass spectrometry (MS). Reviews on these and related methodologies can be found in Drewes and Bouwmeester (2003), Cho et al. (2003), Falk et al. (2007) and Berggard et al. (2007).
The reliability of this data is not free from controversy. Studies comparing the data resulting from several experiments show that the overlap between them is smaller than would be desirable. This may be because some methodologies do not reach the saturation point (Bader & Hogue, 2002) or because some of them lack accuracy and coverage (von Mering et al., 2002). A conventional large-scale experiment can cover only 3-9% of the total interactome, so limited overlap should be expected (Han et al., 2005). False positives are also a problem: in y2h they represent up to 50% of the total data (Ito et al., 2001; Mrowka et al., 2001). Moreover, each technique is biased in the functional categories of the ppis it detects; for example, y2h fails to detect proteins involved in translation (von Mering et al., 2002).
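The link between low coverage and limited overlap can be made concrete with a small calculation. The following is only a sketch under a strong simplifying assumption that is not made in the text: each screen samples true interactions uniformly and independently, with no false positives. The function names and the interactome size of 100,000 are hypothetical and chosen purely for illustration.

```python
def expected_shared_ppis(coverage_a, coverage_b, interactome_size):
    """Expected number of ppis found by both screens.

    Assumes (for illustration only) that each screen detects a uniform,
    independent random sample of the true interactome.
    """
    return coverage_a * coverage_b * interactome_size


def expected_jaccard(coverage_a, coverage_b):
    """Expected shared fraction of the union of both screens (Jaccard index),
    under the same independence assumption."""
    shared = coverage_a * coverage_b
    union = coverage_a + coverage_b - shared
    return shared / union


# Hypothetical interactome size, used only to turn fractions into counts.
INTERACTOME_SIZE = 100_000

# Evaluate the two ends of the 3-9% coverage range quoted in the text.
for coverage in (0.03, 0.09):
    shared = expected_shared_ppis(coverage, coverage, INTERACTOME_SIZE)
    jaccard = expected_jaccard(coverage, coverage)
    print(f"coverage={coverage:.0%}  shared ppis={shared:.0f}  jaccard={jaccard:.3f}")
```

Under these assumptions, the fraction of one screen's interactions that the other screen recovers equals the other screen's coverage, so two screens covering 3-9% of the interactome would be expected to share only a small part of their results even if both were perfectly accurate.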
Beyond discussions about the accuracy and coverage of this kind of experiment, the relevance of ppis in the cellular machinery has fostered an unprecedented interest in the exploration of the interactome of model organisms such as Saccharomyces cerevisiae (Uetz et al., 2000; Ito et al., 2001), Drosophila melanogaster (Giot et al., 2003; Formstecher et al., 2005), Caenorhabditis elegans (Li et al., 2004) or human (Stelzl et al., 2005; Rual et al., 2005), to cite just a few examples.
After years of intensive study, there is now a high-quality, literature-curated set of ppis, free from false positives, that probably represents the complete yeast interactome (Reguly et al., 2006). For human, the picture is still far from this degree of detail. The size of the human interactome is estimated at about 650,000 ppis (Stumpf et al., 2008). No public database contains more than 10% of this number, and even a compilation of all the known ppis would cover only about 10% of the interactions.
The interactome is an abstract scaffold: it does not say under which conditions, at which developmental stage or in which cell type a particular ppi occurs, if it occurs at all. Inferring a case-specific interactome therefore requires integrating other types of data that indicate which ppis are active under a particular condition. The transcriptome, defined as the set of transcripts that are expressed at a given moment in a particular cell type, can be used for this purpose. An integrative study of the interactome filtered by the transcriptome will provide valuable information on the active ppis in a given cell state.
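The idea of filtering the interactome by the transcriptome can be sketched in a few lines. This is a minimal illustration, not the authors' method: it assumes that a ppi is considered potentially active only when the genes of both partners are expressed, and the function name and protein identifiers (P1 to P4) are hypothetical.

```python
def filter_active_ppis(ppis, expressed):
    """Keep the ppis whose two partners are both in the expressed set.

    ppis      : iterable of (protein_a, protein_b) pairs (the interactome)
    expressed : iterable of identifiers found in the transcriptome
    Assumption (for illustration): co-expression of both partners is the
    only criterion used to call an interaction potentially active.
    """
    expressed = set(expressed)  # set membership tests are fast
    return [(a, b) for a, b in ppis if a in expressed and b in expressed]


# Hypothetical interactome and transcriptome for one cell state.
interactome = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4")]
transcriptome = {"P1", "P2", "P4"}

# Only interactions between expressed proteins are retained.
active = filter_active_ppis(interactome, transcriptome)
print(active)
```

Co-expression is a necessary condition rather than proof of interaction, so the filtered set is better read as the ppis that remain possible in that cell state.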