Introduction
Understanding the functional behavior of molecular components is fundamental to biomedical applications. A wide range of computational approaches have been applied to characterize molecular functions from various types of data sources. In the past, sequence and structure analyses of proteins have helped characterize their functions. However, these analyses cannot systematically capture complex functional mechanisms that arise through biochemical reactions or interactions. Proteins typically execute their functions through interactions with other biomolecular units. Comprehensive knowledge of protein-protein interactions is thus essential to understanding the intrinsic mechanisms of biological processes.
Earlier protein-protein interaction data were obtained via intensive small-scale investigations of restricted sets of proteins of interest, each yielding a data set covering a limited number of protein-protein interactions. In contrast, recent high-throughput techniques, such as yeast two-hybrid systems and mass spectrometry, enable genome-wide detection of protein-protein interactions (Uetz et al., 2000; Ito, Chiba, Ozawa, Yoshida, Hattori, & Sakaki, 2001; Gavin et al., 2002; Ho et al., 2002; Giot et al., 2003; Li et al., 2004). The yeast two-hybrid system (Parrish, Gulyas, & Finley, 2006) searches for binary interactions between any two proteins encoded in the genome of interest. The interaction of two proteins transcriptionally activates a reporter gene, and this activation signals the interaction, revealing "prey" proteins that interact with a known "bait" protein. Mass spectrometry (Aebersold & Mann, 2003) analyzes the composition of a partially purified protein complex. It uses an affinity tag attached to target "bait" proteins to purify the complexes. Comprehensive protein-protein interaction data sets in model organisms, generated by these high-throughput experiments, are publicly available in a number of open databases such as BioGRID (Breitkreutz, Stark, Reguly, Boucher, Breitkreutz, Livstone, Oughtred, Lackner, Bahler, Wood, Dolinski, & Tyers, 2008), MIPS (Mewes, Dietmann, Frishman, Gregory, Mannhaupt, Mayer, Munsterkotter, Ruepp, Spannagl, Stumpflen, & Rattei, 2008), DIP (Salwinski, Miller, Smith, Pettit, Bowie, & Eisenberg, 2004), MINT (Chatr-aryamontri, Ceol, Montecchi-Palazzi, Nardelli, Schneider, Castagnoli, & Cesareni, 2007), IntAct (Aranda et al., 2010), and HPRD (Prasad et al., 2009). However, accurate analysis of protein-protein interactions has been limited by the unreliability of the interaction data.
Large-scale experimental data sets are susceptible to false positives: some fraction of the putative interactions detected should be considered spurious because they cannot be confirmed to occur in vivo (von Mering, Krause, Snel, Cornell, Oliver, Fields, & Bork, 2002; Sprinzak, Sattath, & Margalit, 2003).