Domain-Based Approaches to Prediction and Analysis of Protein-Protein Interactions

Morihiro Hayashida (Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan) and Tatsuya Akutsu (Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan)
Copyright: © 2014 | Pages: 18
DOI: 10.4018/ijkdb.2014010103

Abstract

Protein-protein interactions play various essential roles in cellular systems. Many methods have been developed for inferring protein-protein interactions from protein sequence data. In this paper, the authors focus on methods based on domain-domain interactions, where a domain is defined as a region within a protein that either performs a specific function or constitutes a stable structural unit. In these methods, the probabilities of domain-domain interactions are inferred from known protein-protein interaction data and protein domain data, and interactions are then predicted from these probabilities and the domain contents of the given proteins. This paper gives an overview of several fundamental methods, including the association method, the expectation maximization-based method, the support vector machine-based method, the linear programming-based method, and the conditional random field-based method. This paper also reviews a simple evolutionary model of protein domains, which yields a scale-free distribution of protein domains. By combining this model with a domain-based protein interaction model, a scale-free distribution of protein-protein interaction networks is also derived.

Introduction

Understanding the functions of genes and proteins is important in the post-genomic era. Information on protein-protein interactions is useful for understanding protein functions because protein-protein interactions play a key role in many cellular processes. Since the end of the last century, several experimental techniques have been developed for comprehensive analysis of protein-protein interactions, including two-hybrid systems and proteomics methods. Although these experimental methods revealed many previously unknown interactions, there were large discrepancies between the results obtained by different groups (Ito et al., 2001; Uetz et al., 2000). Computational methods for inferring protein-protein interactions are therefore needed, and various approaches have been proposed for that purpose. In this paper, we focus on computational and mathematical aspects of domain-based approaches.

A protein consists of one or more domains, where a domain is defined as a region within a protein that either performs a specific function or constitutes a stable structural unit. Examples of structural domains are illustrated in Figure 1, although domains are sometimes defined on the basis of sequence or functional similarity rather than structure. In short, domains are regarded as parts of a protein. Although there is no exact or mathematical definition of a protein domain, several hundred protein domains are currently known. To classify domains, several database systems have been constructed, including Pfam (Punta et al., 2012), InterPro (Hunter et al., 2012) and ProDom (Bru et al., 2005). Furthermore, most of these databases provide facilities for identifying the protein domains in a given protein sequence. In Pfam, each domain is represented by a hidden Markov model (HMM), and the domains contained in a given protein sequence are identified by using these HMMs.

Figure 1.

Example of protein domains. Protein P1 consists of domains D1 and D2, whereas protein P2 consists of domains D3, D4 and D5. In domain-based models, it is assumed that P1 and P2 interact with each other if at least one domain pair interacts.

Several methods have been proposed that use information on the domain organization of proteins to predict protein-protein interactions. In these methods, scores or probabilities of domain-domain interactions are first derived from known protein-protein interactions, and these are then used to calculate the score or probability of a protein-protein interaction for given protein sequences. Sprinzak and Margalit (2001) proposed the association method for computing the score of each domain pair. Kim et al. (2002) proposed similar scores and applied them to the inference of protein-protein interactions. Deng et al. (2002) proposed an EM (Expectation-Maximization) algorithm for estimating the probability of interaction for each domain pair.
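The two-step scheme described above can be illustrated with a minimal Python sketch. It assumes one common formulation of the association score (the fraction of protein pairs containing a domain pair that are known to interact) and combines domain-pair scores under the "at least one domain pair interacts" rule of Figure 1, with the added assumption that domain pairs interact independently. The function names and the toy data are hypothetical, and plugging association scores into the combination step is a simplification for illustration; it is not the EM estimation of Deng et al. (2002).

```python
from collections import defaultdict
from itertools import combinations, product


def domain_pairs(doms_p, doms_q):
    """Unordered domain pairs formed by one domain from each protein."""
    return {tuple(sorted((m, n))) for m, n in product(doms_p, doms_q)}


def association_scores(protein_domains, interacting_pairs):
    """Association-style score (assumed formulation): for each domain pair,
    (interacting protein pairs containing it) / (all protein pairs containing it).
    Self-interactions are ignored here for simplicity."""
    interacting = {frozenset(pair) for pair in interacting_pairs}
    total = defaultdict(int)
    hits = defaultdict(int)
    for p, q in combinations(sorted(protein_domains), 2):
        for dp in domain_pairs(protein_domains[p], protein_domains[q]):
            total[dp] += 1
            if frozenset((p, q)) in interacting:
                hits[dp] += 1
    return {dp: hits[dp] / total[dp] for dp in total}


def interaction_probability(doms_p, doms_q, scores):
    """Probability that at least one domain pair interacts, assuming
    domain pairs interact independently (an assumption of this sketch)."""
    prob_none = 1.0
    for dp in domain_pairs(doms_p, doms_q):
        prob_none *= 1.0 - scores.get(dp, 0.0)
    return 1.0 - prob_none


# Hypothetical toy data: four proteins and two known interactions.
protein_domains = {
    "P1": {"D1", "D2"},
    "P2": {"D3"},
    "P3": {"D1"},
    "P4": {"D3", "D4"},
}
interacting_pairs = [("P1", "P2"), ("P3", "P4")]

scores = association_scores(protein_domains, interacting_pairs)
prob = interaction_probability({"D1", "D2"}, {"D3"}, scores)
print(scores[("D1", "D3")], prob)
```

In this toy example the pair (D1, D3) occurs in four protein pairs, two of which interact, so its score is 0.5. Unseen domain pairs default to a score of 0, which is a design choice of the sketch rather than part of any published method.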

We proposed a conditional random field approach using mutual information between domain sequences (Hayashida et al., 2011). In that study, mutual information between positions of amino acid residues in domains is calculated from multiple sequence alignments, because interacting residue pairs are known to tend to have higher mutual information than non-interacting ones (Fraser et al., 2002). In addition, several methods using mutual information have been developed for inferring protein residue-residue interactions (Weigt et al., 2009; Kamada et al., 2011; Jones et al., 2012). Markov and conditional random field models have been well studied in the fields of natural language processing and bioinformatics (Sutton & McCallum, 2006; Deng et al., 2004). We modeled protein-protein and domain-domain interactions using a conditional random field, and combined it with mutual information between residues in domains.
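As a rough illustration of the mutual information computation mentioned above, the following Python sketch estimates the mutual information between one column of a domain alignment and one column of another. It assumes that row k of both alignments comes from the same organism or protein pair, treats gap characters as ordinary symbols, and applies no small-sample correction. The function name and the toy alignments are hypothetical and are not taken from Hayashida et al. (2011).

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns,
    estimated from the empirical symbol frequencies."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy alignments; row k of each comes from the same source.
alignment_a = ["AK", "AK", "GE", "GE"]
alignment_b = ["LD", "LD", "VD", "VD"]

col_a0 = [seq[0] for seq in alignment_a]
col_b0 = [seq[0] for seq in alignment_b]
col_b1 = [seq[1] for seq in alignment_b]

mi_covarying = column_mutual_information(col_a0, col_b0)
mi_constant = column_mutual_information(col_a0, col_b1)
print(mi_covarying, mi_constant)
```

Columns that change together across sequences give high mutual information, while a fully conserved column gives zero because it carries no information about its partner. Real alignments are small relative to the 20-letter alphabet, so practical methods usually add corrections that this sketch omits.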
