Incorporating Correlations among Gene Ontology Terms into Predicting Protein Functions

Pingzhao Hu (York University & University of Toronto, Canada), Hui Jiang (York University, Canada) and Andrew Emili (University of Toronto, Canada)
DOI: 10.4018/978-1-60960-625-1.ch008


The authors describe a new strategy that achieves better prediction performance than previous methods and gives additional insight into the importance of the dependence between functional terms when inferring protein function.
Chapter Preview


The sequencing of many genomes has brought to light thousands of putative open reading frames, all of which are potentially transcribed and translated into protein products. For many of these proteins, little is known beyond their primary sequences, and for the typical proteome, between one-third and one-half of all proteins remain functionally uncharacterized. For example, although E. coli is the most highly studied model bacterium, a comprehensive community annotation effort indicated that only about half (~54%) of its protein-coding gene products currently have experimental evidence indicative of a biological role (Riley, 2006). The remaining genes have either only generic (homology-derived) functional attributes (e.g. ‘predicted DNA-binding’) or no discernible physiological role. Some of these functional ‘orphans’ may have eluded characterization because they lack obvious mutant phenotypes, are expressed at low or undetectable levels, or have no obvious homology to annotated proteins. Moreover, because proteins often perform different roles in alternate biological contexts, and because biological systems are complex, many of these alternate functions may not yet have been discovered. As a result, a major challenge in modern biology is to develop efficient methods for determining protein function at the genomic scale (Eisenberg, 2000; Brun, 2003; Barabasi, 2004; Chen, 2006; Hu, 2009a).

Given the slow, laborious and expensive nature of experimentation, computational procedures that systematically predict the functions of uncharacterized proteins from their molecular relationships are increasingly seen as useful (Vazquez, 2003; Zhou, 2005; Zhao, 2007 and 2008; Hu, 2009a). The most convenient and best-known computational method for function prediction is based on detecting significant sequence similarity to gene products of known function, using basic bioinformatic software tools such as BLAST (Basic Local Alignment Search Tool) (Altschul, 1997). The assumption is that proteins that are similar in sequence are likely to have similar biological properties. A major caveat of this simple approach is that only those functions that are obviously and directly tied to sequence, such as enzymatic activity, can be predicted accurately.
