Search for Protein Sequence Homologues that Display Considerable Domain Length Variations

Eshita Mutt (National Centre for Biological Sciences (TIFR) and International Institute of Information Technology Hyderabad, India), Abhijit Mitra (International Institute of Information Technology Hyderabad, India) and R. Sowdhamini (National Centre for Biological Sciences (TIFR), India)
Copyright: © 2011 | Pages: 23
DOI: 10.4018/jkdb.2011040104

Independent folding units capable of carrying out biological functions are classified as “protein domains”. Under evolutionary pressure, these minimal structural units not only undergo considerable sequence changes among domains of similar fold and function, but also show remarkable length variations. Rapid, heuristic sequence search algorithms are generally sensitive and effective in recognizing distantly related protein domains within large sequence databases, but are not well suited to identifying remote homologues of varying lengths. A further challenge is to distinguish reliable hits from the large number of putative false positives that may show suboptimal sequence similarity. Here, the authors present a data-mining approach that applies stage-specific filters in sequence searches to reliably accumulate remote homologues, encouraging the sampling of length variations while keeping the false positive rate low. Identifying such remote homologues with marked length variations could contribute to a better understanding of functional variety within protein domain superfamilies.
Materials And Methods

Dataset Used

The dataset used to supply initial seed sequences in the search for homologues consisted of structure-based sequence alignments of 1776 PASS2 superfamilies (<40% sequence identity between members) (Bhaduri, Pugalenthi, & Sowdhamini, 2004; Kanagarajadurai et al., 2011), from which single-membered and two-membered superfamilies were removed. The standardized protocol (Figure 1) was then scaled up to run automatically on the resulting ~6000 proteins, spread across 635 superfamilies, drawn from different superfamilies and groups.
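The filtering step described above, dropping superfamilies with only one or two members, can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `filter_superfamilies`, the `min_members` parameter, and the toy superfamily and member identifiers are all hypothetical, and the real PASS2 alignments would need to be parsed into this dictionary form first.

```python
from typing import Dict, List


def filter_superfamilies(
    superfamilies: Dict[str, List[str]], min_members: int = 3
) -> Dict[str, List[str]]:
    """Keep only superfamilies with at least `min_members` members.

    With the default of 3, single-membered and two-membered
    superfamilies are removed, as described in the text.
    """
    return {
        sf_id: members
        for sf_id, members in superfamilies.items()
        if len(members) >= min_members
    }


# Toy input: superfamily identifier -> member domain identifiers.
# All identifiers here are made up for illustration.
toy = {
    "sf_A": ["dom1", "dom2", "dom3", "dom4"],
    "sf_B": ["dom5"],            # single-membered: removed
    "sf_C": ["dom6", "dom7"],    # two-membered: removed
    "sf_D": ["dom8", "dom9", "dom10"],
}

kept = filter_superfamilies(toy)

# Total number of seed proteins remaining after filtering.
seed_count = sum(len(members) for members in kept.values())
```

Each member of a retained superfamily could then serve as a seed query for the homologue search, which is how the ~6000 seed proteins across 635 superfamilies would be enumerated under this assumed data layout.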

Figure 1. Flowchart of the full pipeline for collection of length-variant homologues

