Influence of Genomic and Other Biological Data Sets in the Understanding of Protein Structures, Functions and Interactions

N. Srinivasan (Indian Institute of Science, India), G. Agarwal (Indian Institute of Science, India), R. M. Bhaskara (Indian Institute of Science, India), R. Gadkari (Indian Institute of Science, India), O. Krishnadev (University of Georgia, USA), B. Lakshmi (National Centre for Biological Sciences, India), S. Mahajan (National Centre for Biological Sciences, India), S. Mohanty (Indian Institute of Science, India), R. Mudgal (Indian Institute of Science, India), R. Rakshambikai (Indian Institute of Science, India), S. Sandhya (Indian Institute of Science, India), G. Sudha (Indian Institute of Science, India), L. Swapna (Indian Institute of Science, India) and N. Tyagi (Indian Institute of Science, India)
Copyright: © 2011 | Pages: 21
DOI: 10.4018/jkdb.2011010102

In the post-genomic era, biological databases are growing at a tremendous rate. Despite this rapid accumulation of biological information, the functions and other biological properties of many putative gene products from various organisms remain unknown or obscure. This paper examines how strategic integration of large biological databases, and of different kinds of biological information, helps address some fundamental questions about protein structure, function and interactions. New developments in function recognition by remote homology detection, together with the strategic use of sequence databases, aid the recognition of functions of newly discovered proteins. Knowledge of 3-D structures, and the combined use of sequences and 3-D structures of homologous protein domains, greatly expands the reach of remote homology detection. The authors also demonstrate how jointly considering the functions of the individual domains of multi-domain proteins helps in recognizing their gross biological attributes. The paper also discusses a few cases in which disparate biological datasets or disparate biological information were combined to obtain new insights into protein-protein interactions between a host and a pathogen. Finally, the authors discuss how combining low-resolution structural data on gigantic multi-component assemblies, obtained from cryo-EM studies, with atomic-level 3-D structures of their components is effective in inferring finer features of the assembly.

Over the last three decades, overwhelming and ever-growing volumes of biological data from diverse high-throughput experiments have changed the face of biological research. The advent of information technology and data management resources, together with ready access to specialized data repositories from diverse disciplines, has changed the way biology is practiced today. The increased volume and complexity of available biological information has, in turn, spawned new approaches that apply computational algorithms to interpret and derive information from biological data. Today, such data enable functional features to be appreciated not only at the level of biological macromolecules but also at the level of the genome or proteome and, holistically, at the level of the entire cell or organism.

The dissemination of biological data involves three aspects of computational science: first, managing data on genes or proteins as a collection of information and making it publicly available as a hosted resource or databank; second, a web browser that presents the information requested from the databank as a web page; and third, software that mediates communication between the database at the back end and the web browser at the front end. These data management practices have been applied extensively and have given rise to many specialized databases, each containing a different subset of biological knowledge. Such resources may have grown out of specific investigations or been developed as resources that are invaluable to a particular segment of the biological community. Online, constantly updated resources such as the Molecular Biology Database Collection point to high-quality databases that are useful starting points in biological research (Cochrane & Galperin, 2010). Searchable brief summaries of each of these databases are also available through the electronic version of the Database Issue and Collection in journals such as Nucleic Acids Research (Cochrane & Galperin, 2010).
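
The three components described above can be illustrated with a minimal sketch, assuming an in-memory SQLite table as the back-end databank, a small Python function as the mediating middleware, and an HTML string standing in for the page a web browser would render. The table name `proteins`, its columns, the accession `EX0001` and its record are hypothetical placeholders, not drawn from any real resource.

```python
import sqlite3
from html import escape
from typing import Optional


def build_databank() -> sqlite3.Connection:
    """Back end: a hypothetical databank held in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proteins (accession TEXT PRIMARY KEY, name TEXT, organism TEXT)"
    )
    # Placeholder record for illustration only; not taken from any real databank.
    conn.execute(
        "INSERT INTO proteins VALUES (?, ?, ?)",
        ("EX0001", "hypothetical kinase", "Example organism"),
    )
    conn.commit()
    return conn


def fetch_entry(conn: sqlite3.Connection, accession: str) -> Optional[dict]:
    """Middleware: translate a user request into a parameterised database query."""
    row = conn.execute(
        "SELECT accession, name, organism FROM proteins WHERE accession = ?",
        (accession,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("accession", "name", "organism"), row))


def render_page(entry: Optional[dict]) -> str:
    """Front-end payload: the HTML page a web browser would display."""
    if entry is None:
        return "<html><body><p>No entry found.</p></body></html>"
    # Escape every value so database content cannot inject markup into the page.
    rows = "".join(
        f"<tr><th>{escape(key)}</th><td>{escape(value)}</td></tr>"
        for key, value in entry.items()
    )
    return f"<html><body><table>{rows}</table></body></html>"


if __name__ == "__main__":
    db = build_databank()
    print(render_page(fetch_entry(db, "EX0001")))
```

Keeping the query logic separate from the page rendering mirrors the division of labour described above: the databank can be replaced or updated without changing how pages are produced, and vice versa.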

Some databases are fundamental and serve as basic platforms for other biological resources. Broadly, databases may be classified as relating to sequence, structure, function, interaction or pathway. For example, genome sequence databanks (Burks, 1991) and the Protein Data Bank (PDB) (Berman et al., 2000) are fundamental databanks, while innumerable biological databases and resources such as Pfam (Sonnhammer, Eddy, & Durbin, 1997) and SCOP (Murzin, Brenner, Hubbard, & Chothia, 1995) have been derived from them. Data on biological molecules and cellular processes are diverse; therefore gene and protein sequences, their expression levels, microarray data, protein structures, molecular interactions and higher-level cellular functions are represented in multiple specialized databanks. Examples abound in each category, and detailed descriptions of each database are beyond the scope of this review. Such varied representations across diverse resources bring their own challenges: differences in terminology, conceptual differences in how data are defined and represented, non-uniform ways of accessing the data, and so on. Another important challenge in the area is the integration and continual updating of resources across different platforms.

Biological databanks have grown rapidly over the last thirty years, as seen in sequence and structural databanks such as GenBank, PDB and SCOP (Figure 1). The major challenges in this area therefore involve not only ways to handle the data but also efforts to integrate and process diverse datasets from disparate resources. Since each researcher’s queries are specialized, it is difficult to address every potential end user’s requirements. Biological researchers have therefore become adept at using a large number of the databases and tools available to them, and at integrating them at diverse levels to address specific questions. In this review we discuss a few case studies, taken from ongoing work in our laboratory, on the multi-tiered integration of information at different levels: sequence, structure, combined sequence and structure, domains in multi-domain proteins, disparate data, and gross-level descriptions of 3-D structures.
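
The integration problem, including the differences in terminology between resources, can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical sources (`sequence_source` and `structure_source`) that describe the same proteins under different field names; the field mappings, identifiers and values are invented placeholders, and real resources would be linked through their own cross-reference fields.

```python
from collections import defaultdict

# Hypothetical field-name mappings from each source's vocabulary to a common schema.
FIELD_MAPS = {
    "sequence_source": {"acc": "accession", "seq": "sequence", "desc": "description"},
    "structure_source": {"seq_ref": "accession", "pdb_code": "structure_id", "res": "resolution_A"},
}


def normalise(record: dict, source: str) -> dict:
    """Rename a record's fields into the common schema; unmapped fields are dropped."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def integrate(sources: dict) -> dict:
    """Merge normalised records from all sources, keyed on the shared accession."""
    merged = defaultdict(dict)
    for source, records in sources.items():
        for record in records:
            entry = normalise(record, source)
            accession = entry.get("accession")
            if accession is None:
                continue  # Records that cannot be linked to an accession are skipped.
            merged[accession].update(entry)
    return dict(merged)


if __name__ == "__main__":
    # Placeholder records for illustration only.
    data = {
        "sequence_source": [{"acc": "EX0001", "seq": "MKTAYIAK", "desc": "hypothetical kinase"}],
        "structure_source": [{"seq_ref": "EX0001", "pdb_code": "0XYZ", "res": 2.1}],
    }
    for accession, entry in integrate(data).items():
        print(accession, entry)
```

Keying on a single shared accession is the simplest possible linking strategy; in practice, mappings between resources are often one-to-many (for example, one protein sequence with several solved structures), which this sketch does not handle.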
