Function and Homology of Proteins Similar in Sequence: Phylogenetic Profiling

Thomas Meinel (Max Planck Institute for Molecular Genetics, Germany)
Copyright: © 2009 | Pages: 24
DOI: 10.4018/978-1-60566-076-9.ch008

Abstract

Protein function is a main subject of research in systems biology. Inferring function is now needed more than ever, because the discovery of new proteomes yields a steady stream of novel protein sequences. Sequence similarity is easy to calculate and therefore a practical basis for comparing proteins. The comparison of complete proteomes touches one of the earliest topics in bioinformatics: the biologically meaningful organization of proteins into protein families. This chapter reviews several approaches that interpret the function or evolution of proteins from sequence similarity, reflecting the range of techniques introduced to date. It also presents phylogenetic profiling, a method that compares a set of genes or proteins by their presence or absence across a given set of organisms. Proteins that share a functional context, for example a pathway or a protein complex, have identical or similar phylogenetic profiles. Detecting functional contexts by phylogenetic profiling is also a promising analytic tool for systems biology. Established tools for phylogenetic profiling are presented, along with biological examples based on the SYSTERS protein family data set.
Chapter Preview

Introduction

Protein sequence similarity plays a central role in comparative proteomics, both for inferring protein function and for analyzing protein evolution. Basic research on complex functional cellular units such as proteins can often be conducted only in animal models, where the full experimental design is available. These studies are expected to play a significant role in medical research, and their results are considered comparable with results from clinical diagnostics. Where such experiments are not possible in humans, it must be confirmed that the results transfer to human proteins. Determining which proteins have identical function in animal models and in humans is therefore essential.

The evolution of life can be characterized by the development of species as well as by the divergence of protein sequences; notably, the development of protein function is also an evolutionary process. Several more or less independent fields of biological research have emerged to elucidate the background of particular evolutionary events, often constrained by the methods available at the time. Techniques for rapid, high-volume DNA sequencing continue to deliver complete gene sets, or genomes, for a growing list of organisms. Computational techniques translate genomic information into proteins, taking into account processes such as alternative splicing and translation start site variation. All electronically inferred proteomes are generated by such specific algorithms. In parallel with the development of those tools, new evolutionary events were detected and investigated, among them gene duplication, gene fusion and fission, protein domain rearrangements, horizontal gene transfer, and multiple gene copy numbers.

In its first part, this chapter explains why sequence similarity, homology, and function of proteins must be distinguished. Sequence similarity is a parameter that can be computed from a simple biophysical trait of a protein: its sequence, i.e., the primary protein structure. Protein homology is harder to determine, even though it is plausible that proteins with a common evolutionary history are similar in sequence. Conversely, in bioinformatics, inferring homology means interpreting an observation, namely sequence similarity. The goal is to identify similar proteins in present-day organisms as descendants of a gene with common ancestry, and thereby as homologs. Inference of protein function can be discussed in the same way: similar proteins very probably share a common ancestry and therefore have similar function. But proteins can acquire new functions or become specialized during their evolutionary history. Proteins of similar function therefore do not necessarily have similar sequences.

Consequently, it is necessary to know the existing protein sequence comparison methods and the underlying methods for partitioning proteins into protein families. In fact, a scientist who uses an expression like ‘two homologous genes’ is working with more or less closely related members of protein families. This chapter therefore briefly reviews the methodological background of established data sets.

Proteins in a common functional context are observed to be evolutionarily conserved in most organisms that possess that functional context. A phylogenetic profile is a pattern of presence or absence of a gene or protein across a given set of organisms. Phylogenetic profiling is a method that compares proteins by their phylogenetic profiles. Because proteins differ between organisms, they must be grouped before a phylogenetic profile can be generated. Phylogenetic profiling therefore depends on the method used to partition the protein space into protein families. In its second part, this chapter reviews the background of established phylogenetic profiling tools, restrictions to subsets of organisms at the superkingdom level, and general limitations for the detection of functional contexts, and it provides biological examples of phylogenetic profiles. As a method to infer unknown protein contexts or to elucidate the contexts of proteins of unknown function, phylogenetic profiling is becoming increasingly relevant in systems biology, all the more so as the number of complete eukaryotic proteomes grows.
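The idea of a presence/absence profile can be illustrated with a minimal Python sketch. The family names, the organism list, and the choice of Hamming distance as the comparison are assumptions made for illustration only; they are not taken from the chapter or from any of the tools it reviews.

```python
from itertools import combinations

# Hypothetical toy data: protein family -> organisms in which a member was found.
family_members = {
    "famA": {"E. coli", "B. subtilis", "H. sapiens"},
    "famB": {"E. coli", "B. subtilis", "H. sapiens"},
    "famC": {"S. cerevisiae", "H. sapiens"},
}
# The fixed organism order defines the positions of every profile.
organisms = ["E. coli", "B. subtilis", "S. cerevisiae", "H. sapiens"]


def build_profile(present_in, organisms):
    """Return a tuple of 1/0 flags, one per organism, in the given order."""
    return tuple(1 if org in present_in else 0 for org in organisms)


def hamming_distance(p, q):
    """Count the organisms in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


profiles = {fam: build_profile(orgs, organisms)
            for fam, orgs in family_members.items()}

# Compare every pair of families; a distance of 0 means identical profiles.
for (f1, p1), (f2, p2) in combinations(profiles.items(), 2):
    print(f1, f2, hamming_distance(p1, p2))
```

In this toy data, famA and famB have identical profiles and would be candidates for a shared functional context, while famC differs from both. Any real analysis depends on how the protein families were defined in the first place.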

Protein function is a central issue of this review, both as the target of detection for unknown proteins using sequence similarity and as the target of inference of functional contexts in phylogenetic profiling.

Key Terms in this Chapter

Similarity Score: Measure of the exchange of each of the twenty amino acids for each of the remaining nineteen, organized in a scoring matrix.

Local Sequence Similarity: A sequence comparison algorithm (e.g., BLAST) often finds similarity between two sequences only at a local level. Identical subsequences occur in protein domains, for instance, and give rise to local sequence similarity.

BLAST: Basic Local Alignment Search Tool. A heuristic algorithm for searching databases for similar words or sequences.

Phylogenetic Profile: Presence/absence pattern for a family of genes or proteins across a given set of organisms. A phylogenetic profile represents a gene or protein family and provides a taxonomic overview of it.

Similarity of Sequences: Two protein sequences can be compared at each amino acid position. Identical residues, or similar biophysical behavior of the compared amino acids, determine sequence similarity. An alignment of at least two protein sequences is required.

Multiple Sequence Alignment: Three or more sequences are displayed together, with comparable characters (for proteins: amino acid residues) in the same column.

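E-Value: Expected number of hits with a score at least as good as the observed one that would occur by chance in a sequence database of the given size. The lower the E-value, the less likely it is that a protein or nucleotide sequence was found at random.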

Phylogenetic Profiling: Comparison of two or more phylogenetic profiles. Protein families that share a functional context have similar phylogenetic profiles.

Pairwise Sequence Alignment: Two sequences are displayed in two rows with comparable characters, amino acid residues for protein sequences, in columns.

Distance Measure: Measure to compare protein sequences by their amino acid composition. Summing the per-position evaluations of two character states (the amino acids at an identical position) in a similarity measure yields the similarity score. The distance is 1 minus the relative similarity.
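The relationship between a pairwise alignment, a similarity score, and a distance can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two aligned sequences are invented, and the scoring is a simple identity scheme (1 for identical residues, 0 otherwise) rather than a real amino acid scoring matrix, so the relative similarity is simply the fraction of identical columns.

```python
def identity_score(a, b):
    """Toy scoring (an assumption): 1 for identical residues, 0 otherwise.

    A gap character '-' never counts as a match.
    """
    return 1 if a == b and a != "-" else 0


def similarity_score(seq_a, seq_b, score=identity_score):
    """Sum the per-column scores of two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return sum(score(a, b) for a, b in zip(seq_a, seq_b))


def relative_similarity(seq_a, seq_b):
    """Similarity score divided by its maximum (one point per alignment column)."""
    return similarity_score(seq_a, seq_b) / len(seq_a)


def distance(seq_a, seq_b):
    """Distance as 1 minus the relative similarity."""
    return 1 - relative_similarity(seq_a, seq_b)


# Hypothetical pairwise alignment: two rows, comparable residues in columns.
row_1 = "MKT-AYIA"
row_2 = "MKTLAYLA"

print(similarity_score(row_1, row_2))  # 6 identical columns out of 8
print(distance(row_1, row_2))          # 1 - 6/8 = 0.25
```

With a real scoring matrix the maximum attainable score is no longer one point per column, so the normalization step would have to change accordingly.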

Complete Chapter List

Editorial Advisory Board
Table of Contents
Foreword: Ralf Herwig
Preface: Andriani Daskalaki
Acknowledgment: Andriani Daskalaki
Chapter 1: Pathway Biology Approach to Medicine (Peter Ghazal)
Chapter 2: Systems and Control Theory for Medical Systems Biology (Peter Wellstead, Sree Sreenath, Kwang-Hyun Cho)
Chapter 3: Mathematical Description of Time Delays in Pathways Cross Talk (S. Nikolov)
Chapter 4: Deterministic Modeling in Medicine (Elisabeth Maschke-Dutz)
Chapter 5: Synthetic Biology as a Proof of Systems Biology (Andrew Kuznetsov)
Chapter 6: Computational Models for the Analysis of Modern Biological Data (Tuan D. Pham)
Chapter 7: Computer Aided Knowledge Discovery in Biomedicine (Vanathi Gopalakrishnan)
Chapter 8: Function and Homology of Proteins Similar in Sequence: Phylogenetic Profiling (Thomas Meinel)
Chapter 9: Computational Methods for the Prediction of GPCRs Coupling Selectivity (Nikolaos G. Sgourakis, Pantelis G. Bagos, Stavros J. Hamodrakas)
Chapter 10: Bacterial β-Barrel Outer Membrane Proteins: A Common Structural Theme Implicated in a Wide Variety of Functional Roles (Pantelis G. Bagos, Stavros J. Hamodrakas)
Chapter 11: Clustering Methods for Gene-Expression Data (L.K. Flack)
Chapter 12: Uncovering Fine Structure in Gene Expression Profile by Maximum Entropy Modeling of cDNA Microarray Images and Kernel Density Methods (George Sakellaropoulos, Antonis Daskalakis, George Nikiforidis, Christos Argyropoulos)
Chapter 13: Gene Expression Profiling with the BeadArray™ Platform (Wasco Wruck)
Chapter 14: The Affymetrix GeneChip® Microarray Platform (Djork-Arné Clevert, Axel Rasche)
Chapter 15: Alternative Isoform Detection Using Exon Arrays (Jacek Majewski)
Chapter 16: Gene Expression in Microbial Systems for Growth and Metabolism (Prerak Desai)
Chapter 17: Alternative Splicing and Disease (Heike Stier)
Chapter 18: Mathematical Modeling of the Aging Process (Axel Kowald)
Chapter 19: The Sebaceous Gland: A Model of Hormonal Aging (Evgenia Makrantonaki)
Chapter 20: Systems Biology Applied to Cancer Research (R. Seigneuric, N.A.W. van Riel, M.H.W. Starmans, A. van Erk)
Chapter 21: Systems Biology Strategies in Studies of Energy Homeostasis In Vivo (Matej Orešic, Antonio Vidal-Puig)
Chapter 22: Approaching Type 2 Diabetes Mellitus by Systems Biology (Axel Rasche)
Chapter 23: Systems Biology and Infectious Diseases (Alia Benkahla, Lamia Guizani-Tabbane, Ines Abdeljaoued-Tej, Slimane Ben Miled, Koussay Dellagi)
Chapter 24: Systems Biology of Human-Pathogenic Fungi (Daniela Albrecht, Reinhard Guthke)
Chapter 25: Development of Specific Gamma Secretase Inhibitors (Jessica Ahmed)
Chapter 26: In Machina Systems for the Rational De Novo Peptide Design (Paul Wrede)
Chapter 27: Applications of Metabolic Flux Balancing in Medicine (Ferda Mavituna, Raul Munoz-Hernandez, Ana Katerine de Carvalho Lima Lobato)
Chapter 28: Multi-Level Data Integration and Data Mining in Systems Biology (Roberta Alfieri, Luciano Milanesi)
Chapter 29: Methods for Reverse Engineering of Gene Regulatory Networks (Hendrik Hache)
Chapter 30: Data Integration for Regulatory Gene Module Discovery (Alok Mishra)
Chapter 31: Discrete Networks as a Suitable Approach for the Analysis of Genetic Regulation (Elizabeth Santiago-Cortés)
Chapter 32: Investigating the Collective Behavior of Neural Networks: A Review of Signal Processing Approaches (A. Maffezzoli)
Chapter 33: The System for Population Kinetics: Open Source Software for Population Analysis (Paolo Vicini)
Chapter 34: Photosynthesis: How Proteins Control Excitation Energy Transfer (Julia Adolphs)
Chapter 35: Photodynamic Therapy: A Systems Biology Approach (Michael R. Hamblin)
Chapter 36: Modeling of Porphyrin Metabolism with PyBioS (Andriani Daskalaki)
Chapter 37: Interference Microscopy for Cellular Studies (Alexey R. Brazhe, Nadezda A. Brazhe, Alexey N. Pavlov, Georgy V. Maksimov)
Chapter 38: Fluorescence Imaging of Mitochondrial Long-Term Depolarization in Cancer Cells Exposed to Heat-Stress (Cathrin Dressler, Olaf Minet, Urszula Zabarylo, Jürgen Beuthan)
Chapter 39: Protein Interactions and Diseases (Athina Theodosiou, Charalampos Moschopoulos, Marc Baumann, Sophia Kossida)
Chapter 40: The Breadth and Depth of BioMedical Molecular Networks: The Reactome Perspective (Bernard de Bono)
Chapter 41: Entropy and Thermodynamics in Biomolecular Simulation (Jorge Numata)
Chapter 42: Model Development and Decomposition in Physiology (Isabel Reinecke, Peter Deuflhard)
Chapter 43: A Pandemic Avian Influenza Mathematical Model (Mohamed Derouich)
Chapter 44: Dengue Fever: A Mathematical Model with Immunization Program (Mohamed Derouich)
Chapter 45: Automated Image Analysis Approaches in Histopathology (Ross Foley)
About the Contributors