Extraction of Protein Sequence Motif Information using Bio-Inspired Computing

Gowri Rajasekaran, Rathipriya R
DOI: 10.4018/978-1-7998-1204-3.ch065

Abstract

Many people today are affected by genetic disorders and hereditary diseases. Detecting protein complexes and their functions helps to locate irregularities in gene expression. Within a group of related proteins there are conserved sequence patterns (motifs) that are functionally or structurally similar. The main objective of this work is to find the motif information in a given protein sequence dataset, since the functions of proteins can ideally be inferred from their motif information. Clustering is a core data mining technique, and biclustering is also used in much bioinformatics research. This work proposes PSO K-Means clustering and biclustering approaches to extract the motif information. Motifs are extracted based on the structural homogeneity of the protein sequences. The clusters and biclusters are compared on homogeneity and on the motif information extracted. The study shows that the biclustering approach yields better results than the clustering approach.

Introduction

Protein Sequence

Proteins (Vincent, Bernard, & SinanKockara, n.d.) are present in every cell of an organism and are involved in virtually all cellular activities. They are responsible for metabolic activities, nutrient transport, regulation, and so on. They exist as single-chain molecules, as three-dimensional structures, or in bundles or complexes. Proteins are built from twenty amino acids, which have different characteristics such as being hydrophobic, hydrophilic, polar, or non-polar. A major challenge in bioinformatics is to find which combinations of proteins are responsible for which activities. Discovering the structure and function of proteins in living organisms is vital for understanding the background of various cellular processes. It helps in treating diseases and in identifying drugs for particular diseases.

Figure 1. The IUPAC code for PROTEIN from DNA code

Irregularities in proteins can be detected once their actual function and the protein complexes they belong to are known. Biologists interpret the function of a protein complex from its chemical properties.

Table 1. Amino acid codes

1-LETTER  3-LETTER  DESCRIPTION
A         Ala       Alanine
R         Arg       Arginine
N         Asn       Asparagine
D         Asp       Aspartic acid
C         Cys       Cysteine
Q         Gln       Glutamine
E         Glu       Glutamic acid
G         Gly       Glycine
H         His       Histidine
I         Ile       Isoleucine
L         Leu       Leucine
K         Lys       Lysine
M         Met       Methionine
F         Phe       Phenylalanine
P         Pro       Proline
S         Ser       Serine
T         Thr       Threonine
W         Trp       Tryptophan
Y         Tyr       Tyrosine
V         Val       Valine
B         Asx       Aspartic acid or Asparagine
Z         Glx       Glutamine or Glutamic acid
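
When sequences are handled in software, the codes in Table 1 can be kept in a simple lookup table. The following is a minimal Python sketch under that assumption; the names THREE_LETTER and to_three_letter are hypothetical and are not taken from the chapter.

```python
# One-letter to three-letter codes, copied from Table 1
# (B and Z are ambiguity codes, not distinct amino acids).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx",
}


def to_three_letter(sequence):
    """Convert a one-letter protein sequence to hyphen-joined three-letter codes.

    Hypothetical helper for illustration; raises KeyError for any
    character that is not listed in Table 1.
    """
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())
```

For example, to_three_letter("MKV") gives "Met-Lys-Val".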

These significant protein complexes of various organisms are discovered using computational techniques. A protein complex is an assembly of proteins that builds up some cellular machinery; it commonly spans a dense sub-network of proteins in a protein interaction network. Gene expression alone does not help much in detecting the functions of a gene. A protein sequence can be generated from the DNA/mRNA sequence that codes for the protein, as shown in Figure 1. The four nucleotide bases combine in triplets (codons), each of which codes for an amino acid of the protein. The 20 amino acids found in protein sequences, together with the ambiguity codes B and Z, are listed with their names in Table 1.
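
As a rough illustration of how a protein sequence is derived from a DNA/mRNA sequence, the Python sketch below translates a coding sequence codon by codon using the standard genetic code. The function name translate and the example sequences are illustrative assumptions, not part of the chapter, and the sketch assumes the input starts in the correct reading frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA or mRNA coding sequence into one-letter amino acid codes.

    Illustrative helper: reads from position 0, stops at the first stop
    codon, ignores a trailing partial codon, and raises KeyError on
    bases other than A, C, G, T/U.
    """
    dna = sequence.upper().replace("U", "T")  # treat mRNA like DNA
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

With this sketch, translate("ATGGCCAAATAA") returns "MAK": ATG codes for methionine, GCC for alanine, AAA for lysine, and TAA is a stop codon.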

Because protein sequences are long chains of amino acids, repeated amino acid groups cannot practically be found manually. These repeated groups are called motifs. Protein structure is described at several levels, such as secondary, tertiary, quaternary, 3-fold, and so on; secondary structures are used in this work. The structural similarity of the detected protein complexes is used to extract homologous complexes. Protein function is discovered by various methods, such as sequence-motif based, homology based, and structure based methods. Motifs can be extracted from the clusters generated by various computational techniques.
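
To make the idea of a repeated amino acid group concrete, the Python sketch below counts fixed-length subsequences (k-mers) that occur in several sequences. This is a deliberately simple stand-in, not the PSO K-Means or biclustering approach proposed in this work; the function name shared_kmers, its parameters, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def shared_kmers(sequences, k=3, min_support=2):
    """Return k-mers found in at least min_support of the given sequences.

    Illustrative only: exact-match counting, with no clustering and no
    structural information. The result maps each k-mer to the number of
    distinct sequences that contain it.
    """
    support = defaultdict(set)
    for index, sequence in enumerate(sequences):
        for start in range(len(sequence) - k + 1):
            support[sequence[start:start + k]].add(index)
    return {
        kmer: len(indices)
        for kmer, indices in support.items()
        if len(indices) >= min_support
    }


# Toy sequences invented for this example (not from the chapter's dataset).
toy_sequences = ["MKVLAA", "GGKVLT", "KVLPPP"]
candidates = shared_kmers(toy_sequences, k=3, min_support=3)
```

Here candidates contains only "KVL", the one 3-mer present in all three toy sequences. Exact matching of this kind misses motifs with substitutions, which is one reason clustering-based methods are of interest.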
