Extraction of Protein Sequence Motif Information using Bio-Inspired Computing


Gowri Rajasekaran (Periyar University, India) and Rathipriya R (Periyar University, India)
DOI: 10.4018/978-1-5225-0427-6.ch012


Many people today are affected by genetic disorders and hereditary diseases. Protein complexes and their functions are studied in order to find irregularities in gene expression. Within a group of related proteins there exist conserved sequence patterns (motifs) that are functionally or structurally similar. The main objective of this work is to extract motif information from a given protein sequence dataset, since the functions of proteins can be inferred from their motifs. Clustering is a core data mining technique, and biclustering is also used in much bioinformatics research. This work proposes PSO K-Means clustering and biclustering approaches to extract motif information. Motifs are extracted based on the structural homogeneity of the protein sequences. The resulting clusters and biclusters are compared in terms of homogeneity and the motif information extracted. The study shows that the biclustering approach yields better results than the clustering approach.
Chapter Preview


Protein Sequence

Proteins (Vincent, Bernard, & SinanKockara, n.d.) are present in every cell of an organism and are involved in virtually all cellular activities. They are responsible for metabolic activities, nutrient transport, regulation, and so on. They exist as single-chain molecules, as three-dimensional structures, or in bundles or complexes. Proteins are built from twenty amino acids, which have different characteristics such as hydrophobic, hydrophilic, polar, and non-polar. A great challenge in bioinformatics is to find which combinations of proteins are responsible for which activities. Discovering the structure and function of proteins in living organisms is vital to understanding various cellular processes. It helps in treating diseases and in discovering drugs for particular diseases.

Irregularities in proteins can be found if their actual functions and the protein complexes they belong to are known. Biologists interpret the function of a protein complex based on its chemical properties.

Table 1. Amino acid codes
Code  Abbreviation  Amino acid
D     Asp           Aspartic acid
E     Glu           Glutamic acid
B     Asx           Aspartic acid or Asparagine
Z     Glx           Glutamine or Glutamic acid

These significant protein complexes of various organisms are discovered using computational techniques. A protein complex is an assembly of proteins that builds up some cellular machinery; it commonly spans a dense sub-network of proteins in a protein interaction network. Gene expression alone does not help much in detecting the function of a gene. A protein sequence can be generated from the DNA/mRNA sequence that codes for the protein, as shown in Figure 1. The four nucleotide bases combine in triplets (codons), each of which codes for an amino acid of the protein. The 20 different amino acids present in protein sequences are listed, along with their chemical names, in Table 1.
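The step from a coding DNA/mRNA sequence to a protein sequence can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own code: it assumes the standard genetic code, a sequence that starts in reading frame, and the hypothetical names `CODON_TABLE` and `translate`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid codes.

    Translation stops at the first stop codon; a trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")  # accept mRNA input as well as DNA
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGATGAATAA"))  # MDE (Met, Asp, Glu, then stop)
```

The one-letter output uses the same codes as Table 1, so `D` and `E` in the result correspond to aspartic acid and glutamic acid.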

Figure 1. The IUPAC code for protein from the DNA code


Because protein sequences are long chains of amino acids, the repeated groups within them cannot practically be found manually. These repeated groups are called motifs. Protein structure is described at several levels, such as secondary, tertiary, quaternary, 3-fold, and so on; the secondary structure is used in this work. The structural similarity of the detected protein complexes is used to extract homologous complexes. The function of proteins is discovered by various methods, such as sequence-motif-based, homology-based, and structure-based methods. Motifs can be extracted from the clusters generated by various computational techniques.
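To make the idea of repeated groups concrete, the sketch below cuts each sequence into overlapping fixed-width segments and counts segments that recur. It is a deliberate simplification: it matches segments exactly, whereas the chapter extracts motifs from clusters using structural homogeneity. The function names, the window width, and the toy sequences are all hypothetical.

```python
from collections import Counter


def sliding_segments(sequence, width):
    """Return every overlapping window of the given width from the sequence."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def recurring_segments(sequences, width, min_count=2):
    """Count segments across all sequences and keep those seen at least min_count times."""
    counts = Counter()
    for seq in sequences:
        counts.update(sliding_segments(seq, width))
    return {seg: n for seg, n in counts.items() if n >= min_count}


# Toy sequences invented for illustration; they are not from the chapter's dataset.
toy_sequences = ["MKVLAAGDE", "TTVLAAGQW", "VLAAGPP"]
print(recurring_segments(toy_sequences, width=5))  # {'VLAAG': 3}
```

Exact matching misses conserved patterns that differ by a substitution, which is one reason segments are grouped by clustering rather than by direct counting.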

Key Terms in this Chapter

PSO: Particle Swarm Optimization is a swarm-based optimization technique modeled on the flocking behavior of birds.

Protein Motif: Motifs are patterns that occur repeatedly in a sequence and are responsible for various cellular processes.

Significant Amino Acids (SAA): The protein pattern present in a set of protein sequences that shows the dominant characteristic of that protein group.

Clustering: Grouping similar data elements. It is used to discover similar patterns in a large volume of data. Objects in the same cluster are more similar to each other than to objects in different clusters.

Biclustering: The process of grouping data simultaneously by rows (segments) and columns (amino acids).

Bio-Inspired Computing: A field devoted to tackling complex problems using computational methods modeled after design principles encountered in nature.

Protein Sequence: A sequence of amino acids responsible for various cellular activities of biological systems.
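The PSO and Clustering terms above can be combined in a small sketch: each particle holds a candidate set of k centroids, and the swarm searches for the set with the lowest within-cluster sum of squared distances. This is plain PSO over centroids written for illustration, not necessarily the PSO K-Means variant proposed in the chapter. The function names, the inertia and acceleration values (`w`, `c1`, `c2`), and the toy points are assumptions.

```python
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids)
        for point in points
    )


def pso_cluster(points, k, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle starts as k distinct data points, with zero velocity.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in pos]
    pbest_fit = [sse(particle, points) for particle in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    history = [gbest_fit]  # global-best fitness after each iteration
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia plus pulls toward the personal and global bests.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(pos[i], points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


# Toy 2-D points forming two well-separated groups (invented for illustration).
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, fitness, history = pso_cluster(toy_points, k=2)
print(centroids, fitness)
```

In a motif-extraction setting the points would be feature vectors for sequence segments rather than 2-D coordinates, and the best centroids found could be used to seed or replace the random initialization of K-Means.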
