Protein Sequence
Proteins (Vincent, Bernard, & SinanKockara, n.d.) are present in every cell of an organism and are involved in virtually all cellular activities, including metabolism, nutrient transport, and regulation. They exist as single-chain molecules, as three-dimensional structures, or as bundles and complexes. Proteins are built from twenty standard amino acids, which differ in chemical character: hydrophobic or hydrophilic, polar or non-polar, and so on. A major challenge in bioinformatics is to determine which combinations of proteins are responsible for which activities. Discovering the structure and function of proteins in living organisms is vital to understanding the mechanisms behind cellular processes, and it helps in treating diseases and in discovering drugs for particular diseases.
Irregularities in proteins can be detected once their actual function and the complexes they belong to are known. Biologists interpret the function of a protein complex from the chemical properties of its proteins.
Table 1. Amino acid codes
1-LETTER | 3-LETTER | DESCRIPTION |
A | Ala | Alanine |
R | Arg | Arginine |
N | Asn | Asparagine |
D | Asp | Aspartic acid |
C | Cys | Cysteine |
Q | Gln | Glutamine |
E | Glu | Glutamic acid |
G | Gly | Glycine |
H | His | Histidine |
I | Ile | Isoleucine |
L | Leu | Leucine |
K | Lys | Lysine |
M | Met | Methionine |
F | Phe | Phenylalanine |
P | Pro | Proline |
S | Ser | Serine |
T | Thr | Threonine |
W | Trp | Tryptophan |
Y | Tyr | Tyrosine |
V | Val | Valine |
B | Asx | Aspartic acid or Asparagine |
Z | Glx | Glutamine or Glutamic acid |
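The codes in Table 1 map naturally onto a lookup table. The following is a minimal Python sketch, not part of the original work; the names `ONE_TO_THREE` and `to_three_letter` are hypothetical and chosen only for illustration.

```python
# One-letter to three-letter codes, copied from Table 1.
# B and Z are the ambiguity codes (Asp/Asn and Gln/Glu).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx",
}


def to_three_letter(sequence, sep="-"):
    """Convert a one-letter protein sequence to three-letter codes.

    Hypothetical helper: raises ValueError for any letter not in Table 1.
    """
    try:
        return sep.join(ONE_TO_THREE[res] for res in sequence.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue code: {err.args[0]!r}") from None
```

For example, `to_three_letter("MKV")` gives `Met-Lys-Val`.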
These significant protein complexes of various organisms are discovered using computational techniques. A protein complex is an assembly of proteins that builds up some cellular machinery; it commonly spans a dense sub-network of proteins in a protein interaction network. Gene expression alone does not help much in determining the function of a gene. A protein sequence can be derived from the DNA/mRNA sequence that codes for the protein, as shown in Figure 1. The four nucleotide bases combine in triplets (codons), each of which specifies one amino acid of the protein. The 20 standard amino acids found in protein sequences, together with the ambiguity codes B and Z, are listed with their names in Table 1.
Figure 1. IUPAC protein codes derived from the DNA code
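The derivation of a protein sequence from its coding DNA/mRNA sequence can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in this work: it assumes the standard genetic code, reads a single frame starting at the first base, stops at the first stop codon, and writes `X` for any codon it does not recognise. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order
# (third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(nucleotides):
    """Translate a DNA or mRNA coding sequence into one-letter protein codes.

    Assumption: reading starts at position 0 and ends at the first stop codon.
    Trailing bases that do not fill a codon are ignored.
    """
    seq = nucleotides.upper().replace("U", "T")  # treat mRNA as DNA
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCAAGTAA")` gives `MAK`: the codons ATG, GCC, and AAG code for Met, Ala, and Lys, and TAA is a stop codon.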
Because protein sequences are long chains of amino acids, repeated groups of residues cannot practically be found by hand. These repeated groups are called motifs. Protein structure is described at several levels, including secondary, tertiary, and quaternary structure and the overall three-dimensional fold; secondary structures are used in this work. The structural similarity of the detected protein complexes is used to extract homologous complexes. Protein function is discovered by various methods, such as sequence-motif-based, homology-based, and structure-based methods. Motifs can be extracted from the clusters generated by various computational techniques.
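To illustrate why repeated groups are searched for computationally, the sketch below counts exact repeated substrings of a fixed length in a sequence. It is a deliberately simple stand-in, not the cluster-based motif extraction referred to above; the function name `repeated_kmers` and the parameters `k` and `min_count` are hypothetical.

```python
from collections import defaultdict


def repeated_kmers(sequence, k=3, min_count=2):
    """Return every length-k substring that occurs at least min_count times.

    The result maps each repeated substring to the list of its start
    positions (0-based). Overlapping occurrences are counted separately.
    """
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {
        kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count
    }
```

For the made-up sequence `MKVLAAGMKVLT`, `repeated_kmers("MKVLAAGMKVLT", k=3)` reports `MKV` at positions 0 and 7 and `KVL` at positions 1 and 8. Real motifs usually tolerate substitutions, so exact matching of this kind is only a starting point.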