A Biologically-Inspired Computational Solution for Protein Coding Regions Identification in Noisy DNA Sequences

Muneer Ahmad
DOI: 10.4018/978-1-4666-9792-8.ch010

Abstract

Biologically inspired computational solutions for identifying protein coding regions are regarded as optimized solutions that can enhance regions of interest in noisy DNA signals, in contrast to contemporary identification methods. The exponential growth of genomic data demands better protein translation. The solutions proposed so far rely on statistical, digital signal processing and Fourier transform approaches, which do not reflect optimal, biologically inspired identification of coding regions. This paper presents a novel biologically inspired solution for coding region identification based on wavelet transforms combined with a peculiar indicator sequence. DNA signal noise is reduced considerably, and exon peaks can be discriminated significantly from introns. A comparative analysis over datasets commonly used for protein coding identification showed that the proposed solution outperforms other methods in power spectral density estimation graphs and in numerical discrimination measures. The results show a 75% reduction in computational complexity relative to the binary indicator sequence method and a 32% to 266% improvement over other methods in the literature (compared against the standard NCBI range). These results were achieved by efficiently denoising the target DNA signal using wavelets and the peculiar indicator sequence.

Introduction

Biologically inspired computational solutions have been adopted for a variety of computational problems. In contrast to contemporary solutions, they provide enhanced identification when applied to protein coding regions in noisy sequences. It is well known that a DNA sequence contains genes and comprises genic and intergenic regions. Translation from DNA, by way of RNA, to protein is an important and critical task because exact identification of the protein provides information about protein structure and cell function. Exons are the regions of a gene that are translated into protein, and exon boundaries are diffused in intron-exon noise (Mahmood Akhtar et al. 2007). Optimal identification of exons in the presence of 1/f noise requires careful attention and a suitable methodology.

Proteins are composed of small units called amino acids. There are 20 types of amino acids, and the sequence of these units determines the type and function of an individual protein molecule.

There are 64 possible codon (tri-nucleotide combination of bases) values (Ahmad and Mathkour, 2009) that govern the translation of DNA chains into protein chains at regions known as exons, which lie among clusters of non-coding regions called introns. Exons are the regions responsible for carrying the nucleotide bases for protein translation. The codon “ATG” marks the start of the sequence that contains the protein coding regions, and the codons “TAA”, “TGA” and “TAG” are the stop codons of this sequence; in RNA, T is replaced by U (uracil). It is worth mentioning that a start codon alone may not suffice to identify a protein sequence; other factors may also be required in certain species. Knowing the exact location of coding regions leads to an optimal solution of the underlying problem.
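The role of the start and stop codons can be illustrated with a minimal Python sketch. It is a naive scan of the three forward reading frames, not the wavelet-based method of this chapter; the function name `find_open_reading_frames` and the `min_codons` parameter are hypothetical names introduced only for this example. As noted above, a start codon alone does not establish a coding region, so the stretches returned here are candidates only.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_open_reading_frames(dna, min_codons=2):
    """Return sorted (start, end) index pairs of ATG...stop stretches.

    Scans the three forward reading frames of `dna`. `end` is exclusive and
    includes the stop codon. `min_codons` (an assumed, illustrative threshold)
    drops stretches shorter than that many codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def to_rna(dna):
    """Write a DNA string in RNA letters by replacing T with U."""
    return dna.upper().replace("T", "U")


example = "CCATGAAATAGGG"
for begin, end in find_open_reading_frames(example):
    print(begin, end, example[begin:end], to_rna(example[begin:end]))
```

In practice, candidate stretches like these would still have to be checked against the exon and intron structure, which is where signal-processing methods such as the one proposed here come in.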

Table 1 lists the possible codon combinations for protein translation. These combinations help in distinguishing exons from introns.

Table 1. Tri-nucleotide composition for codon

TTT TTC TTA TTG    TCT TCC TCA TCG    TAT TAC TAA TAG    TGT TGC TGA TGG
CTT CTC CTA CTG    CCT CCC CCA CCG    CAT CAC CAA CAG    CGT CGC CGA CGG
ATT ATC ATA ATG    ACT ACC ACA ACG    AAT AAC AAA AAG    AGT AGC AGA AGG
GTT GTC GTA GTG    GCT GCC GCA GCG    GAT GAC GAA GAG    GGT GGC GGA GGG
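The 64 entries of Table 1 follow from taking every ordered triple of the four bases (4 × 4 × 4 = 64). A short Python sketch can enumerate them in the same order as the table: rows by first base, groups by second base, and entries within a group by third base. The names `BASES`, `all_codons` and `codon_table_rows` are illustrative and not taken from the chapter.

```python
from itertools import product

BASES = "TCAG"  # base order used in Table 1

# Every ordered triple of bases: 4 * 4 * 4 = 64 codons.
all_codons = ["".join(triple) for triple in product(BASES, repeat=3)]


def codon_table_rows():
    """Build one text row per first base, with four groups per row."""
    rows = []
    for first in BASES:
        groups = []
        for second in BASES:
            # Each group fixes the first two bases and varies the third.
            groups.append(" ".join(first + second + third for third in BASES))
        rows.append("    ".join(groups))
    return rows


print(len(all_codons))
for row in codon_table_rows():
    print(row)
```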
