Novel Adaptive Exon Predictor for DNA Analysis Using Singular Value Decomposition


Srinivasareddy Putluri (Department of Electronics and Communication Engineering, Koneru Lakshmaiah Education Foundation, Guntur, India) and Shaik Yasmin Fathima (Department of Electronics and Communication Engineering, Koneru Lakshmaiah Education Foundation, Guntur, India)
DOI: 10.4018/IJMTIE.2017010103


Accurate prediction of the exon regions in deoxyribonucleic acid (DNA) is a key task in the field of genomics. Identifying the protein coding regions is a key aspect of disease identification and drug design. These sections of DNA, known as exons, show three base periodicity (TBP), which serves as the basis for all exon locating methods. Many techniques have been applied successfully, but further development is still needed in this area. This article develops a novel adaptive exon predictor (AEP) using singular value decomposition (SVD), which notably reduces computational complexity and provides better accuracy. Finally, the exon locating capability of the proposed SVD based AEP is tested using a real DNA sequence with accession AF099922, obtained from the National Center for Biotechnology Information (NCBI) database, and compared with existing LMS methods. The proposed AEP is shown to be more efficient at locating the exon regions in a DNA sequence.
Article Preview

1. Introduction

Locating the exon regions in a genomic sequence is an extensive area of research in bioinformatics. Vital genes form a subset of an organism's genes that are needed for development, survival and fertility (Ning et al., 2014; Min et al., 2014). Hence, the identification of exons has practical importance for detecting human diseases (Dickerson, Zhu, Robertson, & Hentges, 2011) and for drug target discovery in new pathogens (Inbamalar & Sivakumar, 2013; Cole, 2002). A genomic sequence contains both protein coding and non-protein coding regions. The subfield of genomics that focuses on locating the protein coding regions in a genomic sequence is known as gene prediction. Studying the primary structure of protein coding regions helps in understanding their secondary and tertiary structure; once the entire structure is analyzed, it supports the detection of anomalies, the cure of diseases and the design of drugs. These studies also support the assessment of phylogenetic trees (Maji & Garg, 2013; Hamidreza, Shamsi, Hamed, & Sedaaghi, 2013). Based on the elemental structure of their molecules, living organisms are divided into two types, prokaryotes and eukaryotes. In prokaryotes, such as bacteria and archaea, the sections that code for proteins are long and continuous. In eukaryotes, genes are a combination of coding sections separated by long non-protein coding sections. The sections that code for proteins are called exons, whereas the non-protein coding sections are termed introns. All living organisms other than bacteria and archaea fall into the eukaryote category. In human DNA, the coding sections make up only 3% of the sequence, and the remaining 97% is non-coding. Hence the identification of protein coding sections is a significant task (Maji & Garg, 2013; Wazim, Yuzhen, & Haixu, 2014). In almost all DNA sequences, the protein coding regions exhibit a three base periodicity (TBP). 
This periodicity appears as a sharp peak at a frequency f = 1/3 in the power spectral density (PSD) plot (Ghorbani & Hamed, 2015). Several techniques for predicting exon regions, based on various signal processing methods, are presented in the literature (Gangchen & Yihui, 2014; Yusuke, A., & Shuichi, 2014; VenkataSrikanth & Rahman, 2016). However, real gene sequences are extremely long, and the location of the exons varies from sequence to sequence. Existing signal processing techniques are not very accurate in predicting protein coding regions. Adaptive signal processing techniques are well suited to processing very long sequences over several iterations, because they can change their weight coefficients in accordance with the statistical behavior of the input sequence (VenkataSrikanth & Rahman, 2016). In this paper, efficient Adaptive Exon Predictors (AEPs) are developed using adaptive algorithms for locating protein coding sections. The least mean squares (LMS) algorithm is the fundamental adaptive technique and is popular because it is simple to implement. However, it suffers from problems such as gradient noise amplification, weight drift and poor convergence. We therefore put forward a singular value decomposition (SVD) technique to improve the accuracy of the AEP in exon prediction. The SVD algorithm overcomes the drawbacks of LMS, improves exon locating ability and converges faster (Srinivasareddy & Rahman, 2016). It also reduces the excess mean square error (EMSE) in the process of exon prediction. To meet the accuracy requirements of exon prediction in real-time applications, SVD based AEPs are developed. Sign based algorithms apply the signum function and minimize the number of multiplication operations (Akhtar, Ambikairajah, & Julien, 2005). In real-time applications, the computational complexity of an adaptive algorithm plays a key role. 
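The LMS weight update and its sign based simplifications mentioned above can be illustrated with a minimal sketch. This uses the textbook forms of the updates and is not the authors' SVD based predictor, which this preview does not specify. The function name `adaptive_predict`, the tap count, the step size `mu`, and the choice of input and reference signals are all assumptions made for illustration.

```python
import numpy as np

def adaptive_predict(x, d, num_taps=4, mu=0.01, variant="lms"):
    """Adapt an FIR filter so its output y tracks the reference d.

    Hypothetical helper for illustration only. `variant` selects the
    textbook update rule:
      "lms": w += mu * e * u
      "sra": w += mu * e * sign(u)        (sign regressor)
      "sa" : w += mu * sign(e) * u        (sign error)
      "ssa": w += mu * sign(e) * sign(u)  (sign sign)
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)          # filter weights, started at zero
    y = np.zeros(len(x))            # filter output
    e = np.zeros(len(x))            # error between reference and output
    for n in range(num_taps - 1, len(x)):
        # Tap-input vector, most recent sample first.
        u = x[n - num_taps + 1:n + 1][::-1]
        y[n] = w @ u
        e[n] = d[n] - y[n]
        if variant == "lms":
            w += mu * e[n] * u
        elif variant == "sra":
            w += mu * e[n] * np.sign(u)
        elif variant == "sa":
            w += mu * np.sign(e[n]) * u
        elif variant == "ssa":
            w += mu * np.sign(e[n]) * np.sign(u)
        else:
            raise ValueError(f"unknown variant: {variant}")
    return y, e, w

# Usage: track a period-3 reference from a phase-shifted period-3 input.
n = np.arange(600)
x_in = np.sin(2 * np.pi * n / 3 + 0.4)
d_ref = np.sin(2 * np.pi * n / 3)
y_out, err, weights = adaptive_predict(x_in, d_ref, mu=0.05)
```

The sign variants replace one or both factors of the update with their sign, which removes multiplications at the cost of a coarser update. How they trade accuracy against complexity for exon prediction depends on the data and step size and is not shown by this sketch.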
In particular, when the sequence is very long and the computational complexity of the signal processing technique is high, the samples overlap one another at the input of the exon predictor. This leads to inaccurate prediction and causes inter symbol interference (ISI). High computational complexity also leads to a larger circuit size and a greater number of operations. Hence, to reduce the computational complexity of an AEP in real-time applications, we combine the adaptive algorithms with sign based algorithms. Sign based algorithms apply the signum function and lessen the number of multiplication operations (Haykin, 2014; Rahman, Ahamed, Reddy, 2012). The three signum based simplified algorithms are the sign regressor algorithm (SRA), the sign algorithm (SA) and the sign sign algorithm (SSA) (Srinivasareddy & Rahman, 2016; Diniz, 2014). Therefore, to minimize the computational complexity and achieve faster convergence, we propose an SVD based AEP. Based on the proposed SVD technique, we develop various AEPs and test their performance using real genomic sequences taken from the National Center for Biotechnology Information (NCBI) database (2017). We use the power spectral density (PSD) as the performance measure for evaluating the various AEPs. The proposed AEP is compared with the existing LMS method in terms of exon locating capability, and it is shown to be better than the existing LMS method for exon prediction. The theory of the adaptive algorithms, a discussion of the performance of the SVD based AEP in comparison with the LMS based AEP, and the results are presented in the following sections.
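As a rough illustration of the three base periodicity and the PSD peak at f = 1/3 discussed in this introduction, the sketch below measures spectral power at that frequency in a sliding window. It assumes a binary indicator mapping (one 0/1 sequence per nucleotide) and a window of 351 bases; the mapping, the window length, and the function names `indicator_sequences` and `tbp_power` are illustrative assumptions, not details taken from the article.

```python
import numpy as np

def indicator_sequences(dna):
    """Map a DNA string to four 0/1 sequences, one per nucleotide (assumed mapping)."""
    dna = dna.upper()
    return {b: np.array([1.0 if c == b else 0.0 for c in dna]) for b in "ACGT"}

def tbp_power(dna, window=351, step=3):
    """Return window start positions and spectral power at f = 1/3.

    For each window, the power is the sum over the four indicator
    sequences of |sum_n s[n] * exp(-j*2*pi*n/3)|^2.
    """
    seqs = indicator_sequences(dna)
    n = np.arange(window)
    kernel = np.exp(-2j * np.pi * n / 3)      # complex exponential at f = 1/3
    starts = list(range(0, len(dna) - window + 1, step))
    power = []
    for start in starts:
        total = 0.0
        for s in seqs.values():
            total += abs(np.dot(s[start:start + window], kernel)) ** 2
        power.append(total)
    return np.array(starts), np.array(power)

# Usage on synthetic strings: a period-3 repeat versus a period-4 repeat.
pos_a, power_periodic = tbp_power("ATG" * 200)
pos_b, power_other = tbp_power("ACGT" * 150)
```

On a real sequence, windows that lie inside exons would be expected to show higher power at f = 1/3 than windows in introns, which is the property exon locating methods rely on. The synthetic strings here only demonstrate the measure itself.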
