Insulin DNA Sequence Classification Using Levy Flight Bat With Back Propagation Algorithm

Siyab Khan (The University of Agriculture, Peshawar, Pakistan), Abdullah Khan (The University of Agriculture, Peshawar, Pakistan), Rehan Ullah (The University of Agriculture, Peshawar, Pakistan), Maria Ali (The University of Agriculture, Peshawar, Pakistan) and Rahat Ullah (University of Malakand, Pakistan)
Copyright: © 2020 | Pages: 21
DOI: 10.4018/978-1-7998-2521-0.ch011

Abstract

Various nature-inspired algorithms are used for optimization problems, and one of them has recently become popular because of its ability to find optimal solutions. To address the low accuracy of existing methods, this chapter combines machine learning with the Levy flight Bat algorithm to classify the insulin DNA sequence of a healthy human; one variant of the insulin DNA sequence is used. The DNA sequence is collected from the National Center for Biotechnology Information (NCBI). Alignment is performed as a preprocessing step to obtain the optimal DNA sequence, that is, the one with the greatest number of matches between the base pairs of the DNA sequences. The DNA sequence is then converted into binary form so that it is machine readable. Six hybrid algorithms are used for classification in order to check the performance of the proposed hybrid models. The performance of the proposed models is compared with other algorithms, namely BatANN, BatBP, BatGDANN, and BatGDBP, in terms of MSE and accuracy. The simulation results show that the proposed LFBatANN and LFBatBP algorithms perform better than the other hybrid models.
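The abstract does not say which binary code is used to make the DNA sequence machine readable. The sketch below assumes a common choice, a 4-bit one-hot code per nucleotide, with alignment gaps (`-`) mapped to all zeros. The table `ONE_HOT` and the helpers `encode_dna` and `count_matches` are hypothetical names for illustration only. `count_matches` shows the quantity the alignment step aims to maximize: the number of positions at which two aligned sequences carry the same base.

```python
# Hypothetical 4-bit one-hot code; the chapter does not specify its encoding.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
    "-": (0, 0, 0, 0),  # alignment gap (assumed encoding)
}


def encode_dna(sequence: str) -> list[int]:
    """Flatten a DNA string into a list of bits, four bits per symbol."""
    bits: list[int] = []
    for symbol in sequence.upper():
        if symbol not in ONE_HOT:
            raise ValueError(f"unexpected symbol {symbol!r}")
        bits.extend(ONE_HOT[symbol])
    return bits


def count_matches(aligned_a: str, aligned_b: str) -> int:
    """Count positions where two equal-length aligned sequences share a base (gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper()) if a == b and a != "-")


print(encode_dna("ATG-"))  # [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(count_matches("ATG-C", "ATGAC"))  # 4
```

One-hot coding keeps every base equally distant from every other, so the classifier does not read a false ordering into the symbols (as it might with a plain 0-3 integer code).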

1. Introduction

In the biological sciences, the analysis of human DNA is a key factor. It is essential to understand DNA and its functionality, because DNA carries all the genetic information related to the functioning and reproduction of an organism (Nguyen et al., 2016). It is the genetic material of the cell (Chao, 2006). DNA is fundamentally made up of four types of similar chemical units, the nucleotides, distinguished by their bases Adenine, Guanine, Thymine, and Cytosine; these are repeated millions to billions of times in the genome, and paired bases form the base pairs of the DNA sequence. Adenine bonds with Thymine, and Guanine bonds with Cytosine (Chao, 2006). To understand and decode this biological information, a new field called Bioinformatics came into being (Hapudeniya, 2010). Bioinformatics is an evolving research area of the 21st century that combines fields such as biology, mathematics, computer science, and statistics. Problems in bioinformatics are hard because the volume of biological data is growing exponentially (Hapudeniya, 2010), and extracting knowledge from this huge amount of data requires advanced computer technologies and algorithms (Hapudeniya, 2010). Various statistical and computational methods have therefore been attempted. Data mining methods such as rule learning (RL), Naïve Bayes (NB), and the nonlinear integral classifier (NIC) are used for DNA sequence classification (Nurul Amerah Kassim, 2017), and decision trees have also been applied (Tansim, 2018a). However, traditional statistical and data mining techniques for DNA sequence classification have limited accuracy. To address this, advanced computational methods such as machine learning and hybrid methods with neural networks are used for DNA sequence classification (Nurul Amerah Kassim, 2017).
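The pairing rule stated above (Adenine with Thymine, Guanine with Cytosine) is enough to derive one DNA strand from the other. A minimal sketch follows; `complement` and `reverse_complement` are illustrative names, and the reversal reflects the fact that the two strands of the double helix run antiparallel.

```python
# Base-pairing rule: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction (strands are antiparallel)."""
    return complement(strand)[::-1]


print(complement("ATGC"))  # TACG
print(reverse_complement("ATGC"))  # GCAT
```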
An artificial neural network is a computational model (Wu & McLarty, 2000) whose concept is taken primarily from the biological nervous system. An artificial neural network mimics the human brain, which is made up of small units called neurons. Each neuron has a cell body, a few short dendrites, and a single elongated axon (Hapudeniya, 2010). Numerous researchers have applied deep neural networks to DNA and protein problems (Eickholt & Cheng, 2013). Various nature-inspired metaheuristic optimization techniques are also used in bioinformatics; for example, cuckoo search has been used for multiple DNA sequence alignment, one of the core problems in the field (Kartous, Layeb, & Chikhi, 2014). This research therefore proposes a new hybrid metaheuristic method, the Levy flight Bat algorithm, for classifying the insulin DNA sequences of healthy Homo sapiens (humans). In the proposed model, the Bat algorithm is hybridized with Levy flight and with both an artificial neural network and a back propagation neural network, in order to improve accuracy and to explain the role of optimization techniques in neural networks.

The remainder of the chapter is organized as follows. Section 2 discusses the background. Section 3 explains the materials and methods, and Section 4 presents the proposed algorithm. Section 5 presents the results and discussion. Finally, Section 6 concludes the chapter.

Key Terms in this Chapter

Mean Square Error: In statistics, the mean square error (MSE) of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squares of the errors. It is used to verify the accuracy of the classification results.
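In a classification setting, the MSE is obtained by comparing the network's outputs y_i with the target class labels t_i: MSE = (1/n) Σ (t_i − y_i)². The function below is a minimal sketch of that standard formula; the name `mean_squared_error` and the sample values are illustrative only, not results from the chapter.

```python
def mean_squared_error(targets: list[float], outputs: list[float]) -> float:
    """Average of the squared differences between targets and outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("need two non-empty lists of equal length")
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)


# Squared errors: 0.25, 0, 0, 0.0625 -> sum 0.3125 -> divided by 4
print(mean_squared_error([1, 0, 1, 1], [0.5, 0.0, 1.0, 0.75]))  # 0.078125
```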

Optimization: the process of selecting the best alternative from among a set of available alternatives.

Artificial Neural Network: a computational model that imitates the way the human brain works. The units of an ANN are called neurons; they are connected to one another by links, and every link is associated with a weight.
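To make neurons, links, and weights concrete, the sketch below computes the forward pass of a tiny fully connected network: each neuron takes the weighted sum of its inputs plus a bias and passes it through a sigmoid activation. The sigmoid choice, the 2-2-1 layout, and the weight values are assumptions for illustration; the chapter's own architecture is not given in this section.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the incoming links plus a bias, squashed by a sigmoid."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def layer(inputs: list[float], weight_rows: list[list[float]], biases: list[float]) -> list[float]:
    """Evaluate every neuron of one layer on the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Tiny 2-2-1 network with made-up weights (illustration only).
x = [1.0, 0.0]
hidden = layer(x, [[0.5, -0.5], [-0.5, 0.5]], [0.0, 0.0])
output = layer(hidden, [[1.0, 1.0]], [-1.0])
print(hidden, output)
```

With these weights the hidden activations are sigmoid(0.5) and sigmoid(−0.5), which sum to 1, so the output neuron receives 1 − 1 = 0 and outputs sigmoid(0) = 0.5, up to floating-point rounding.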

Accuracy: a parameter for evaluating the performance of a machine or model that measures how close the predicted results are to the actual values. Accuracy is calculated as the number of correct predictions divided by the total number of observations.
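A minimal sketch of this definition follows. The function name `accuracy`, the sample outputs, and the 0.5 threshold used to turn network outputs into class labels are assumptions for illustration; the chapter's decision rule is not stated here.

```python
def accuracy(actual: list[int], predicted: list[int]) -> float:
    """Fraction of observations whose predicted class equals the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Illustrative network outputs, converted to class labels with an assumed 0.5 threshold.
outputs = [0.91, 0.12, 0.40, 0.77]
predicted = [1 if y >= 0.5 else 0 for y in outputs]  # [1, 0, 0, 1]
print(accuracy([1, 0, 1, 1], predicted))  # 0.75
```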

Bioinformatics: an interdisciplinary field that combines computer science, biology, mathematics, and other disciplines for the analysis and processing of biological sequences such as DNA, RNA, and proteins.

Machine Learning: a subfield of artificial intelligence, which is a large, multipurpose field in the modern technological world. Machine learning is concerned with the design and development of computational systems that can adapt and learn. In machine learning, the computational system learns from training data.

Classification: a technique in which data are grouped into a given number of classes on the basis of similarities and constraints. The main aim of a classification technique is to minimize the classification error.

Metaheuristics: a type of approximate algorithm. The term combines two Greek words, meta and heuristic: meta means "beyond" or "at a higher level", and heuristic refers to the art of discovering new strategies. A metaheuristic is a higher-level strategy built on top of heuristics that guides the search toward an optimal or near-optimal solution.

Back Propagation Neural Network: the most widely used supervised learning algorithm for artificial neural networks, presented by Rumelhart, Hinton, and Williams in 1986 and mostly used to train multilayer perceptrons. It is an optimization algorithm applied to an artificial neural network (ANN) to reduce the network's error during training by propagating error gradients backward from the output layer and adjusting the weights. Like an ANN, a BPNN is composed of an input layer, one or more hidden layers, and an output layer of neurons.
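The sketch below shows the core of back propagation for a network with one hidden layer: a forward pass, output and hidden error terms (deltas) for the squared error E = ½(t − y)² with sigmoid units, and gradient-descent weight updates. It is a plain BP trainer with random initial weights; the chapter's hybrids (such as BatBP and LFBatBP) combine BP with a metaheuristic, which this sketch does not attempt to reproduce. The function name `train_bp`, the learning rate, the epoch count, and the logical-AND toy data are all assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_bp(data, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer sigmoid network by online back propagation."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # w1[j]: weights from every input to hidden unit j; the last entry is the bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # w2: weights from every hidden unit to the output; the last entry is the bias.
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]  # constant input feeding the bias weights
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
        y = sigmoid(sum(w * v for w, v in zip(w2, h + [1.0])))
        return xb, h, y

    for _ in range(epochs):
        for x, t in data:
            xb, h, y = forward(x)
            # Output delta: dE/dnet_out for E = 0.5 * (t - y)**2 with a sigmoid output.
            d_out = (y - t) * y * (1.0 - y)
            # Hidden deltas: propagate d_out back through w2 (before w2 is updated).
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Gradient-descent updates for both weight layers.
            for j, hj in enumerate(h + [1.0]):
                w2[j] -= lr * d_out * hj
            for j in range(n_hidden):
                for i, xi in enumerate(xb):
                    w1[j][i] -= lr * d_hid[j] * xi

    return lambda x: forward(x)[2]


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
predict = train_bp(AND_DATA)
for x, t in AND_DATA:
    print(x, t, round(predict(x), 3))
```

The hidden deltas are computed from the old output weights before any update, which is what makes each step a true gradient step on the current error.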

Sequence Classification: the technique of classifying biological sequences into their respective classes on the basis of similarities and constraints.

Bat Algorithm: a nature-inspired (bio-inspired) algorithm based on the echolocation behavior of bats.
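The sketch below implements the standard frequency-tuned form of the Bat algorithm to minimize a test function. Each bat i has a frequency f_i, velocity v_i, position x_i, loudness A_i, and pulse rate r_i: the frequency scales a pull toward the current best bat, a pulse-rate test occasionally replaces the move with a local random walk around the best, and accepted improvements make the bat quieter. All parameter values and the `sphere` test function are assumptions for illustration. In hybrids such as BatANN, a position would typically encode network weights and the objective would be the training MSE; that coupling is an assumption here and is not shown.

```python
import math
import random


def bat_algorithm(objective, dim, n_bats=20, iters=200, f_min=0.0, f_max=2.0,
                  a0=0.9, r0=0.5, alpha=0.9, gamma=0.9, lower=-5.0, upper=5.0, seed=1):
    """Minimize `objective` over [lower, upper]**dim with the basic Bat algorithm."""
    rng = random.Random(seed)

    def clip(p):
        return [min(max(c, lower), upper) for c in p]

    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in x]
    loud = [a0] * n_bats   # loudness A_i
    pulse = [r0] * n_bats  # pulse emission rate r_i
    i_best = min(range(n_bats), key=lambda i: fit[i])
    best, best_fit = x[i_best][:], fit[i_best]

    for t in range(1, iters + 1):
        for i in range(n_bats):
            # Frequency, velocity, and position updates.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [vi + (xi - bi) * freq for vi, xi, bi in zip(v[i], x[i], best)]
            cand = clip([xi + vi for xi, vi in zip(x[i], v[i])])
            # With probability 1 - r_i, use a local random walk around the best bat instead.
            if rng.random() > pulse[i]:
                mean_loud = sum(loud) / n_bats
                cand = clip([bi + rng.uniform(-1, 1) * mean_loud for bi in best])
            f_cand = objective(cand)
            # Accept an improving move with probability A_i, then shrink A_i and
            # set r_i = r0 * (1 - exp(-gamma * t)), which rises toward r0 as t grows.
            if f_cand <= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, f_cand
                loud[i] *= alpha
                pulse[i] = r0 * (1 - math.exp(-gamma * t))
            if f_cand <= best_fit:
                best, best_fit = cand[:], f_cand
    return best, best_fit


def sphere(p):
    return sum(c * c for c in p)


best, best_fit = bat_algorithm(sphere, dim=2)
print(best, best_fit)
```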

Levy Flight: a random walk whose step lengths follow a heavy-tailed probability distribution. A Levy flight, or Levy motion, is a class of non-Gaussian random process whose step lengths are drawn from a Levy stable distribution.
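Levy-distributed steps are commonly generated with Mantegna's algorithm: draw u ~ N(0, σ_u²) and v ~ N(0, 1) and take s = u / |v|^(1/β), where σ_u depends on β as in `mantegna_sigma` below. The sketch assumes β = 1.5, a value often used in practice; this section does not state the β or step scale used in LFBat. `levy_perturb` is a hypothetical illustration of how a Levy step could replace the uniform local random walk of the basic Bat algorithm, not the chapter's exact update rule.

```python
import math
import random


def mantegna_sigma(beta: float) -> float:
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm (0 < beta < 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """Draw one heavy-tailed step length s = u / |v|**(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_perturb(best: list[float], rng: random.Random, scale: float = 0.01,
                 beta: float = 1.5) -> list[float]:
    """Hypothetical Levy-flight move: jitter each coordinate of `best` by a scaled Levy step."""
    return [b + scale * levy_step(rng, beta) for b in best]


rng = random.Random(42)
print(round(mantegna_sigma(1.5), 4))  # 0.6966
steps = [levy_step(rng) for _ in range(10_000)]
print(max(abs(s) for s in steps))  # occasional very large jumps: the heavy tail
print(levy_perturb([0.0, 0.0], rng))
```

Most Levy steps are small, which favors local refinement, while the rare long jumps help a search escape local optima; this mix is the usual motivation for adding Levy flight to a metaheuristic.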
