Crow-ENN: An Optimized Elman Neural Network with Crow Search Algorithm for Leukemia DNA Sequence Classification


Rehan Ullah (The University of Agriculture, Peshawar, Pakistan), Abdullah Khan (The University of Agriculture, Peshawar, Pakistan), Syed Bakhtawar Shah Abid (The University of Agriculture, Peshawar, Pakistan), Siyab Khan (The University of Agriculture, Peshawar, Pakistan), Said Khalid Shah (Department of Computer Science, University of Science and Technology, Bannu, Pakistan) and Maria Ali (The University of Agriculture, Peshawar, Pakistan)
Copyright: © 2020 | Pages: 41
DOI: 10.4018/978-1-7998-2521-0.ch009

Abstract

DNA sequence classification is one of the main research activities in bioinformatics and has attracted sustained attention from researchers. In bioinformatics, machine learning can be applied to the analysis of genomic sequences, for example the classification and comparison of DNA sequences. This article proposes a new hybrid meta-heuristic model, called Crow-ENN, for leukemia DNA sequence classification. The proposed algorithm combines the Crow Search Algorithm (CSA) with the Elman Neural Network (ENN). Leukemia DNA sequences are used to train and test the proposed hybrid model. Five comparable models, namely Crow-ANN, Crow-BPNN, ANN, BPNN and ENN, are also trained and tested on the same DNA sequences. Model performance is evaluated in terms of accuracy and mean squared error (MSE). The overall simulation results show that the proposed model outperforms all five comparison models, attaining the highest accuracy of over 99%. Because of these promising results, the model may also be applicable to classification problems in other fields.

1. Introduction

Scientific knowledge has never been produced and shared as quickly as it is today. New disciplines are emerging from the combination of different fields of science. One such field is bioinformatics, which applies statistics, mathematics and computer science to molecular biology in order to store, analyze and retrieve biological data. Bioinformatics is growing very fast and has become a basic part of biological research. It helps biologists extract meaningful information from biological data using various web-based or computer-based tools, most of which are freely available (Mehmood et al., 2014).

Among these computational techniques, machine learning is the most common approach to analyzing protein and DNA sequence data. Machine learning is a subfield of AI concerned with the design and development of computer algorithms that improve with experience. It enables computers to assist humans in analyzing large and complex problems. In bioinformatics, machine learning can be applied to the analysis of genomic sequences, for example the classification of DNA sequences, the comparison of DNA sequences, and the identification of unknown DNA sequences. Supervised and unsupervised learning are the two broad methodologies commonly used in machine learning (Libbrecht et al., 2015). Classification is a kind of supervised machine learning, and a function of data mining, that assigns every item in a dataset to one of a predefined set of target classes based on similarity. Its aim is to accurately predict the target class of every item in the dataset. Many techniques are used for classification, such as Support Vector Machines (SVM), Decision Trees, Naive Bayes Classification, Artificial Neural Networks (ANN) and Bayesian Networks (Kesavaraj et al., 2013).

This research combines two machine learning algorithms, namely the Crow Search Algorithm (CSA) and the Elman Neural Network (ENN). A plain ENN tends to become stuck in local minima and fails to reach the global optimum. Merging CSA with ENN addresses this problem.

This research aims to construct a hybrid technique by combining the Crow Search Algorithm (CSA) with the Elman Neural Network (ENN) for leukemia DNA sequence classification. The proposed model is called Crow-ENN. The objectives are to construct this optimized machine learning classification model for leukemia DNA sequences and to evaluate its performance by comparing its Mean Squared Error (MSE) and accuracy with those of existing models.

This study aims to classify leukemia DNA sequences using the proposed hybrid Crow-ENN model. The leukemia DNA sequence datasets are taken from the NCBI (National Center for Biotechnology Information) database. Two measures are used to evaluate the proposed model: MSE and accuracy.

Key Terms in this Chapter

Crow-BPNN: It is a hybrid model that is developed by combining the Crow Search Algorithm and Back Propagation Neural Network (BPNN).

Elman Neural Network: The Elman Neural Network (ENN) is a feedback (recurrent) neural network proposed by Elman in 1990. It builds on the backpropagation neural network (BPNN). Its layout is broadly divided into four layers: the input layer, the hidden layer, the undertake layer (often called the context layer), and the output layer. The purpose of the undertake layer is to memorize the output of the hidden layer. The hidden layer's output is fed back to the hidden layer's input through the delay and memory of the undertake layer.
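
The four-layer layout described above can be illustrated with a minimal forward-pass sketch. This is not the chapter's implementation: the class name `ElmanNetwork`, the tanh and sigmoid activations, the uniform weight initialization and the layer sizes are all assumptions chosen for illustration, and biases are omitted for brevity.

```python
import math
import random


class ElmanNetwork:
    """Minimal Elman-style forward pass (illustrative assumptions throughout)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)       # input layer -> hidden layer
        self.w_ctx = matrix(n_hidden, n_hidden)  # undertake layer -> hidden layer
        self.w_out = matrix(n_out, n_hidden)     # hidden layer -> output layer
        self.context = [0.0] * n_hidden          # undertake (context) layer state

    def reset(self):
        """Clear the undertake layer before presenting a new sequence."""
        self.context = [0.0] * len(self.context)

    def step(self, x):
        """Process one input vector and return the output layer activations."""
        hidden = []
        for w_x, w_c in zip(self.w_in, self.w_ctx):
            total = sum(w * v for w, v in zip(w_x, x))
            total += sum(w * c for w, c in zip(w_c, self.context))
            hidden.append(math.tanh(total))
        # The undertake layer memorizes the hidden output for the next step.
        self.context = hidden[:]
        return [
            1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in self.w_out
        ]


# Hypothetical input: three time steps of 4-dimensional vectors.
net = ElmanNetwork(n_in=4, n_hidden=3, n_out=1)
for vector in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]):
    print(net.step(vector))
```

Because the undertake layer carries the previous hidden output forward, the same input vector can produce different outputs depending on what preceded it in the sequence.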

MSE: Mean Squared Error (MSE) is the average of the squared differences between the values predicted by a model and the actual values.
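
As a small worked example of this definition, the sketch below computes the MSE for a handful of made-up labels and network outputs. The function name `mean_squared_error` and the sample values are hypothetical and are not taken from the chapter.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical class labels and raw network outputs, for illustration only.
labels = [1.0, 0.0, 1.0, 0.0]
outputs = [0.5, 0.5, 1.0, 0.0]
print(mean_squared_error(labels, outputs))  # (0.25 + 0.25 + 0 + 0) / 4 = 0.125
```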

Crow-ENN: This is the proposed hybrid meta-heuristic model, which combines an optimization technique called the Crow Search Algorithm with a type of neural network called the Elman Neural Network.

Machine Learning: Machine learning is a subfield of AI concerned with the design and development of computer algorithms that improve with experience.

BPNN: The BPNN is a multilayer feed-forward neural network that is trained with the error backpropagation algorithm. The learning process of the backpropagation algorithm proceeds in two steps: forward propagation of the operating signal and backpropagation of the error signal.
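
The two steps can be sketched for a single training sample as follows. This is a minimal illustration under stated assumptions, not the chapter's code: one hidden layer, sigmoid activations, no biases, a squared-error loss, and the hypothetical names `train_step`, `w_hidden` and `w_out`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w_hidden, w_out, x, target, lr=0.5):
    """Run one forward and one backward pass; update the weights in place.

    Returns the squared error measured before the update.
    """
    # Step 1: forward propagation of the operating signal.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    error = output - target

    # Step 2: backpropagation of the error signal (gradient of 0.5 * error**2).
    delta_out = error * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    for k, h in enumerate(hidden):
        w_out[k] -= lr * delta_out * h
    for k, row in enumerate(w_hidden):
        for i, v in enumerate(x):
            row[i] -= lr * delta_hidden[k] * v
    return error ** 2


# Hypothetical 2-input, 3-hidden, 1-output network trained on one sample.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(3)]
errors = [train_step(w_hidden, w_out, [1.0, 0.0], 1.0) for _ in range(100)]
print(errors[0], errors[-1])
```

Gradient descent of this kind follows the local slope of the error surface, which is why a purely gradient-trained network can settle in a local minimum.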

Meta-Heuristic: A meta-heuristic is a generic or higher-level heuristic that is more general in problem-solving. Meta-heuristic computing is adaptive computing that applies general heuristic rules in solving a category of computational problems.

Classification: Classification is a kind of supervised machine learning that assigns every element in a dataset to one of a predefined set of groups or classes based on similarity or homology. Many machine learning techniques are used for classification, such as Decision Trees, Support Vector Machines, Artificial Neural Networks, and Bayesian Classification.

Leukemia: Leukemia is cancer of the body's blood-forming tissues, including the bone marrow and the lymphatic system. Many types of leukemia exist. Some forms of leukemia are more common in children. Other forms of leukemia occur mostly in adults. Leukemia usually involves white blood cells.

DNA: DNA, which stands for “Deoxyribonucleic Acid”, is the carrier molecule of genetic information. It can be found in the nucleus of a cell. DNA contains all the information necessary for the duplication of life. The DNA structure is a double helix comprising two long strands. Each strand is made up of four types of nucleotides: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T).
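
Before a neural network can process a DNA sequence, the four nucleotide letters have to be turned into numbers. The chapter preview does not say which encoding the authors used; the sketch below shows one-hot encoding purely as a common, assumed choice, and the function name `one_hot_encode` is hypothetical.

```python
NUCLEOTIDES = "ACGT"  # Adenine, Cytosine, Guanine, Thymine


def one_hot_encode(sequence):
    """Map each nucleotide to a 4-element vector with a single 1 (assumed encoding)."""
    encoded = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            # Ambiguity codes such as 'N' are rejected in this simple sketch.
            raise ValueError(f"unexpected symbol in sequence: {base!r}")
        encoded.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return encoded


print(one_hot_encode("GAT"))  # [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
```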

Bioinformatics: Bioinformatics is a field of science, which uses statistics, mathematics and computer science in molecular biology for storing, analyzing and retrieving biological data.

Crow-ANN: It is a hybrid model that is developed by combining the Crow Search Algorithm and Artificial Neural Network (ANN).

Accuracy: Accuracy is the most intuitive measure of performance evaluation. It is concerned with the closeness of an outcome to the true or actual value. For a classifier, it is the proportion of items whose predicted class matches the actual class. A higher accuracy indicates that a model performs better on the data being evaluated.
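
For a two-class problem, accuracy can be computed as in the sketch below. The 0.5 cutoff used to turn raw network outputs into class labels, the function names and the sample values are assumptions for illustration; the chapter preview does not specify them.

```python
def threshold(outputs, cutoff=0.5):
    """Convert raw network outputs into 0/1 class labels (assumed cutoff)."""
    return [1 if o >= cutoff else 0 for o in outputs]


def accuracy(actual, predicted):
    """Fraction of items whose predicted class equals the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels and outputs: four of the five items are classified correctly.
labels = [1, 0, 1, 0, 1]
outputs = [0.9, 0.2, 0.4, 0.1, 0.8]
print(accuracy(labels, threshold(outputs)))  # 0.8
```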

Crow Search Algorithm: The Crow Search Algorithm (CSA) is a bio-inspired meta-heuristic optimizer that simulates the intelligent behavior of crows. CSA was proposed by Alireza Askarzadeh in 2016. It is a population-based algorithm that works on the following four principles: crows live in flocks; crows memorize the positions of their food hiding places; crows follow each other to commit thievery; and crows protect their caches from being pilfered with a certain probability. CSA turns these intelligent behaviors into an optimization process.
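
The four principles can be sketched as a simple minimizer. This is an illustrative reading of the algorithm, not the chapter's code: the function name `crow_search`, the default flight length and awareness probability, the rule of discarding out-of-bounds moves, and the sphere test function are all assumptions.

```python
import random


def crow_search(objective, dim, bounds, n_crows=20, iterations=200,
                flight_length=2.0, awareness_prob=0.1, seed=0):
    """Minimize `objective` over a box using a basic crow search loop."""
    rng = random.Random(seed)
    low, high = bounds
    # Principle 1: crows live in a flock (a population of candidate solutions).
    positions = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_crows)]
    # Principle 2: each crow memorizes its best hiding place found so far.
    memory = [p[:] for p in positions]
    memory_fit = [objective(p) for p in positions]

    for _ in range(iterations):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i picks a crow j to follow
            if rng.random() >= awareness_prob:
                # Principle 3: crow j is unaware, so crow i moves toward j's cache.
                r = rng.random()
                candidate = [x + r * flight_length * (m - x)
                             for x, m in zip(positions[i], memory[j])]
            else:
                # Principle 4: crow j notices and fools crow i into a random place.
                candidate = [rng.uniform(low, high) for _ in range(dim)]
            # Assumed feasibility rule: moves that leave the box are discarded.
            if all(low <= v <= high for v in candidate):
                positions[i] = candidate
                fit = objective(candidate)
                if fit < memory_fit[i]:
                    memory[i] = candidate[:]
                    memory_fit[i] = fit

    best = min(range(n_crows), key=memory_fit.__getitem__)
    return memory[best], memory_fit[best]


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in p)


best_position, best_value = crow_search(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_position, best_value)
```

In a hybrid such as Crow-ENN, the objective would presumably be the network's MSE viewed as a function of its flattened weights, so that each crow position encodes one candidate weight vector. The chapter preview does not give those details, so that mapping is an assumption.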
