DNA Fragment Assembly Using Quantum-Inspired Genetic Algorithm

DNA Fragment Assembly Using Quantum-Inspired Genetic Algorithm

Manisha Rathee, Kumar Dilip, Ritu Rathee
Copyright: © 2019 |Pages: 19
DOI: 10.4018/978-1-5225-5832-3.ch005
OnDemand:
(Individual Chapters)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

DNA fragment assembly (DFA) is one of the most important and challenging problems in computational biology. DFA problem involves reconstruction of target DNA from several hundred (or thousands) of sequenced fragments by identifying the proper orientation and order of fragments. DFA problem is proved to be a NP-Hard combinatorial optimization problem. Metaheuristic techniques have the capability to handle large search spaces and therefore are well suited to deal with such problems. In this chapter, quantum-inspired genetic algorithm-based DNA fragment assembly (QGFA) approach has been proposed to perform the de novo assembly of DNA fragments using overlap-layout-consensus approach. To assess the efficacy of QGFA, it has been compared genetic algorithm, particle swarm optimization, and ant colony optimization-based metaheuristic approaches for solving DFA problem. Experimental results show that QGFA performs comparatively better (in terms of overlap score obtained and number of contigs produced) than other approaches considered herein.
Chapter Preview
Top

1. Introduction

Understanding the functioning (as well as malfunctioning) of living beings require determination and interpretation of their genome sequences (Kikuchi & Chakraborty, 2006). The genome of an organism is made up of deoxyribonucleic acid (DNA) strands which encode its hereditary information and determine its body structure, functions and protein formation (Watson & Berry, 2003). DNA strands consist of two types of nitrogenous bases namely purines (adenine (A) and guanine (G)) and pyrimidines (cytosine (C) and thymine (T)). DNA has a double helical structure consisting of two strands running anti-parallel to each other and having complementary bases where a purine on one strand is paired with pyrimidine on the other and vice-versa in such a way that A is always paired with T and G is always paired with C (Watson & Berry, 2003; Watson & Crick, 1953). The process of determining the complete sequence of bases in all the strands of DNA (i.e. the genome) is termed as DNA sequencing. The genome sequences are generally very large ranging from few thousand base pairs for small viruses to 978-1-5225-5832-3.ch005.m01 base pairs for humans,978-1-5225-5832-3.ch005.m02 base pairs for wheat and 978-1-5225-5832-3.ch005.m03 base pairs for lily (Kikuchi & Chakraborty, 2006). A number of techniques are available for sequencing but none of the available techniques is capable of reading more than 1000 bases at a time, let alone reading an entire genome at once. This limitation of DNA sequencing methods is overcome by shotgun sequencing where the target DNA is replicated to generate multiple copies which are then randomly broken into a number of smaller fragments so that the fragments are short enough to be sequenced by any of the available sequencing techniques (Dorronsoro, et al., 2008). After sequencing, the sequenced fragments need to be combined back to obtain the original sequence as the whole genome sequence is required for phylogenetic and genomic research activities. But, as the fragments were generated randomly, the information about ordering of the fragments on the parent strand or the strand to which a particular fragment belongs is lost thereby resulting in the DNA fragment assembly problem (Meksangsouy & Chaiyaratana, 2003). DFA problem involves reconstruction of target DNA from several hundred (or thousands) of sequenced fragments by identifying the proper orientation and order of the sequenced fragments (Meksangsouy & Chaiyaratana, 2003). The sequenced fragments are called as reads and are provided as input to the assembly procedure.

DFA is proved to a NP-Hard combinatorial optimization problem (Medvedev, et al., 2007). The complexity arises due to a very large search space as there are 978-1-5225-5832-3.ch005.m04 possible solutions in worst case for a set containing 978-1-5225-5832-3.ch005.m05fragments (Kubalik, et al., 2010). Therefore obtaining exact solutions using traditional optimization techniques is not possible. Metaheuristic techniques have the capability to handle large search spaces and therefore well suited to solve the hard optimization problems. In this chapter, Quantum inspired genetic algorithm (QIGA) has been adapted for performing the de novo assembly of DNA fragments using overlap-layout-consensus approach. QIGA is a relatively recent metaheuristic technique which blends the principals of quantum computing with the concepts of genetic algorithm (Han & Kim, 2002). Due to the parallel processing capabilities of QIGA, it is capable of providing better solutions (in terms of diversity, quality and convergence) with a smaller population size. Also, QIGA has an edge over other metaheuristic techniques as very less number of parameters needs to be adjusted in case of QIGA.

Complete Chapter List

Search this Book:
Reset