DNA Fragment Assembly Using Multi-Objective Genetic Algorithms

DNA Fragment Assembly Using Multi-Objective Genetic Algorithms

Manisha Rathee, T. V. Vijay Kumar
Copyright: © 2014 |Pages: 25
DOI: 10.4018/ijaec.2014070105
(Individual Articles)
No Current Special Offers


DNA Fragment Assembly Problem (FAP) is concerned with the reconstruction of the target DNA, using the several hundreds (or thousands) of sequenced fragments, by identifying the right order and orientation of each fragment in the layout. Several algorithms have been proposed for solving FAP. Most of these have solely dwelt on the single objective of maximizing the sum of the overlaps between adjacent fragments in order to optimize the fragment layout. This paper aims to formulate this FAP as a bi-objective optimization problem, with the two objectives being the maximization of the overlap between the adjacent fragments and the minimization of the overlap between the distant fragments. Moreover, since there is greater desirability for having lesser number of contigs, FAP becomes a tri-objective optimization problem where the minimization of the number of contigs becomes the additional objective. These problems were solved using the multi-objective genetic algorithm NSGA-II. The experimental results show that the NSGA-II-based Bi-Objective Fragment Assembly Algorithm (BOFAA) and the Tri-Objective Fragment Assembly Algorithm (TOFAA) are able to produce better quality layouts than those generated by the GA-based Single Objective Fragment Assembly Algorithm (SOFAA). Further, the layouts produced by TOFAA are also comparatively better than those produced using BOFAA.
Article Preview

1. Introduction

DNA fragment assembly problem (FAP) is concerned with reconstructing the target DNA from several hundred (or even thousands) of sequenced fragments by finding the right order and orientation of each fragment in the fragment ordering sequence (Meksangsouy & Chaiyaratana, 2003). DNA fragment assembly problem is an important and a very complex problem in DNA sequencing. DNA sequencing techniques are used for obtaining the complete sequence of bases in a genome. Strands of DNA encode the hereditary information of an organism and collection of all these strands makes up the genome of that organism (Watson & Berry, 2003). The base sequences in a genome determine the body structure, functions and protein formation of a living being. Therefore obtaining and understanding a genome sequence is an important task to understand the functioning as well as malfunctioning of living beings (Kikuchi & Chakraborty, 2006). Numerous techniques are available for sequencing the DNA but none of them is able to read an entire genome at once, not even more than 1000 bases. However, the genome sequences are very large from a few thousand bases for small viruses to more than 3×109 bases for human. Genomes of wheat and lily are even larger than human genome containing 1.7×1010 base pairs and 1.2×1011 base pairs respectively (Kikuchi & Chakraborty, 2006). Decoding these genomes, though important, is very complex. Shotgun sequencing technique is most popular for overcoming the limitations of DNA sequencing methods and sequencing long strands of DNA, including entire genomes (Setubal & Meidanis, 1997). In this method, firstly, multiple copies of the target DNA are generated and individual DNA strands are randomly broken into a number of smaller fragments, which are short enough for sequencing by any of the sequencing techniques (Dorronsoro, et al., 2008). The sequenced fragments are combined back to get the original sequence as the genomic and phylogenetic research activities require the sequences of whole genome. But as the fragments are randomly generated the information about the fragment ordering on the parent strand or the strand to which a particular fragment belongs is lost thereby resulting in the DNA fragment assembly problem (Meksangsouy & Chaiyaratana, 2003). The problem is growing in importance and complexity, as more research centers embark on sequencing new genomes. Exact solutions are very difficult to obtain, as DNA fragment assembly is a complex combinatorial optimization problem (NP-Hard) (Medvedev, et al., 2007). The complexity is due to high dimensionality search space. For k number of fragments, in absence of noise, there are 2k×k! possible solutions in worst case (Kubalik, et al., 2010).

The development of assembly algorithms is closely related to the development of sequencing technologies. Due to rapid advancements in sequencing techniques, developers of assembly algorithms are continually challenged by larger datasets and novel applications. Therefore development of efficient and powerful assembly algorithms continues to be an area of critical research (Miller, et al., 2010). Since the assemblies generated from short read sequencing data are highly fragmented (Pop, 2009), the future is expected to lead back to OLC for de-novo assembly of long read sequencing data. De novo assembly of long reads is carried out in this paper. The fragment assembly problem dealt by Parsons et al. (1993) considers an objective function minimize F2 for determining a fragment layout, where the aim is to minimize the sum of overlaps between all pairs of fragments in a layout. This paper aims to formulate this as a bi-objective optimization problem, where one of the objectives is to maximize the sum of overlap between adjacent fragments and the other objective is to minimize the sum of overlap between distant fragments. This bi-objective optimization problem is solved using multi-objective genetic algorithm technique NSGA-II (Deb, et al., 2002).

Complete Article List

Search this Journal:
Volume 14: 1 Issue (2024): Forthcoming, Available for Pre-Order
Volume 13: 4 Issues (2022): 2 Released, 2 Forthcoming
Volume 12: 4 Issues (2021)
Volume 11: 4 Issues (2020)
Volume 10: 4 Issues (2019)
Volume 9: 4 Issues (2018)
Volume 8: 4 Issues (2017)
Volume 7: 4 Issues (2016)
Volume 6: 4 Issues (2015)
Volume 5: 4 Issues (2014)
Volume 4: 4 Issues (2013)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing