Pe-DFA: Penguins Search Optimisation Algorithm for DNA Fragment Assembly

Pe-DFA: Penguins Search Optimisation Algorithm for DNA Fragment Assembly

Youcef Gheraibia (Department of Computer Science and Mathematics, University of Mohammed Cherif Messaadia, Souk Ahras, Algeria and Department of Computer Science, University of Badji Mokhtar, Annaba, Algeria), Abdelouahab Moussaoui (Department of Computer Science, University of Ferhat Abaas, Setif, Algeria), Sohag Kabir (Department of Computer Science, University of Hull, Hull, UK) and Smaine Mazouzi (Department of Computer Science, University of Skikda, Skikda, Algeria)
Copyright: © 2016 |Pages: 13
DOI: 10.4018/IJAMC.2016040104


DNA Fragment Assembly (DFA) is a process of finding the best order and orientation of a set of DNA fragments to reconstruct the original DNA sequence from them. As it has to consider all possible combinations among the DNA fragments, it is considered as a combinatorial optimisation problem. This paper presents a method showing the use of Penguins Search Optimisation Algorithm (PeSOA) for DNA fragment assembly problem. Penguins search optimisation is a nature inspired metaheuristic algorithm based on the collaborative hunting strategy of penguins. The approach starts its operation by generating a set of random population. After that, the population is divided into several groups, and each group contains a set of active fragments in which the penguins concentrate on the search process. The search process of the penguin optimisation algorithm is controlled by the oxygen reserve of penguins. During the search process each penguin shares its best found solution with other penguins to quickly converge to the global optimum. In this paper, the authors adapted the original PeSOA algorithm to obtain a new algorithm structure for DNA assembly problem. The effectiveness of the proposed approach has been verified by applying it on the well-known benchmarks for the DNA assembly problem. The results show that the proposed method performed well compared to the most used DNA fragment assembly methods.
Article Preview

1. Introduction

Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic information in all organisms including humans. The DNA is represented as a long file made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). These bases are 99 percent similar for all humans and usually organisms of the same classes have high similarity in their DNA (Berk et al., 2000). A genome is a stretch of DNA that encodes a polypeptide (protein), which is a set of amino acids bound together in a specific order. The chemical bases are up with each other in the way that base A is always paired with base T, and base C is always with base G. This sequence consists of nucleotides bound together, which are interpreted by the cellular machinery in groups of three, called triplets (Cai et al., 2012).

DNA fragment assembly is a NP-hard problem in computational biology (Pevzner et al., 2001). The human DNA contains about 3 billion bases and cannot be read at once. Experimental techniques have been used to solve this problem by cutting the whole sequence at random positions to a set of fragments. The main objective of DNA fragment assembly is to construct the original DNA sequence from the small fragments by finding the best order of these fragments to maximise overlapping scores between each two consecutive fragments (Bankevich et al., 2012).

Several optimisation-based methods have been proposed to deal with the DFA problem (Pevzner et al, 2001; Caserta et al,. 2014; Chang et al,. 2011; Bocicor et al,. 2011). The main objective of these methods is to construct the original DNA sequence based on the small DNA fragments by maximising the overlapping scores between consecutive sequences. Greedy methods were firstly applied to deal with the DNA fragments assembly problem by Staden (1980), afterwards, metaheuristics were widely used to solve the same problem. Gheraibia et al (2013) proposed Penguins Search Optimisation Algorithm (PeSOA) based on the collaborative hunting strategy of penguins. This metaheuristic algorithm is different from the traditional algorithms in a sense that it controls the expenditure of time on non-useful search area. Penguins hunt fishes in groups to gain energy and synchronise their dives to optimise the overall expenditure of energy in the search process. The PeSOA algorithm starts its operation by generating an initial population. The population is then divided into groups and each group starts searching in a specific location by fixing other parts of the solution. In order to save energy, penguins use the energy reserve to decide whether to hunt or not in a given area. The energy level of a penguin is determined by its oxygen reserve and it is used also as an acceleration coefficient to ensure a good intensification strategy for the global optimum. The number of penguins in a group can vary depending on the quantity of fishes eaten by the penguins of this group which represents the improvement of the objective function of the penguins.

In this paper, a penguins search optimisation algorithm based approach is proposed to deal with the DNA fragment assembly problem. The purpose of the proposed approach is to find the best combination (order) of DNA fragments which maximise the overlap between each two consecutive fragments. To do so penguins are employed to find local solutions where each group of penguins concentrate on a set of fragment and search for solution. The final solution is constructed based on the best local solutions given by each group of penguins. Consequently, solving the DNA fragment assembly problem needs the improvement of total overlapping between fragments and reducing the computation time. To evaluate the efficiency of the proposed method, several experiments have been performed with different benchmarks generated by GenFrag (Engle et al., 1993) and DNAgen (Guillermo et al., 2013) tools. The experiments show that the proposed approach improves the total overlapping score by exploring the search spaces in an efficient way.

Complete Article List

Search this Journal:
Open Access Articles: Forthcoming
Volume 10: 4 Issues (2019): 1 Released, 3 Forthcoming
Volume 9: 4 Issues (2018)
Volume 8: 4 Issues (2017)
Volume 7: 4 Issues (2016)
Volume 6: 4 Issues (2015)
Volume 5: 4 Issues (2014)
Volume 4: 4 Issues (2013)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing