Implementation of Recursive Brute Force for Solving Motif Finding Problem on Multi-Core

Implementation of Recursive Brute Force for Solving Motif Finding Problem on Multi-Core

Marwa A. Radad (Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt), Nawal A. El-Fishawy (Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt) and Hossam M. Faheem (Department of Computer Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt)
DOI: 10.4018/ijsbbt.2013070101
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Motif is an over-represented pattern in biological sequence. Motif discovery is a major challenge in bioinformatics. Pattern mismatches phenomena makes motif mining very difficult. Brute Force approaches take exponential time with motif length to solve this problem. In this paper, the authors discuss a Recursive-Brute Force algorithm. Its average case time complexity is exponential with the allowed mutations instead of the motif length. Modern Multi-Core architecture revolution encourages us to parallelize our algorithm. We implement the algorithm using two different approaches. A multi-threaded version (OMP-RBF) is implemented using OpenMP. OMP-RBF suffers from a serious performance degradation due to the heap contention problem. The authors have investigated different solutions to solve the heap contention problem. The second implementation is based on MPI that is called MPI-RBF. The efficient handling of the data locality boost the scalability of the MPI-RBF. The authors prove that MPI approach outperforms OpenMP in such computationally-intensive, memory-intensive, and communication-less problem.
Article Preview

Introduction

Biologists are interested in understanding gene interaction and gene regulation. They have determined that special proteins known as transcription factors are controlled when genes are turned on and off. Transcription factors bind to relatively short sequences in the region surrounding a gene which are called transcription factors binding sites (TFBS) or simply motifs. As a result, finding these motifs is a very important task for decoding a genome (Gopal, Haake, Jones & Tymann, 2009).

The literature on the topic of DNA motif finding is extensive. Specialized surveys for existing motif finders are documented (Das & Dai, 2007; Sandve & Drabløs, 2006). In brief words, the algorithms fall into two major groups: word-based algorithms which represent the motif as string, and probabilistic model algorithms witch represent the motif as a Position Weight Matrix (PWM). Word-based algorithms include enumeration methods like Brute Force (Jones & Pevzner, 2004), FLAME (Floratou, Tata & Patel, 2010) (that uses suffix tree), and MITRA (Eskin & Pevzner, 2002) (that uses mismatch tree). Graph algorithms are among other implementation of word-based approach, WINNOWER (Pevzner & Sze, 2000) is an example. Many algorithms of this approach can guarantee finding the optimal solution and are considered exact algorithms. On the other hand, most of the probabilistic model algorithms fall under the approximate category because they employ local search techniques. Probabilistic model algorithms include expectation maximization like MEME(Bailey, Williams, Misleh, & Li, 2006), and Gibbs Sampling like Gibbs Sampler (Lawrence et al., 1993).

The goal of motif finding is to find an unknown motif sequence with approximate occurrences at unknown positions in a sample of DNA sequences. This makes motif finding considered as a computationally intensive problem. Parallel computation reduces the execution time by running multiple processes concurrently. Many of motif finding algorithms are parallelized using different approaches. Special hardware devices like VHDL, FPGAs, GPUs, and COPACOBANAs are efficiently utilized to solve the motif finding problem (Faheem, 2010(AIA); Schröder, Wienbrandt, Pfeiffer & Schimmler, 2008; Yu & Xu, 2009). In fact, these solutions require special hardware and specialists who are able to design such systems.

Now, the era of Multi-core architecture has begun. Parallel programming is introduced to solve computationally intensive problem like motif finding by running multiple threads concurrently on the available cores. ParaMEME (Grundy, Bailey & Elkan, 1996) is a parallel version of MEME algorithm that provides efficient performance when it is ported to San Diego Supercomputer Center’s 400-node Intel Paragon XP/S and Pittsburgh Supercomputer Center’s T3D. This architecture development encourages us to parallelize our efficient exact algorithm called Recursive-Brute Force (R-BF) in order to benefit from this technology.

While R-BF does not enhance the worst case of the brute force approach, the average case complexity reflexes the performance enhancement that achieved by the R-BF. It requires time that exponential with the allowed mutations (d) rather than the motif length (l) like the previous versions of the Brute Force. We consider two different paradigms for multi-core programming, OpenMP (http://www.mpi-forum.org). While MPI is normally used on large distributed computing systems, it can be also used on a multicore architecture. Many researches try to answer the question: How does the performance of MPI compare to OpenMP on a multicore system? (Eadline, 2007; Jost, Jin, an Mey & Hatay, 2003; Mallón etal., 2009).

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 5: 2 Issues (2017): Forthcoming, Available for Pre-Order
Volume 4: 2 Issues (2016): Forthcoming, Available for Pre-Order
Volume 3: 1 Issue (2015)
Volume 2: 4 Issues (2013)
Volume 1: 4 Issues (2012)
View Complete Journal Contents Listing