High Performance BLAST Over the Grid

Vincent Breton (CNRS Clermont-Ferrand, France), Eddy Caron (Universite de Lyon, France), Frederic Desprez (INRIA Universite de Lyon, France) and Gael Le Mahec (CNRS Clermont-Ferrand, France)
DOI: 10.4018/978-1-60566-374-6.ch007

Abstract

As grids become increasingly attractive for solving complex problems with high computational and storage requirements, bioinformatics applications are being ported to large-scale platforms. The BLAST kernel, one of the main cornerstones of high-performance genomics, was one of the first applications ported to such platforms. However, while a simple parallelization was enough for the first proof of concept, its use on production platforms requires more optimized algorithms. In this chapter, we review existing parallelization and “gridification” approaches as well as related issues such as data management and replication, and present a case study using the DIET middleware over the Grid’5000 experimental platform.
Chapter Preview
Background

Several projects around the world aim to set up bioinformatics grids. These projects build upon de facto standards such as Globus (Globus, http://www.globus.org) or Condor (Frey, Tannenbaum, Livny, Foster, & Tuecke, 2002).

Key Terms in this Chapter

Sequence Alignment: A sequence alignment highlights the similarities between two different sequences. In biology, an alignment between two protein or DNA sequences can reveal functional or evolutionary relationships.

DIET: DIET consists of a set of elements that can be used together to build applications using the Grid-RPC paradigm. This middleware is able to find an appropriate server according to the information given in the client’s request, the performance of the target platform, and the local availability of data stored during previous computations.

Biological Database: Library of annotated biological data collected from experiments or computational analyses and organized by type, nature, or origin.

Grid Middleware: Set of services used to federate the resources of a computing grid, handling security, data management, task submission, and result retrieval transparently from the user’s point of view.

DAGDA: The new data manager for the DIET middleware, which allows explicit or implicit data replication and advanced data management on the Grid. It was designed to be backward compatible with applications previously developed for DIET, which benefit transparently from data replication.

BLAST: The Basic Local Alignment Search Tool is a bioinformatics application designed to find regions of local similarity between biological sequences such as protein or DNA sequences.

Data Replication: The way data is copied and distributed among the storage resources.

Task Scheduling: The way processes are assigned priorities in a priority queue. This assignment is carried out by software known as a scheduler. One of its main goals is CPU utilization: keeping the CPU as busy as possible.
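
As a rough illustration of the scheduling entry above, the following minimal Python sketch assigns independent tasks (for example, chunks of BLAST queries) to workers so that no worker sits idle while another is overloaded. It uses a generic greedy "longest task first, least-loaded worker" heuristic. This is an assumption made for illustration only, not the scheduler used by DIET or described in the chapter. The function name `schedule_greedy`, the task names, and the cost estimates are all hypothetical.

```python
import heapq


def schedule_greedy(task_costs, n_workers):
    """Greedy list scheduling (illustrative only, not DIET's scheduler).

    task_costs: dict mapping a task name to its estimated cost.
    n_workers:  number of identical workers.
    Returns (assignment, makespan), where assignment maps a worker
    index to the list of tasks it receives.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # Priority queue of (accumulated load, worker index):
    # the least-loaded worker is always at the top.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}

    # Place the most expensive tasks first; ties are broken by name
    # so the result is deterministic.
    ordered = sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, cost in ordered:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))

    # The makespan is the load of the busiest worker.
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

Calling `schedule_greedy({"q1": 4, "q2": 3, "q3": 3, "q4": 2}, 2)` with these hypothetical costs places `q1` and `q4` on one worker and `q2` and `q3` on the other, so both workers finish after 6 cost units. A real grid scheduler would also have to account for heterogeneous machines and for where the database replicas are stored.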
