The LIBI Grid Platform for Bioinformatics

DOI: 10.4018/978-1-60566-374-6.ch029


The LIBI project (International Laboratory of BioInformatics), which started in 2005 and will end in 2009, was initiated to set up an advanced bioinformatics and computational biology laboratory focused on basic and applied research in modern biology and biotechnology. One goal of the project has been the development of a Grid Problem Solving Environment, built on top of the EGEE, DEISA and SPACI infrastructures, that allows the submission and monitoring of jobs mapped to complex bioinformatics experiments. In this work we describe the architecture of this environment and present several case studies, together with the results obtained using it.
Chapter Preview


A key consideration during the design phase of the LIBI laboratory has been that bioinformatics applications are naturally distributed, because experimental data and biological databases are themselves usually distributed. In addition, many experiments require huge computing power, owing to the large size of the data sets and the complexity of the processing. They may also need to access heterogeneous data, where heterogeneity is multifaceted (data format, access policy, distribution, etc.), and they require a secure infrastructure to protect access to private data owned by different organizations.

The Problem Solving Environment (PSE) (Houstis, 1997) is an approach and a technology that can fulfill these bioinformatics requirements. A PSE can be used to define and compose complex applications while hiding programming and configuration details from the user, who can concentrate on the specific biological problem. Moreover, computational grids (Foster, 1999) can be used to build geographically distributed, collaborative problem solving environments, and grid-aware PSEs (Laszewski, 2001) can search for and use dispersed high performance computing, networking and data resources. In this work, the PSE solution has been chosen as the integration platform for bioinformatics tools and data sources.

Key Terms in this Chapter

High Throughput Computing (HTC): A computer science term describing the use of many computing resources over long periods of time to solve a computational problem. A typical HTC problem consists of many loosely-coupled tasks that can be executed in parallel.
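
As an illustration of the loosely-coupled task pattern described above, the following minimal Python sketch runs one independent task per input. It is not part of the LIBI platform: the function names `gc_content` and `run_independent_tasks` are hypothetical, and a local thread pool stands in for the grid resources that a real HTC system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence: str) -> float:
    """Hypothetical per-task work: fraction of G and C bases in a DNA sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def run_independent_tasks(sequences, max_workers=4):
    """Run one task per sequence.

    The tasks share no state and never communicate, which is what makes
    the workload loosely coupled. The local thread pool is only a
    stand-in (an assumption of this sketch) for distributed resources.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(gc_content, sequences))
```

Because no task depends on another, the same inputs could be split across any number of workers without changing the results; only the elapsed time would differ.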

Grid Problem Solving Environment (Grid-PSE): A PSE is a computer system that provides all the computational facilities needed to solve a target class of problems. These features include advanced solution methods, automatic and semiautomatic selection of solution methods, and ways to easily incorporate novel solution methods. Moreover, PSEs use the language of the target class of problems, so users can run them without specialized knowledge of the underlying computer hardware or software. By exploiting modern technologies such as interactive color graphics, powerful processors, and networks of specialized services, PSEs can track extended problem solving tasks and allow users to review them easily (Houstis, 1997). A Grid PSE is a grid-based PSE that integrates heterogeneous components into an environment providing transparent access to distributed resources, collaborative modeling and simulation, and grid portals.

Grid Resource Management: The set of actions needed to use the available computational resources in a distributed environment efficiently. Grid resource management must take into account the heterogeneity of the resources in terms of processor, storage and network performance.

Interoperability: The ability of diverse systems and organizations to work together (inter-operate). In a grid context, it mainly concerns the basic grid services, such as resource management, information management, data management and security.

Grid Data Access Service (Grid-DAS): The virtualized access interface that a Grid-DBMS offers to Grid-Databases.

High Performance Computing (HPC): The use of supercomputers and computer clusters to solve advanced computing problems that have intensive computing requirements and consist of tightly-coupled tasks.

Grid Workflow Management System: A system that defines, creates and manages the execution of workflows through the use of grid computing technologies. It runs on one or more workflow engines and is able to interpret the process definition, interact with workflow participants and, where required, integrate distributed resources and legacy software modules.
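
The core idea of interpreting a process definition can be sketched in a few lines of Python. The example below is an assumption-laden toy, not the LIBI workflow engine: `run_workflow`, the task names and the dictionary-based process definition are all hypothetical, and tasks run locally and sequentially rather than on grid resources. It uses the standard library `graphlib` module (Python 3.9 or later) to order tasks by their dependencies.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Execute tasks in an order that respects their dependencies.

    tasks:        maps a task name to a callable that receives a dict
                  with the results of its direct predecessors.
    dependencies: maps a task name to the set of task names it needs.

    Both structures are hypothetical stand-ins for a real process
    definition. A cyclic definition raises graphlib.CycleError.
    """
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        # Hand each task only the outputs of the tasks it depends on.
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](inputs)
    return order, results


# Usage: a three-step linear pipeline with made-up task names.
example_tasks = {
    "fetch": lambda inputs: "ATGC",
    "align": lambda inputs: inputs["fetch"].lower(),
    "report": lambda inputs: len(inputs["align"]),
}
example_dependencies = {"align": {"fetch"}, "report": {"align"}}
example_order, example_results = run_workflow(example_tasks, example_dependencies)
```

A grid workflow system would additionally dispatch each ready task to a remote resource and run independent branches concurrently; the dependency ordering shown here is the part that stays the same.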

Grid-DBMS: A distributed system which automatically, transparently and dynamically manages Data Resources, according to the Grid state, in order to maintain a desired performance level. It must offer an efficient, robust, intelligent, transparent, uniform access to Grid-Databases by means of a Grid Data Access Service (Grid-DAS) interface.

Data Federation: Platforms that provide a virtualized integration of multiple, disparate, heterogeneous information sources, enabling applications to access and integrate diverse data and content sources as if they were a single resource, regardless of where the information resides, while retaining the autonomy and integrity of those sources.
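
The "single resource" view in this definition can be illustrated with a small Python sketch. Everything in it is hypothetical: the `FederatedCatalog` class, the source names and the record fields are invented for illustration, and in-memory functions stand in for remote, independently managed databases.

```python
class FederatedCatalog:
    """Toy federation layer that queries several sources as if they were one.

    Each source is a callable taking a query key and returning a list of
    record dicts in its own format. The sources are never modified or
    copied, so each one keeps its autonomy.
    """

    def __init__(self, sources):
        self.sources = dict(sources)  # source name -> lookup callable

    def query(self, key):
        merged = []
        for name, lookup in self.sources.items():
            for record in lookup(key):
                # Tag each record with its origin so provenance survives.
                merged.append({"source": name, **record})
        return merged


# Usage with two made-up sources that hold different fields.
def sequence_source(key):
    data = {"P1": [{"sequence": "ATGC"}]}
    return data.get(key, [])


def annotation_source(key):
    data = {"P1": [{"function": "kinase"}], "P2": [{"function": "unknown"}]}
    return data.get(key, [])


catalog = FederatedCatalog(
    {"sequences": sequence_source, "annotations": annotation_source}
)
```

The caller issues one query and receives records from every source that knows the key, without needing to know where each record is stored.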

Computational Grid: By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, the Grid promises to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible (Foster, 1999).
