High-Throughput Data Analysis of Proteomic Mass Spectra on the SwissBioGrid

Andreas Quandt (Swiss Institute of Bioinformatics, Switzerland), Sergio Maffioletti (University of Lugano, Switzerland), Cesare Pautasso (Swiss Institute of Bioinformatics, Switzerland), Heinz Stockinger (Swiss Institute of Bioinformatics, Switzerland) and Frederique Lisacek (ETH Zurich, Switzerland)
DOI: 10.4018/978-1-60566-374-6.ch012

Proteomics is currently one of the most promising fields in bioinformatics as it provides important insights into the protein function of organisms. Mass spectrometry is one of the techniques to study the proteome, and several software tools exist for this purpose. The authors provide an extendable software platform called swissPIT that combines different existing tools and exploits Grid infrastructures to speed up the data analysis process for the proteomics pipeline.
Chapter Preview


In the past, biology was not closely linked to computer science, but the rise of molecular biology brought a major change. Biological experiments now produce terabytes of data, so data analysis requires large infrastructures. Grids are a promising solution to this issue: they enable job distribution, task parallelization, and the sharing of data and results between geographically scattered groups. Grid computing has been used in scientific domains for several years. Whereas physics (and in particular high energy physics) was one of the main early drivers, other scientific domains, Bioinformatics in particular, have started to leverage Grid computing technologies for certain computing- and/or data-intensive applications. However, using this technology is far from trivial because Grids are heterogeneous, geographically distributed resources. They are more complicated to maintain, and the organization and monitoring of the computation steps, as well as the secure storage and distribution of data, require additional knowledge from the user. These computing-related issues may become a burden to non-computer specialists and therefore need to be hidden as much as possible from end users.

Key Terms in this Chapter

Mass Spectrometry: In the field of proteomics, mass spectrometry is a technique to analyze, identify and characterize proteins. In particular, it measures the mass-to-charge ratio of ionized molecules.
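As a small illustration of the mass-to-charge ratio mentioned above, the following sketch computes the m/z value at which a molecule of known neutral mass would be observed. It assumes positive-mode ionization by protonation (each unit of charge adds one proton), which is common in proteomics but is not stated in this chapter preview; the function names and the example mass of 1000.0 Da are hypothetical.

```python
# Approximate mass of a proton in daltons (Da).
PROTON_MASS = 1.007276


def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """Return the m/z of a molecule carrying `charge` protons.

    Assumption: the ion is formed by adding `charge` protons to the
    neutral molecule (positive-mode protonation).
    """
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(observed_mz: float, charge: int) -> float:
    """Invert mass_to_charge: recover the neutral mass from an observed m/z."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return charge * observed_mz - charge * PROTON_MASS


if __name__ == "__main__":
    # Hypothetical peptide with a neutral mass of 1000.0 Da.
    for z in (1, 2, 3):
        print(z, mass_to_charge(1000.0, z))
```

The same molecule appears at a different m/z for each charge state, which is why analysis software has to infer the charge before it can recover a mass.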

Grid Workflow: In general, a workflow can be considered as the automation of a specific process which can further be divided into smaller tasks. A Grid workflow consists of several tasks that need to be executed in a Grid environment but not necessarily on the same computing hardware.
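The definition above can be sketched as a set of tasks with dependencies that are ordered and then assigned to different resources. This is only a minimal illustration under stated assumptions: the task names, the resource names and the round-robin assignment are hypothetical and do not describe how swissPIT or any Grid middleware actually schedules work. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "convert_spectra": set(),
    "search_a": {"convert_spectra"},
    "search_b": {"convert_spectra"},
    "merge_results": {"search_a", "search_b"},
}

# Hypothetical computing resources available in the Grid.
RESOURCES = ["cluster_1", "cluster_2"]


def plan_workflow(workflow, resources):
    """Return (task, resource) pairs in an order that respects dependencies.

    Tasks are assigned to resources round-robin, so consecutive tasks
    do not necessarily run on the same hardware.
    """
    order = TopologicalSorter(workflow).static_order()
    return [
        (task, resources[i % len(resources)])
        for i, task in enumerate(order)
    ]


if __name__ == "__main__":
    for task, resource in plan_workflow(WORKFLOW, RESOURCES):
        print(f"{task} -> {resource}")
```

A real Grid scheduler would also run independent tasks (here the two searches) in parallel and would move input and output data between sites; the sketch only shows the ordering and placement idea.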

High Performance Computing (HPC): HPC is a particular field in computer science that deals with performance optimization of single applications, usually by running parallel instances on high performance computing clusters or supercomputers.

Proteomics: The large-scale study of proteins, their functions and their structures. It is supposed to complement physical genome research. It can also be defined as the qualitative and quantitative comparison of proteomes under different conditions to further unravel biological processes (http://www.expasy.ch/proteomics_def.html).

High Throughput Computing: In contrast to HPC, high throughput computing does not aim to optimize a single application but to serve several users and applications. Many applications share a computing infrastructure at the same time, with the goal of maximizing the overall throughput of those applications.

Bioinformatics: Comprises the management and the analysis of biological databases.

Grid Job Submission and Execution: Workflows are typically expressed in certain languages and then have to be executed. Often, the entire workflow is called a “job” which needs to be submitted to the Grid and executed on Grid computing resources.

Complete Chapter List

Editorial Advisory Board
Table of Contents
Tim Clark, Ian Foster
Mario Cannataro
Mario Cannataro
Chapter 1
Mark Olive, Hanene Boussi Rahmouni, Tony Solomonides, Vincent Breton, Nicolas Jacq, Yannick Legre
The principal goal of this chapter is to elucidate the future requirements of healthgrids if they are to become the infrastructure of choice for...
SHARE: A European Healthgrid Roadmap
Chapter 2
Aisha Naseer, Lampros Stergiolas
Adoption of cutting edge technologies in order to facilitate various healthcare operations and tasks is significant. There is a need for health...
Types of Resources and their Discover in HealthGrids
Chapter 3
Khalid Belhajjame, Paolo Missier, Carole Goble
Data provenance is key to understanding and interpreting the results of scientific experiments. This chapter introduces and characterises data...
Data Provenance in Scientific Workflows
Chapter 4
Bartosz Balis, Marian Bubak, Michal Pelczar, Jakub Wach
Provenance tracking is an indispensable element of each e-Science infrastructure for conducting in silico experiments. However, enabling end-users...
Provenance Tracking and End-User Oriented Query Construction
Chapter 5
Yassene Mohammed, Fred Viezens, Frank Dickmann, Juergen Falkner, Thomas Lingner
This chapter describes security and privacy issues within the scope of biomedical Grid Computing. Grid Computing is of rising interest for life...
Data Protection and Data Security Regarding Grid Computing in Biomedical Research
Chapter 6
Moez Ben HajHmida, Antonio Congiusta
Knowledge discovery has become a necessary task in scientific, life sciences, and business fields, both for the growing amount of data being...
Parallel, Distributed, and Grid-Based Data Mining: Algorithms, Systems, and Applications
Chapter 7
Vincent Breton, Eddy Caron, Frederic Desprez, Gael Le Mahec
As grids become more and more attractive for solving complex problems with high computational and storage requirements, bioinformatics starts to be...
High Performance BLAST Over the Grid
Chapter 8
Luciano Milanesi, Ivan Merelli, Gabriele Trombetti
A common ongoing task for Functional Genomics is to compare full organisms’ genome with those of related species, to search in huge database for...
Functional Genomics Applications in GRID
Chapter 9
Bertil Schmidt, Chen Chen, Weiguo Liu, Wayne Mitchell
In this chapter we present PheGeeatHome, a grid-based comparative genomics tool that nominates candidate genes responsible for a given phenotype. A...
PheGeeatHome: A Grid-Based Tool for Comparative Genomics
Chapter 10
Giulia De Sario, Angelica Tulipano, Giacinto Donvito, Giorgio Maggi
The number of fully sequenced genomes increases daily, producing an exponential explosion of the sequence, annotation and metadata databases. Data...
High-Throughput GRID Computing for Life Sciences
Chapter 11
Mario Cannataro, Pietro Hiram Guzzi, Giuseppe Tradigo, Pierangelo Veltri
Recent advances in high throughput technologies analysing biological samples enabled the researchers to collect a huge amount of data. In...
Management and Analysis of Mass Spectrometry Proteomics Data on the Grid
Chapter 12
Andreas Quandt, Sergio Maffioletti, Cesare Pautasso, Heinz Stockinger, Frederique Lisacek
Proteomics is currently one of the most promising fields in bioinformatics as it provides important insights into the protein function of organisms....
High-Throughput Data Analysis of Proteomic Mass Spectra on the SwissBioGrid
Chapter 13
Fotis Psomopoulos, Pericles Mitkas
The scope of this chapter is the presentation of Data Mining techniques for knowledge extraction in proteomics, taking into account both the...
Data Mining in Proteomics Using Grid Computing
Chapter 14
Maria Mirto, Italo Epicoco, Massimo Cafaro, Sandro Fiore
In this chapter, the ProGenGrid (Proteomics and Genomics Grid) research project, which started in 2004, is described. It is a Grid Problem Solving...
ProGenGrid: A Grid Problem Solving for Bioinformatics
Chapter 15
Qiang Wang, Yunming Ye, Kunqian Yu, Joshua Zhexue Huang
A drug discovery process is aimed to find from a large set of molecules the candidate leads that have strong interaction with the target proteins....
A Graphical Workflow Modeler for Docking Process in Drug Discovery
Chapter 16
Kaihsu Tai, Mark Sansom
BioSimGrid is a distributed biomolecular simulation database. It is a general-purpose database for trajectories from molecular dynamics simulations....
BioSimGrid Biomolecular Simulation Database
Chapter 17
Russ Miller, Charles Weeks
Grids represent an emerging technology that allows geographically- and organizationally-distributed resources (e.g., compute systems, data...
Molecular Structure Determination on the Grid
Chapter 18
Ian Greenshields, Gamal El-Sayed
This chapter introduces some aspects of visualization and the grid. Visualization --the art and science of representing data visually-- is now...
Aspects of Visualization and the Grid in a Biomedical Context
Chapter 19
Cecile Germain-Renaud, Vincent Breton, Patrick Clarysse, Bertrand Delhay, Yann Gaudeau, Tristan Glatard, Emmanuel Jeannot, Yannick Legre
Grid technologies and infrastructures can contribute to harnessing the full power of computer-aided image analysis into clinical research and...
Grid Analysis of Radiological Data
Chapter 20
J.R. Bilbao Castro, I. Garcia Fernandez, J. Fernandez
Three-dimensional electron microscopy allows scientists to study biological specimens and to understand how they behave and interact with each other...
Grid Computing in 3D Electron Microscopy Reconstruction
Chapter 21
Francesco Maria Colacino, Maurizio Arabia, Gionata Fragomeni
In the last decades cardiovascular diseases greatly increased worldwide, and bioengineering provided new technologies and cardiovascular prostheses...
Hybrid Mock Circulatory System to Test Cardiovascular Prostheses on the Grid
Chapter 22
Ignacio Blanquer, Vicente Hernandez
Epidemiology constitutes one relevant use case for the adoption of grids for health. It combines challenges that have been traditionally addressed...
Grid Technologies in Epidemiology
Chapter 23
Fabricio Alves Barbosa da Silva, Henrique Fabricio Gagliardi, Eduardo Gallo, Maria Antonia Madope, Virgilio Cavicchioli Neto, Ivan Torres Pisa, Domingos Alves
The authors present in this work a large-scale system for space-time visualization, monitoring, modeling and analysis of epidemic data using a Grid...
IntegraEPI: Epidemiologic Surveillance on the Grid
Chapter 24
David Manset, Frederic Pourraz, Alexey Tsymbal, Jerome Revillard, Konstantin Skaburskas, Richard McClatchey, Ashiq Anjum, Alfonso Rios, Martin Huber
The Health-e-Child project started in January 2006 with the aim of developing a Grid-based healthcare platform for European paediatrics and...
Gridifying Biomedical Applications in the Health-e-Child Project
Chapter 25
Richard Sinnott, Ian Piper
Clinical research is becoming ever more collaborative with multi-centre trials now a common practice. With this in mind, never has it been more...
e-Infrastructures Fostering Multi-Center Collaborative Research into the Intensive Care Management of Patients with Brain Injury
Chapter 26
Tomasz Gubala, Marian Bubak, Peter Sloot
Research environments for modern, cross-disciplinary scientific endeavors have to unite multiple users, with varying levels of expertise and roles...
Semantic Integration for Research Environments
Chapter 27
Marian Bubak, Maciej Malawski, Tomasz Gubala, Marek Kasztelnik, Piotr Nowakowski, Daniel Harezlak
Advanced research in life sciences calls for new information technology solutions to support complex, collaborative computer simulations and result...
Virtual Laboratory for Collaborative Applications
Chapter 28
Sriram Krishnan, Luca Clementi, Zhaohui Ding, Wilfred Li
Grid systems provide mechanisms for single sign-on, and uniform APIs for job submission and data transfer, in order to allow the coupling of...
Leveraging the Power of the Grid with Opal
Chapter 29
The LIBI project (International Laboratory of BioInformatics), which started in 2005 and will end in 2009, was initiated with the aim of setting up...
The LIBI Grid Platform for Bioinformatics
Chapter 30
Piotr Bala, Kim Baldridge, Emilio Benfenati, Mose Casalegno, Uko Maran, Lukasz Miroslaw
This chapter provides an overview of Grid middleware and applications related to biomedical and life sciences disciplines. Various technologies...
UNICORE: A Middleware for Life Sciences Grid
Chapter 31
Livia Torterolo, Luca Corradi, Barbara Canesi, Marco Fato, Roberto Barbera, Salvatore Scifo, Antonio Calanducci
This chapter describes a Grid oriented platform -the Bio Med Portal- as a new tool to promote collaboration and cooperation among scientists and...
A Grid Paradigm for e-Science Applications
Chapter 32
Roberto Barbera, Antonio Calanducci, Juan Manuel Gonzalez Martin, Fancisco Prieto Castrillo, Raul Ramos Pollan, Raul Rubio del Solar, Dorin Tcaci
This chapter presents the gLibrary/DRI (Digital Repositories Infrastructure) platform. The main goal of the platform is to reduce the cost in terms...
gLibrary/DRI: A Grid-Based Platform to Host Muliple Repositories for Digital Content
Chapter 33
Wolfgang Gentzsch
A Grid enables remote, secure access to a set of distributed, networked computing and data resources. Clouds are a natural next step of Grids...
Porting Applications to Grids and Clouds
Chapter 34
Agostino Forestiero, Carlo Mastroianni, Fausto Pupo, Giandomenico Spezzano
This chapter proposes a bio-inspired approach for the construction of a self-organizing Grid information system. A dissemination protocol exploits...
Evaluating a Bio-Inspired Approach for the Design of a Grid Information System: The SO-Grid Portal
Chapter 35
Heinz Stockinger, Alexander Auch, Markus Goeker, Jan Meier-Kolthoff, Alexandros Stamatakis
Phylogenetic data analysis represents an extremely compute-intensive area of Bioinformatics and thus requires high-performance technologies. Another...
Large-Scale Co-Phylogenetic Analysis on the Grid
About the Contributors