IGI Online Bookstore
Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare (2 Volumes)
Edited By: Mario Cannataro, University Magna Graecia of Catanzaro, Italy
Table of Contents:
Section I: Infrastructures and Services for HealthGrids and BioGrids
This section introduces the main concepts and defines technical, methodological and organizational challenges of HealthGrids and BioGrids. After discussing the roadmap toward future HealthGrids, the Section defines basic resources and their discovery in HealthGrids. Then, taking into account the large use of scientific workflows in BioGrids and HealthGrids, the Section introduces the data provenance concept and discusses ways to query and exploit provenance data. Data protection and security, an important aspect that must be faced when considering biomedical data, is also discussed. Finally, a review of distributed data mining and knowledge discovery systems useful for implementing the analysis layer of HealthGrids is also presented.

Chapter I: SHARE: A European Healthgrid Roadmap


    Mark Olive, University of the West of England, UK
    Hanene Boussi Rahmouni, University of the West of England, UK
    Tony Solomonides, University of the West of England, UK
    Vincent Breton, IN2P3, CNRS, Clermont-Ferrand, France
    Nicolas Jacq, HealthGrid (International)
    Yannick Legré, HealthGrid (International), IN2P3, CNRS, Clermont-Ferrand, France
    Ignacio Blanquer, Universidad Politécnica de Valencia, Spain
    Vicente Hernandez, Universidad Politécnica de Valencia, Spain
    Isabelle Andoulsi, Universitaires Notre-Dame de la Paix, Belgium
    Jean Herveg, Universitaires Notre-Dame de la Paix, Belgium
    Celine Van Doosselaere, European Health Management Association (International)
    Petra Wilson, European Health Management Association (International)
    Alexander Dobrev, Empirica GmbH, Germany
    Karl Stroetmann, Empirica GmbH, Germany
    Veli Stroetmann, Empirica GmbH, Germany
Grid technology, one of the key technologies for the “European Research Area”, offers rapid computation, large scale data storage and flexible collaboration by harnessing together the power of large numbers of computers, from end-users’ desktops to powerful workstations and clusters of more powerful machines. However, a major challenge is to take the technology out of the laboratory to the citizen. The application of Grid technology to biomedical and healthcare informatics, in short HealthGrid, presents some difficult challenges. The chapter presents the results of the SHARE project (http://www.eu-share.org) that identified the key developments, i.e. technical advances, social actions, economic investments and ethical or legal initiatives, needed to achieve wide adoption and deployment of HealthGrids throughout Europe. The project analyses several case studies and discusses technical, ethical, legal, social and economic issues which may impede early deployment of HealthGrids.

Chapter II: Types of Resources and their Discovery in HealthGrids


    Aisha Naseer, Brunel University, UK
    Lampros K. Stergioulas, Brunel University, UK
The emerging technology of HealthGrids holds the promise to successfully integrate health information systems and various healthcare entities onto a common, globally shared and easily accessible platform. This chapter presents a taxonomy of different types of HealthGrid resources and proposes some solutions for the resource discovery problem, an emerging challenge in HealthGrids. A discussion on discovering and integrating data resources is also provided.

Chapter III: Data Provenance in Scientific Workflows


    Khalid Belhajjame, University of Manchester, UK
    Paolo Missier, University of Manchester, UK
    Carole A. Goble, University of Manchester, UK
Data provenance is key for understanding and interpreting the results of scientific experiments. This chapter introduces and characterises data provenance in scientific workflows using illustrative examples taken from real-world workflows. Scientific workflows and related provenance data are first defined. Then the chapter proposes a taxonomy that characterizes provenance in scientific workflows. This taxonomy is used for comparing and analysing the provenance capabilities supplied by existing scientific workflow systems.

Chapter IV: Provenance Tracking and End-User Oriented Query Construction


    Bartosz Balis, Institute of Computer Science AGH, Poland
    Marian Bubak, Institute of Computer Science AGH, Poland & University of Amsterdam, The Netherlands
    Michal Pelczar, ACC CYFRONET AGH, Poland
    Jakub Wach, ACC CYFRONET AGH, Poland
Provenance tracking is an indispensable element of any e-Science infrastructure for conducting in silico experiments. The chapter proposes an ontology-based provenance model which captures the execution of in silico experiments, as well as domain-specific semantics of data and computations used in those experiments. Ontologies are used as an interlingua among end-users, the provenance tracking system, and query tools. Moreover, the chapter presents the Query Translation Tools (QUaTRO), which enable end-user oriented, ontology-guided visual querying over provenance records and experiment data. The provenance tracking approach is demonstrated on a Drug Resistance application.

Chapter V: Data Protection and Data Security Regarding Grid Computing in Biomedical Research


    Yassene Mohammed, Georg-August-University, Germany
    Fred Viezens, Georg-August-University, Germany
    Frank Dickmann, Georg-August-University, Germany
    Jürgen Falkner, Fraunhofer Institute for Industrial Engineering IAO, Germany
    Thomas Lingner, Georg-August-University, Germany
    Dagmar Krefting, University Medicine Berlin, Germany
    Ulrich Sax, Georg-August-University, Germany
Grid Computing is of rising interest for Life Sciences, but medical applications on the grid require a special focus on data security and data protection issues. This chapter describes security and privacy issues within the scope of biomedical Grid Computing. Starting from general security and privacy rules, the chapter first describes the current state of the art of grid security, and then it discusses which additional security measures have to be established in different biomedical grid scenarios. Legal aspects as well as the current possibilities and flaws of grid security technology are also described. As a case study, the chapter describes the enhanced security concept offered by MediGRID, a Grid specialized for the Life Sciences, and outlines how medical Grid Computing could fulfill privacy regulations used in more demanding environments.

Chapter VI: Parallel, Distributed, and Grid-Based Data Mining: Algorithms, Systems, and Applications


    Moez Ben HajHmida, Faculty of Sciences of Tunis, Tunisia
    Antonio Congiusta, University of Calabria, Italy & University of Salerno, Italy
Knowledge discovery is an important task in Life Sciences. Classic data mining techniques, developed for centralized sites, often prove inadequate because of some unique characteristics of today’s data sources. The development of HealthGrids as well as the use of high performance computers in bioinformatics and Life Sciences is boosting the use of distributed data mining solutions. This chapter presents the state of the art of the main distributed data mining techniques and systems. A detailed taxonomy is drawn by analyzing and comparing parallel, distributed and Grid-based data mining methods, with a particular focus on the exploitation of large and remotely dispersed datasets and/or high-performance computers.

Section II: Grids for Genomics and Proteomics
This section discusses the use of the Grid for the management and analysis of genomics and proteomics data, the basic data at the biological level. After describing the main issues in the parallel and distributed implementation of BLAST, a cornerstone of all genomics analysis, the Section discusses some significant Grid-based implementations of genomics applications. Then, the Section introduces proteomics, with a special focus on mass spectrometry-based proteomics, and discusses different Grid-based proteomics applications, ranging from biomarker discovery to protein identification and protein classification.

Chapter VII: High Performance BLAST Over the Grid


    Vincent Breton, IN2P3, CNRS, Clermont-Ferrand, France
    Eddy Caron, Université de Lyon, LIP, CNRS-ENS-Lyon-UCBL-INRIA, France
    Frédéric Desprez, INRIA, Université de Lyon, LIP, CNRS-ENS-Lyon-UCBL-INRIA, France
    Gaël Le Mahec, CNRS, IN2P3, UBP Clermont-Ferrand, France
As grids become more and more attractive for solving complex problems with high computational and storage requirements, bioinformatics algorithms are starting to be ported to large scale platforms. The BLAST kernel was one of the first applications ported to such platforms. However, although simple parallelization was enough for the first proof of concept, its use on production platforms required more optimized algorithms. The chapter reviews existing parallelization and “gridification” of the BLAST algorithm as well as related issues such as data management and replication. A case study using the DIET middleware over the Grid’5000 experimental platform is also presented.

Chapter VIII: Functional Genomics Applications in GRID


    Luciano Milanesi, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
    Ivan Merelli, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
    Gabriele Trombetti, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
    Paolo Cozzi, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
    Alessandro Orro, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
A common ongoing task in Functional Genomics is to compare the full genome of an organism with those of related species, to search huge databases for functional annotation of novel sequences, and to identify specific patterns in them, such as ESTs, genes, and microRNA. The prediction of these patterns has a significant computational cost, while public genome archives exceed one billion sequence traces from over 1,000 organisms and this number is increasing rapidly. Thus Functional Genomics applications require significant computational infrastructures, where reusable tools and resources can be accessed. The chapter describes issues and challenges in porting the main Functional Genomics applications, from gene prediction to sequence alignment and phylogenetic applications, to the Grid. The implementation and evaluation of a software environment for the management of distributed bioinformatics computations over the Grid is also presented.

Chapter IX: PheGee@Home: A Grid-Based Tool for Comparative Genomics


    Bertil Schmidt, Nanyang Technological University, Singapore
    Chen Chen, Nanyang Technological University, Singapore
    Weiguo Liu, Nanyang Technological University, Singapore
    Wayne P. Mitchell, Experimental Therapeutics Centre (ETC), Singapore
This chapter describes PheGee@Home, a grid-based comparative genomics tool that finds candidate genes responsible for a given phenotype, where a phenotype is understood as the physical manifestation of the interplay of genetic, epigenetic and environmental factors. The tool facilitates the discovery and prioritization of candidate genes controlling or contributing to the genetically determined portion of a specified phenotype. Because runtimes are otherwise prohibitively long, the system architecture is based on a desktop grid environment and commodity graphics hardware, which significantly accelerate PheGee. The authors validate this approach by comparing microbial genomes on a grid testbed.

Chapter X: High-Throughput GRID Computing for Life Sciences


    Giulia De Sario, Istituto di Tecnologie Biomediche, CNR, Italy
    Angelica Tulipano, Istituto di Tecnologie Biomediche, CNR, Italy
    Giacinto Donvito, INFN, Sezione di Bari, Italy
    Giorgio Maggi, INFN Bari, Italy & Università e Politecnico di Bari, Italy
    Andreas Gisel, Istituto di Tecnologie Biomediche, CNR, Italy
The number of fully sequenced genomes increases daily, producing an exponential explosion of the sequence, annotation and metadata databases. Data analysis on a genome-wide level has become a data- and computation-intensive task. However, most genomics applications can be partitioned into many independent tasks that can be scheduled on high performance computers or Grids. The problem addressed by the chapter is the Grid-based analysis of Gene Ontology data and its associations to gene products of any kind of organism to find gene products with similar functionalities. The chapter presents a system that partitions the computation of the full search into a large number of jobs and keeps submitting these jobs to the Grid infrastructure until all of them have been processed successfully, guaranteeing an analysis of the data without missing any information.
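The partition-and-resubmit idea in this abstract can be illustrated with a minimal Python sketch. It is not the chapter's system: the function names (`partition`, `run_until_complete`, `make_flaky_worker`), the `max_rounds` limit, and the simulated failures are all assumptions made for illustration, and the "Grid" is replaced by a local function call.

```python
from typing import Callable, Dict, List, Sequence


def partition(items: Sequence[int], chunk_size: int) -> List[Sequence[int]]:
    """Split the full workload into independent chunks (one chunk = one job)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def run_until_complete(
    jobs: List[Sequence[int]],
    run_job: Callable[[Sequence[int]], int],
    max_rounds: int = 10,
) -> List[int]:
    """Submit every job and resubmit the failed ones until all have succeeded.

    `run_job` stands in for a Grid submission (hypothetical); it is expected
    to raise RuntimeError when a job fails.
    """
    results: Dict[int, int] = {}
    pending = set(range(len(jobs)))
    for _ in range(max_rounds):
        if not pending:
            break
        for job_id in sorted(pending):
            try:
                results[job_id] = run_job(jobs[job_id])
            except RuntimeError:
                continue  # job stays pending and is retried in the next round
        pending.difference_update(results)
    if pending:
        raise RuntimeError(f"jobs still failing after {max_rounds} rounds: {sorted(pending)}")
    # Results are returned in job order, so no chunk of the analysis is lost.
    return [results[i] for i in range(len(jobs))]


def make_flaky_worker(fail_first_n: int) -> Callable[[Sequence[int]], int]:
    """Build a toy worker that fails the first `fail_first_n` attempts per chunk."""
    attempts: Dict[tuple, int] = {}

    def worker(chunk: Sequence[int]) -> int:
        key = tuple(chunk)
        attempts[key] = attempts.get(key, 0) + 1
        if attempts[key] <= fail_first_n:
            raise RuntimeError("simulated grid failure")
        return sum(chunk)  # placeholder for the real per-chunk analysis

    return worker


if __name__ == "__main__":
    chunks = partition(list(range(10)), 4)
    print(run_until_complete(chunks, make_flaky_worker(1)))
```

Tracking pending job identifiers, rather than counting successes, is what lets the loop state precisely which parts of the search are still missing.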

Chapter XI: Management and Analysis of Mass Spectrometry Proteomics Data on the Grid


    Mario Cannataro, University Magna Graecia of Catanzaro, Italy
    Pietro Hiram Guzzi, University Magna Graecia of Catanzaro, Italy
    Giuseppe Tradigo, University Magna Graecia of Catanzaro, Italy
    Pierangelo Veltri, University Magna Graecia of Catanzaro, Italy
Recent advances in high throughput technologies such as mass spectrometry have enabled researchers to collect huge amounts of data when analysing biological samples. Computational Proteomics concerns the computational methods for analyzing spectra data in qualitative proteomics (i.e. peptide/protein identification in tandem mass spectrometry) and quantitative proteomics (i.e. protein expression measurement in samples), as well as in biomarker discovery (i.e. the identification of a molecular signature of a disease directly from spectra). This chapter presents the main standards, tools, and technologies for building scalable, reusable, and portable applications in this field. The chapter surveys available solutions for computational proteomics and describes MS-Analyzer, a Grid-based software platform for the integrated management and analysis of spectra data. MS-Analyzer provides efficient spectra management through a specialized spectra database, and supports the semantic composition of pre-processing and data mining services to analyze spectra on the Grid.

Chapter XII: High-Throughput Data Analysis of Proteomic Mass Spectra on the SwissBioGrid


    Andreas Quandt, Swiss Institute of Bioinformatics, Switzerland
    Sergio Maffioletti, ETH Zurich (Swiss National Supercomputing Centre), Switzerland
    Cesare Pautasso, University of Lugano, Switzerland
    Heinz Stockinger, Swiss Institute of Bioinformatics, Switzerland
    Frederique Lisacek, Swiss Institute of Bioinformatics, Switzerland
    Peter Kunszt, ETH Zurich (Swiss National Supercomputing Centre), Switzerland
Proteomics is currently one of the most promising fields in bioinformatics as it provides important insights into the function of proteins of organisms. Mass spectrometry is one of the techniques used to study the proteome, and several software tools exist for this purpose. The chapter first introduces the protein identification problem, an important application in proteomics, and then presents a meta-computing approach for solving this problem by combining different existing predictors. The chapter presents an extendable software platform called swissPIT that combines different existing tools and exploits Grid infrastructures to speed up the data analysis process for the proteomics pipeline.

Chapter XIII: Data Mining in Proteomics Using Grid Computing


    Fotis E. Psomopoulos, Aristotle University of Thessaloniki, Greece
    Pericles A. Mitkas, Aristotle University of Thessaloniki, Greece
The goal of this chapter is the presentation of Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics issues (such as data retrieval and system complexity), and the opportunities and constraints found in a Grid environment. The chapter discusses how new and potentially useful knowledge can be extracted from proteomics data utilizing Grid resources in a transparent way. As a case study, the problem of protein classification is considered. After introducing an overview of Data Mining algorithms with emphasis on the specific needs of protein classification, the chapter presents a unified methodology for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks in each case. Finally, the methodology is validated through real-world case studies, deployed over the EGEE grid environment.

Section III: Grid-Based Bioinformatics Environments
This section discusses Grid-based software environments for bioinformatics. Grid-based Problem Solving Environments specifically devoted to bioinformatics, as well as Grid-based implementations of relevant bioinformatics applications, such as docking, bio-molecular simulation and molecular structure determination, are presented.

Chapter XIV: ProGenGrid: A Grid Problem Solving for Bioinformatics


    Maria Mirto, University of Salento, Lecce, Italy & SPACI Consortium, Italy
    Italo Epicoco, University of Salento, Lecce, Italy & SPACI Consortium, Italy
    Massimo Cafaro, University of Salento, Lecce, Italy & SPACI Consortium, Italy
    Sandro Fiore, University of Salento, Lecce, Italy & SPACI Consortium, Italy
    Marco Passante, University of Salento, Lecce, Italy & SPACI Consortium, Italy
    Alessandro Negro, University of Salento, Lecce, Italy & SPACI Consortium, Italy
    Giovanni Aloisio, University of Salento, Lecce, Italy & SPACI Consortium, Italy
This chapter describes ProGenGrid (Proteomics and Genomics Grid), a Grid Problem Solving Environment specialized for the Bioinformatics domain. The presented system provides an integrated environment for composing, scheduling and monitoring biological applications in a Grid. The main feature of this environment is an easy-to-use web interface for composing workflow jobs, which are scheduled on different grid middlewares.

Chapter XV: A Graphical Workflow Modeler for Docking Process in Drug Discovery


    Qiang Wang, Harbin Institute of Technology, China
    Yunming Ye, Harbin Institute of Technology, China
    Kunqian Yu, Chinese Academy of Science (CAS), China
    Joshua Zhexue Huang, University of Hong Kong, China
A drug discovery process aims to find, from a large set of molecules, the candidate leads that interact strongly with the target proteins. The process of drug discovery is characterized by its complexity in data and computation. Thus, a combination of complex algorithms and data management functions is necessary for domain scientists to build proper drug discovery procedures. This chapter presents a graphical workflow modeler for domain scientists to perform drug discovery tasks on high performance grid computing platforms. A client/server system is described as the platform for implementation of the graphical workflow modeler. A case study on drug discovery for avian influenza virus is presented to demonstrate the use of this tool in drug discovery research.

Chapter XVI: BioSimGrid Biomolecular Simulation Database


    Kaihsu Tai, University of Oxford, UK
    Mark S. P. Sansom, University of Oxford, UK
The chapter presents BioSimGrid, a distributed biomolecular simulation database. It is a general-purpose database for trajectories from molecular dynamics simulations. The presentation of BioSimGrid explains how to install the system, and how to deposit, query, and analyze trajectories with real Python code examples for each step. The chapter also presents the underlying concepts in the implementation of BioSimGrid: relational database, distributed computing, and the input/output (deposit and analysis) modules. The chapter concludes by discussing emerging trends in biomolecular simulations and concerns in the further development of BioSimGrid and similar biological databases.

Chapter XVII: Molecular Structure Determination on the Grid


    Russ Miller, Hauptman-Woodward Medical Research Institute, USA & SUNY-Buffalo, USA
    Charles M. Weeks, Hauptman-Woodward Medical Research Institute, USA
The New York State Grid (NYS Grid) is an integrated computational and data grid that provides access to users from around the world to a wide variety of resources. This Grid can be accessed via a Web portal, where the users have access to their data sets and applications, but do not need to be made aware of the details of the data storage or computational devices that are specifically employed in solving their problems. The chapter presents a Grid-enabled version of the SnB and BnP programs which respectively implement the Shake-and-Bake method of molecular structure (SnB) and substructure (BnP) determination. The programs were run on the NYS Grid and their performance evaluation is presented. In particular, SnB has been run simultaneously through the Grid Portal on all the computational resources of the NYS Grid as well as on more than 1100 of the over 3000 processors available through the Open Science Grid.

Section IV: Grids for Medical Informatics
This Section focuses on Grid-based medical informatics applications. Since a core aspect of medical applications regards the acquisition and analysis of biomedical images, the Section first introduces different aspects of biomedical images visualization on the Grid. Then, as significant applications in the field, Grid analysis of radiological data and 3D electron microscopy reconstruction are discussed. Finally, as an example of biomedical instrumentation made available on the Grid, a hybrid mock circulatory system to test cardiovascular prostheses is presented.

Chapter XVIII: Aspects of Visualization and the Grid in a Biomedical Context


    Ian Greenshields, University of Connecticut, USA
    Gamal El-Sayed, University of Connecticut, USA
Visualization, the art and science of representing data visually, is now recognized as an equal partner in the conduct of science via the simulation and modeling paradigm. Different aspects of visualization, such as image preprocessing/analysis, and some computational geometric and computational topological processes are amenable to deployment over compute Grids, but there has been equal focus on the collaborative aspect of Grid computing which is driving collaboration-based visualization systems. The chapter introduces some aspects of visualization and the grid. The chapter first surveys some of the roles of visualization as they relate to the role of Grid computing within a biomedical context. Then it examines certain scheduling strategies that are believed to have value in terms of the distribution of visualization tasks over Grid fabrics.

Volume II

Chapter XIX: Grid Analysis of Radiological Data


    Cécile Germain-Renaud, Université Paris-Sud, CNRS, France
    Vincent Breton, IN2P3, CNRS, Clermont-Ferrand, France
    Patrick Clarysse, CNRS; Inserm; INSA-Lyon; Université Lyon 1, France
    Bertrand Delhay, CNRS; Inserm; INSA-Lyon; Université Lyon 1, France
    Yann Gaudeau, Université Strasbourg; CNRS, France
    Tristan Glatard, Université de Lyon; CREATIS-LRMN, France
    Emmanuel Jeannot, INRIA; CNRS; Université Henri Poincaré, France
    Yannick Legré, CNRS; Université Blaise Pascal, France
    Johan Montagnat, CNRS; Université Nice Sophia-Antipolis, France
    Jean Marie Moureaux, CNRS; Université Henri Poincaré, France
    Angel Osorio, CNRS, France
    Xavier Pennec, INRIA Sophia-Antipolis, France
    Joël Schaerer, CNRS; Inserm; INSA-Lyon; Université Lyon 1, France
    Romain Texier, CNRS; Université Nice Sophia-Antipolis, France
Grid technologies and infrastructures can contribute to harnessing the full power of computer-aided image analysis for clinical research and practice. A main challenge in medical image analysis is to bridge the gap between the grid middleware and the requirements of clinical applications, which can be summarized as large volumes of data, sensitivity of medical information, and complexity of medical datasets. This chapter reports on the goals, achievements and lessons learned from the AGIR (Grid Analysis of Radiological Data) project. AGIR addresses these challenges by providing some core grid medical services (data management, responsiveness, compression, and workflows) supporting medical data processing, and by grid-enabling a panel of applications ranging from algorithmic research to clinical use cases.

Chapter XX: Grid Computing in 3D Electron Microscopy Reconstruction


    J.R. Bilbao Castro, University of Almería, Spain
    I. García Fernández, University of Almería, Spain
    J.J. Fernández, University of Almería, Spain
Three-dimensional electron microscopy allows scientists to study biological specimens and to understand how they behave and interact with each other depending on their structural conformation. Electron microscopy projections of the specimens are taken from different angles and are processed to obtain a virtual three-dimensional reconstruction for further studies. Nevertheless, the whole reconstruction process is neither straightforward nor cheap in terms of computational cost, so different computing paradigms have been applied to overcome such high costs. The chapter explores the main tasks in a typical three-dimensional electron microscopy reconstruction process, showing parallelization and gridification issues. In addition, important aspects like fault tolerance are covered extensively, given that the distributed nature of a grid infrastructure makes it inherently unstable and difficult to predict. Issues of Automatic Grid Job Management and Automatic File Replication on the EGEE Grid are also described.

Chapter XXI: Hybrid Mock Circulatory System to Test Cardiovascular Prostheses on the Grid


    Francesco Maria Colacino, University “Magna Græcia” of Catanzaro, Italy & University of Calabria, Italy
    Maurizio Arabia, University of Calabria, Italy
    Gionata Fragomeni, University “Magna Græcia” of Catanzaro, Italy
In recent decades cardiovascular diseases have greatly increased worldwide, and bioengineering has provided new technologies and cardiovascular prostheses to medical doctors and surgeons. The design of active and passive devices has aroused notable interest, becoming more and more challenging as well as crucial. In this framework, it is important to faithfully reproduce the interaction between the prostheses and the cardiovascular system when in-vitro experiments are performed. For this reason, new and improved kinds of test benches have become necessary. Purely hydraulic mock circulatory systems have shown low flexibility in testing different cardiovascular devices and low precision when a reference mathematical model must be reproduced. In this chapter a new bench is described. It combines a computer model of the cardiovascular system with its real-time interaction with the device to be tested. A possible architecture for deploying the adopted solution in a Grid environment to allow remote experimentation is presented.

Section V: Collaborative Grids for Healthcare and Clinical Applications
Healthcare applications at the population level as well as applications used in the clinical context are the focus of this Section. These applications are the most difficult to port to the Grid, due to the high number of patients involved, the partitioning of data among different health centers, the privacy and security problems and the need for collaboration among various operators. The use of the Grid as a support for collaboration in epidemiology, as well as the integration and analysis of epidemiology data, are fully described. Two important HealthGrids for paediatrics and brain injury are presented, showing how the Grid is a mature technology also for the clinical setting.

Chapter XXII: Grid Technologies in Epidemiology


    Ignacio Blanquer, Universidad Politécnica de Valencia, Spain
    Vicente Hernández, Universidad Politécnica de Valencia, Spain
Epidemiology constitutes one relevant use case for the adoption of grids for health. It combines challenges that have been traditionally addressed by grid technologies, such as managing large amounts of distributed and heterogeneous data, large scale computing and the need for integration and collaboration tools, but introduces new challenges traditionally addressed in the e-health area. Although grid technology has been applied to epidemiology, e.g. for data federation and for evaluating statistical epidemiological models, epidemiology presents important constraints that remain unsolved. The chapter presents the most important problems of epidemiology, such as the semantic integration of data, the effective management of security and privacy, the lack of exploitation models for the use of infrastructures, the instability of Quality of Service and the seamless integration of the technology into the epidemiology environment. Then it presents an analysis of how these issues are being considered in state-of-the-art research.

Chapter XXIII: IntegraEPI: Epidemiologic Surveillance on the Grid


    Fabrício Alves Barbosa da Silva, Universidade de Lisboa, Portugal
    Henrique Fabrício Gagliardi, Instituto de Ensino Superior COC, Brasil
    Eduardo Gallo, APRAESPI, Brasil
    Maria Antónia Madope, Ford Foundation Alumni Association, Moçambique
    Virgílio Cavicchioli Neto, Universidade Federal de São Paulo, Brasil
    Ivan Torres Pisa, Universidade Federal de São Paulo, Brasil
    Domingos Alves, Universidade de São Paulo, Brasil
The chapter presents IntegraEPI, a Grid-based system for space-time visualization, monitoring, modeling and analysis of epidemic data. The system integrates data from heterogeneous epidemic databases and provides analytical and computational methods to increase the predictive capability of the public health system when dealing with epidemic outbreaks and prevention. By using IntegraEPI, health authorities will be able to choose among a set of possible actions that are first tested in a virtual population interacting in an urban infrastructure, considering its environmental factors, and finally compare the simulated data to consolidated data of real epidemic dynamics.

Chapter XXIV: Gridifying Biomedical Applications in the Health-e-Child Project


    David Manset, maat Gknowledge, France
    Frederic Pourraz, maat Gknowledge, France
    Alexey Tsymbal, Siemens AG, Germany
    Jerome Revillard, maat Gknowledge, France
    Konstantin Skaburskas, CERN, Switzerland
    Richard McClatchey, University of the West of England, UK
    Ashiq Anjum, University of the West of England, UK
    Alfonso Rios, maat Gknowledge, Spain
    Martin Huber, Siemens AG, Germany
The Health-e-Child project is developing a Grid-based healthcare platform for European paediatrics and providing seamless integration of traditional and emerging sources of biomedical information. It aims to provide data integration across heterogeneous biomedical information in order to facilitate improved clinical practice, scientific research and personalized healthcare. The goal of this chapter is to share experiences, and present major issues faced, solutions found and a roadmap for future work in developing the Grid infrastructure for interactive biomedical applications. After describing the Grid architecture used in the project, the chapter illustrates a concrete example of one integrated key application, the Health-e-Child CaseReasoner, which is intended for biomedical decision support over the Grid, and is based on similarity search and advanced data visualization techniques.

Chapter XXV: e-Infrastructures Fostering Multi-Centre Collaborative Research into the Intensive Care Management of Patients with Brain Injury


    Richard O. Sinnott, University of Glasgow, UK
    Ian Piper, Southern General Hospital, Glasgow, UK
Clinical research is becoming ever more collaborative with multi-centre trials. A research field where collaboration among centers and secure access to data are particularly important is the brain injury domain, due to the complicated multi-trauma nature of the disease with its related collation of time-series data. Although many IT-based multi-centre e-Infrastructures such as the Brain Monitoring with Information Technology group and the Cooperative Study on Brain Injury Depolarisations have been formed, a serious impediment to the effective implementation of these networks is access to the know-how and experience needed to install, deploy and manage security-oriented middleware systems that provide secure access to distributed hospital-based datasets and especially the linkage of these data sets across sites. This chapter describes the problems inherent in data collection within the brain injury medical domain, the current IT-based solutions designed to address these problems and how they perform in practice. The chapter describes a Grid-based prototype solution which ultimately formed the basis for the AVERT-IT project. The design of the underlying Grid infrastructure for AVERT-IT and how it will be used to produce novel approaches to data collection, data validation and clinical trial design is also presented.

Section VI: Grid-Based Virtual Laboratories for Bioinformatics and e-Science
This Section discusses concepts and properties of Virtual Laboratories, an abstraction for cooperative data analysis and distributed collaboration among scientists, and their applications in Life Sciences. After describing the foundations of modern virtual laboratories, such as formalisms for representing domain knowledge, data integration, semantic annotations and shared vocabularies, the Section describes some emergent virtual laboratories that focus on distributed collaboration and use of provenance data, transparent use of the Grid and support for specific domains like bioinformatics.

Chapter XXVI: Semantic Integration for Research Environments


    Tomasz Gubala, University of Amsterdam, The Netherlands & ACC CYFRONET AGH, Poland
    Marian Bubak, Institute of Computer Science AGH, Poland & University of Amsterdam, The Netherlands
    Peter M.A. Sloot, Informatics Institute, University of Amsterdam, The Netherlands
Research environments bring together users with varying levels of expertise and roles, along with multitudes of data sources and processing units. The high level of required integration contrasts with the loosely-coupled nature of environments which are appropriate for research. A main problem is to support integration of dynamic service-based infrastructures with data sources, tools and users in a way that conserves ubiquity, extensibility and usability. This chapter presents the basic concepts of semantics-based collaborative environments, including semantic data, semantic metadata, annotations, and services integration. Then the authors demonstrate that using semantics as an integration mechanism, i.e. combining formal representations of domain knowledge with techniques like data integration, semantic annotations and shared vocabularies, enables the development of systems for modern e-Science (collaborative laboratories named “collaboratories”). As a case study, the chapter presents how several semantically-augmented experiments are modeled in the ViroLab virtual laboratory for virology.

Chapter XXVII: Virtual Laboratory for Collaborative Applications


    Marian Bubak, Institute of Computer Science AGH, Poland & University of Amsterdam, The Netherlands
    Maciej Malawski, Institute of Computer Science AGH, Poland
    Tomasz Gubala, ACC CYFRONET AGH, Poland & University of Amsterdam, The Netherlands
    Marek Kasztelnik, ACC CYFRONET AGH, Poland
    Piotr Nowakowski, ACC CYFRONET AGH, Poland
    Daniel Harezlak, ACC CYFRONET AGH, Poland
    Tomasz Bartynski, University of Amsterdam, The Netherlands & ACC CYFRONET AGH, Poland
    Joanna Kocot, ACC CYFRONET AGH, Poland
    Eryk Ciepiela, ACC CYFRONET AGH, Poland
    Wlodzimierz Funika, Institute of Computer Science AGH, Poland
    Dariusz Krol, ACC CYFRONET AGH, Poland
    Bartosz Balis, Institute of Computer Science AGH, Poland
    Matthias Assel, University of Stuttgart, Germany
    Alfredo Tirado-Ramos, University of Amsterdam, The Netherlands
Advanced research in Life Sciences requires new information technology solutions to support complex computer simulations, collaborative result analysis and annotation, as well as software reuse. This chapter presents the ViroLab virtual laboratory, which is an integrated system of dedicated tools and services, providing a common space for planning, building, improving and performing in-silico experiments by different groups of users. Within the virtual laboratory, collaborative applications are built as experiment plans, using a notation based on the Ruby scripting language. During experiment execution, provenance data is created and stored. The virtual laboratory enables access to distributed, computational and data resources, in Grid systems, clusters and standalone computers. The process of application development as well as the architecture and functionality of the virtual laboratory are demonstrated using a real-life example from the HIV treatment domain.

Chapter XXVIII: Leveraging the Power of the Grid with Opal


    Sriram Krishnan, University of California San Diego, USA
    Luca Clementi, University of California San Diego, USA
    Zhaohui Ding, Jilin University, China
    Wilfred Li, University of California San Diego, USA
Grid systems provide mechanisms for single sign-on, and uniform APIs for job submission and data transfer, in order to allow the coupling of distributed resources in a seamless manner. However, new users face a daunting barrier to entry due to the high cost of deployment and maintenance. They are often required to learn complex concepts related to Grid infrastructures (credential management, scheduling systems, data staging, etc.). Most scientific users want to run their applications with minimal changes and get results faster, without having to know much about how the resources are used. Hence, a higher level of abstraction must be provided for the underlying infrastructure to be used effectively. The chapter describes the Opal toolkit, a framework for exposing applications on Grid resources as simple Web services. Opal provides a basic set of Application Programming Interfaces (APIs) that allows users to execute their deployed applications, query job status, and retrieve results. Opal also provides a mechanism to define command-line arguments and dynamically generates user interfaces for the Web services. In addition, Opal services can be hooked up to a Metascheduler such as CSF4 to leverage a distributed set of resources, and accessed via a multitude of interfaces such as Web browsers, rich desktop environments, workflow tools, and command-line clients.

Chapter XXIX: The LIBI Grid Platform for Bioinformatics


    The LIBI Grid Platform Developers, Italy
The LIBI project (International Laboratory of BioInformatics) aims to develop an advanced bioinformatics and computational biology laboratory, focusing on basic and applied research in modern biology and biotechnologies. The chapter presents the core part of the system, a Grid Problem Solving Environment, built on top of EGEE, DEISA and SPACI infrastructures, allowing the submission and monitoring of jobs mapped to complex experiments in bioinformatics. Several case studies on different bioinformatics applications and related results which have been obtained using the LIBI platform are also reported.

Section VII: Building and Deploying HealthGrids
This Section describes the main infrastructures, middleware, and tools for building and deploying HealthGrids. After introducing the main requirements that biomedical applications place on Grid middleware and how these are satisfied by a well-known Grid middleware, the Section describes a Grid-based portal that aims to promote collaboration and cooperation among scientists and healthcare research groups on the Grid. The management and deployment of Grid repositories for biomedical digital contents such as mammograms is also discussed.

Chapter XXX: UNICORE: A Middleware for Life Sciences Grids


    Piotr Bala, ICM University of Warsaw & N. Copernicus University, Poland
    Kim Baldridge, University of Zurich, Switzerland
    Emilio Benfenati, Istituto Mario Negri, Italy
    Mosè Casalegno, Istituto Mario Negri, Italy
    Uko Maran, University of Tartu, Estonia
    Lukasz Miroslaw, University of Zürich, Switzerland
    Vitaliy Ostropytskyy, University of Ulster, UK
    Katharina Rasch, Technische Universität Dresden, Germany
    Sulev Sild, University of Tartu, Estonia
    Robert Schöne, Technische Universität Dresden, Germany
    Bernd Schuller, Research Centre Juelich, Germany
    Nadya Williams, University of Zürich, Switzerland
This chapter provides an overview of Grid middleware and applications related to biomedical and Life Sciences disciplines. Various technologies, including web-based solutions, are presented. One of the solutions, the UNICORE framework, in its recent version implements key grid standards and specifications. The system architecture and capabilities, such as security, workflow and data management are described. Special attention is given to the idea of a ‘gridbean’, which expands the UNICORE use for different applications. Examples of gridbeans are provided and the capabilities of UNICORE are illustrated through specific examples built using this grid middleware. In particular, the Chemomentum workbench and its use for in-silico design and modeling in chemistry and Life Sciences are both described.

Chapter XXXI: A Grid Paradigm for e-Science Applications


    Livia Torterolo, University of Genoa, Italy
    Luca Corradi, University of Genoa, Italy
    Barbara Canesi, University of Genoa, Italy
    Marco Fato, University of Genoa, Italy
    Roberto Barbera, University of Catania, Italy & INFN-Catania, Italy
    Salvatore Scifo, Consorzio Cometa of Catania, Italy
    Antonio Calanducci, INFN-Catania, Italy
    Diego Scardaci, INFN (National Institute of Nuclear Physics), Italy
    Giordano Scuderi, Unico Informatica s.r.l, Catania, Italy
Many biomedical studies now deal with large, distributed, and heterogeneous repositories as well as with computationally demanding analyses, and complex integration techniques are increasingly required to handle this complexity. This chapter describes the Bio Med Portal, a Grid-oriented platform that aims to promote collaboration and cooperation among scientists and healthcare research groups, enabling the remote use of resources integrated in complex software platform services forming a virtual laboratory. The Bio Med Portal is designed to host several medical services and it is able to deploy several analysis algorithms. The aim of this chapter is both to present a Grid application with its own medical use case and to emphasize the benefit that a new design paradigm based on Grid could provide to research groups spread across geographically distributed sites.

Chapter XXXII: gLibrary/DRI: A Grid-Based Platform to Host Multiple Repositories for Digital Content


    Roberto Barbera, University of Catania, Italy & INFN-Catania, Italy
    Antonio Calanducci, INFN-Catania, Italy
    Juan Manuel González Martín, maat Gknowledge, Spain
    Francisco Prieto Castrillo, CETA-CIEMAT, Spain
    Raúl Ramos Pollán, CETA-CIEMAT, Spain
    Manuel Rubio del Solar, CETA-CIEMAT, Spain
    Dorin Tcaci, maat Gknowledge, Spain
Repositories are digital stores that manage data and metadata and give users access to them, offering an easy-to-use service and a powerful system to handle digital assets. This chapter presents the gLibrary/DRI (Digital Repositories Infrastructure) platform, a Grid-based digital repository that takes advantage of Grid features such as VO authentication, file catalogues, and metadata services. The main goal of the platform is to reduce the time and effort that a repository provider spends to get its repository deployed. This is achieved by providing a common infrastructure and a set of mechanisms that repository providers use to define the data model, the access to the content and the storage model. Two use cases are also presented: a mammogram repository that provides clinicians with a tool that eases the diagnostic process, and an algorithmic repository based on the Poincare Surface Section.

Section VIII: Selected Readings
This Section is a short collection of suggested readings by different authors, aiming to enrich this book with others' knowledge, experience, thought and insight. After introducing Cloud Computing and describing the porting of applications to Grids and Clouds, the Section describes a bio-inspired approach for the construction of a self-organizing Grid information system, and concludes by discussing a Grid-based implementation of phylogenetic analysis.

Chapter XXXIII: Porting Applications to Grids and Clouds


    Wolfgang Gentzsch, Duke University, USA
Cloud Computing is an emerging style of computing in which dynamically scalable resources are provided as a service. The chapter describes the main stages of implementing applications on Grid and Cloud infrastructures. As a case study, the chapter presents the Distributed European Infrastructure for Supercomputing Applications (DEISA) and describes the DEISA Extreme Computing Initiative (DECI) for porting and running scientific grand challenge applications on the DEISA Grid. The chapter concludes by proposing the top ten rules of building a sustainable Grid.

Chapter XXXIV: Evaluating a Bio-Inspired Approach for the Design of a Grid Information System: The SO-Grid Portal


    Agostino Forestiero, Institute of High Performance Computing and Networking CNR-ICAR, Italy
    Carlo Mastroianni, Institute of High Performance Computing and Networking CNR-ICAR, Italy
    Fausto Pupo, Institute of High Performance Computing and Networking CNR-ICAR, Italy
    Giandomenico Spezzano, Institute of High Performance Computing and Networking CNR-ICAR, Italy
The chapter describes a bio-inspired approach for the construction of a self-organizing Grid information system, with the aim of fostering the use of swarm intelligence, multi-agent and bio-inspired paradigms in the field of distributed computing. The chapter also describes the SO-Grid Portal, a simulation portal through which registered users can simulate and analyze the ant-based protocols and perform “parameter sweep” studies.

Chapter XXXV: Large-Scale Co-Phylogenetic Analysis on the Grid


    Heinz Stockinger, Swiss Institute of Bioinformatics, Switzerland
    Alexander F. Auch, University of Tübingen, Germany
    Markus Göker, University of Tübingen, Germany
    Jan Meier-Kolthoff, University of Tübingen, Germany
    Alexandros Stamatakis, Ludwig-Maximilians-University Munich, Germany
Phylogenetic data analysis represents an extremely compute-intensive area of Bioinformatics and thus requires high-performance technologies. In particular, the chapter focuses on host-parasite co-phylogenetic analysis, another compute- and memory-intensive problem. After introducing some tools for conducting such co-phylogenetic studies, the chapter describes their enhanced Grid-based implementations. Since the computational core of the problem is embarrassingly parallel, the chapter shows how the parallel implementation is well suited to a computational Grid and reduces the response time of large-scale analyses.