Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare (2 Volumes)

Table of Contents
Section I:
Infrastructures and Services for HealthGrids and BioGrids
This section introduces the main concepts and defines technical, methodological and organizational challenges of HealthGrids and BioGrids. After discussing the roadmap toward future HealthGrids, the Section defines basic resources and their discovery in HealthGrids. Then, taking into account the large use of scientific workflows in BioGrids and HealthGrids, the Section introduces the data provenance concept and discusses ways to query and exploit provenance data. Data protection and security, an important aspect that must be faced when considering biomedical data, is also discussed. Finally, a review of distributed data mining and knowledge discovery systems useful for implementing the analysis layer of HealthGrids is also presented.
Chapter I:
SHARE: A European Healthgrid Roadmap
Mark Olive, University of the West of England, UK
Hanene Boussi Rahmouni, University of the West of England, UK
Tony Solomonides, University of the West of England, UK
Vincent Breton, IN2P3, CNRS, Clermont-Ferrand, France
Nicolas Jacq, HealthGrid (International)
Yannick Legré, HealthGrid (International), IN2P3, CNRS, Clermont-Ferrand, France
Ignacio Blanquer, Universidad Politécnica de Valencia, Spain
Vicente Hernandez, Universidad Politécnica de Valencia, Spain
Isabelle Andoulsi, Universitaires Notre-Dame de la Paix, Belgium
Jean Herveg, Universitaires Notre-Dame de la Paix, Belgium
Celine Van Doosselaere, European Health Management Association (International)
Petra Wilson, European Health Management Association (International)
Alexander Dobrev, Empirica GmbH, Germany
Karl Stroetmann, Empirica GmbH, Germany
Veli Stroetmann, Empirica GmbH, Germany
Grid technology, one of the key technologies for the “European Research Area”, offers rapid computation, large scale data storage and flexible collaboration by harnessing together the power of large numbers of computers, from end-users’ desktops to powerful workstations and clusters of more powerful machines. However, a major challenge is to take the technology out of the laboratory to the citizen. The application of Grid technology to biomedical and healthcare informatics, in short HealthGrid, presents some difficult challenges. The chapter presents the results of the SHARE project (http://www.eu-share.org) that identified the key developments, i.e. technical advances, social actions, economic investments and ethical or legal initiatives, needed to achieve wide adoption and deployment of HealthGrids throughout Europe. The project analyses several case studies and discusses technical, ethical, legal, social and economic issues which may impede early deployment of HealthGrids.
Chapter II:
Types of Resources and their Discovery in HealthGrids
Aisha Naseer, Brunel University, UK
Lampros K. Stergioulas, Brunel University, UK
The emerging technology of HealthGrids holds the promise to successfully integrate health information systems and various healthcare entities onto a common, globally shared and easily accessible platform. This chapter presents a taxonomy of different types of HealthGrid resources and proposes some solutions for the resource discovery problem, an emerging challenge in HealthGrids. A discussion on discovering and integrating data resources is also provided.
Chapter III:
Data Provenance in Scientific Workflows
Khalid Belhajjame, University of Manchester, UK
Paolo Missier, University of Manchester, UK
Carole A. Goble, University of Manchester, UK
Data provenance is key to understanding and interpreting the results of scientific experiments. This chapter introduces and characterizes data provenance in scientific workflows using illustrative examples taken from real-world workflows. Scientific workflows and related provenance data are first defined. Then the chapter proposes a taxonomy that characterizes provenance in scientific workflows. This taxonomy is used for comparing and analysing the provenance capabilities supplied by existing scientific workflow systems.
Chapter IV:
Provenance Tracking and End-User Oriented Query Construction
Bartosz Balis, Institute of Computer Science AGH, Poland
Marian Bubak, Institute of Computer Science AGH, Poland & University of Amsterdam, The Netherlands
Michal Pelczar, ACC CYFRONET AGH, Poland
Jakub Wach, ACC CYFRONET AGH, Poland
Provenance tracking is an indispensable element of every e-Science infrastructure for conducting in silico experiments. The chapter proposes an ontology-based provenance model which captures the execution of in silico experiments, as well as the domain-specific semantics of the data and computations used in those experiments. Ontologies are used as an interlingua among end-users, the provenance tracking system, and query tools. Moreover, the chapter presents the Query Translation Tools (QUaTRO), which enable end-user oriented, ontology-guided visual querying over provenance records and experiment data. The provenance tracking approach is demonstrated on a Drug Resistance application.
Chapter V:
Data Protection and Data Security Regarding Grid Computing in Biomedical Research
Yassene Mohammed, Georg-August-University, Germany
Fred Viezens, Georg-August-University, Germany
Frank Dickmann, Georg-August-University, Germany
Jürgen Falkner, Fraunhofer Institute for Industrial Engineering IAO, Germany
Thomas Lingner, Georg-August-University, Germany
Dagmar Krefting, University Medicine Berlin, Germany
Ulrich Sax, Georg-August-University, Germany
Grid Computing is of rising interest for Life Sciences, but medical applications on the grid require a special focus on data security and data protection issues. This chapter describes security and privacy issues within the scope of biomedical Grid Computing. Starting from general security and privacy rules, the chapter first describes the current state of the art of grid security, and then it discusses which additional security measures have to be established in different biomedical grid scenarios. Legal aspects as well as the current possibilities and flaws of grid security technology are also described. As a case study, the chapter describes the enhanced security concept offered by MediGRID, a Grid specialized for the Life Sciences, and outlines how medical Grid Computing could fulfill privacy regulations used in more demanding environments.
Chapter VI:
Parallel, Distributed, and Grid-Based Data Mining: Algorithms, Systems, and Applications
Moez Ben HajHmida, Faculty of Sciences of Tunis, Tunisia
Antonio Congiusta, University of Calabria, Italy & University of Salerno, Italy
Knowledge discovery is an important task in Life Sciences. Classic data mining techniques, developed for centralized sites, often prove inadequate because of some unique characteristics of today’s data sources. The development of HealthGrids, as well as the use of high performance computers in bioinformatics and Life Sciences, is boosting the use of distributed data mining solutions. This chapter presents the state of the art of the main distributed data mining techniques and systems. A detailed taxonomy is drawn by analyzing and comparing parallel, distributed and Grid-based data mining methods, with a particular focus on the exploitation of large and remotely dispersed datasets and/or high-performance computers.
Section II:
Grids for Genomics and Proteomics
This section discusses the use of the Grid for the management and analysis of genomics and proteomics data, the basic data at the biological level. After describing the main issues in the parallel and distributed implementation of BLAST, a cornerstone of all genomics analysis, the Section discusses some significant Grid-based implementations of genomics applications. Then, the Section introduces proteomics, with a special focus on mass spectrometry-based proteomics, and discusses different Grid-based proteomics applications, ranging from biomarker discovery to protein identification and protein classification.
Chapter VII:
High Performance BLAST Over the Grid
Vincent Breton, IN2P3, CNRS, Clermont-Ferrand, France
Eddy Caron, Université de Lyon, LIP, CNRS-ENS-Lyon-UCBL-INRIA, France
Frédéric Desprez, INRIA, Université de Lyon, LIP, CNRS-ENS-Lyon-UCBL-INRIA, France
Gaël Le Mahec, CNRS, IN2P3, UBP Clermont-Ferrand, France
As grids become more and more attractive for solving complex problems with high computational and storage requirements, bioinformatics algorithms are starting to be ported to large-scale platforms. The BLAST kernel was one of the first applications ported to such platforms. However, although simple parallelization was enough for the first proof of concept, its use on production platforms required more optimized algorithms. The chapter reviews existing parallelization and “gridification” of the BLAST algorithm as well as related issues such as data management and replication. A case study using the DIET middleware over the Grid’5000 experimental platform is also presented.
Chapter VIII:
Functional Genomics Applications in GRID
Luciano Milanesi, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
Ivan Merelli, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
Gabriele Trombetti, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
Paolo Cozzi, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
Alessandro Orro, Istituto di Tecnologie Biomediche – Consiglio Nazionale delle Ricerche, Italy
A common ongoing task in Functional Genomics is to compare the full genomes of organisms with those of related species, to search huge databases for functional annotation of novel sequences, and to identify specific patterns in them, such as ESTs, genes, and microRNA. Predicting these patterns has a considerable computational cost, while public genome archives exceed one billion sequence traces from over 1,000 organisms, and this number is increasing rapidly. Thus Functional Genomics applications require significant computational infrastructures, where reusable tools and resources can be accessed. The chapter describes issues and challenges in porting the main Functional Genomics applications, from gene prediction to sequence alignment and phylogenetic applications, to the Grid. The implementation and evaluation of a software environment for the management of distributed bioinformatics computations over the Grid is also presented.
Chapter IX:
PheGee@Home: A Grid-Based Tool for Comparative Genomics
Bertil Schmidt, Nanyang Technological University, Singapore
Chen Chen, Nanyang Technological University, Singapore
Weiguo Liu, Nanyang Technological University, Singapore
Wayne P. Mitchell, Experimental Therapeutics Centre (ETC), Singapore
This chapter describes PheGee@Home, a grid-based comparative genomics tool that finds candidate genes responsible for a given phenotype, where a phenotype is understood as the physical manifestation of the interplay of genetic, epigenetic and environmental factors. The tool facilitates the discovery and prioritization of candidate genes controlling or contributing to the genetically determined portion of a specified phenotype. Because of prohibitively long runtimes, the system architecture is based on a desktop grid environment and commodity graphics hardware, which significantly accelerate PheGee. The authors validate this approach by comparing microbial genomes on a grid testbed.
Chapter X:
High-Throughput GRID Computing for Life Sciences
Giulia De Sario, Istituto di Tecnologie Biomediche, CNR, Italy
Angelica Tulipano, Istituto di Tecnologie Biomediche, CNR, Italy
Giacinto Donvito, INFN, Sezione di Bari, Italy
Giorgio Maggi, INFN Bari, Italy & Università e Politecnico di Bari, Italy
Andreas Gisel, Istituto di Tecnologie Biomediche, CNR, Italy
The number of fully sequenced genomes increases daily, producing exponential growth of the sequence, annotation and metadata databases. Data analysis on a genome-wide level has become a data- and computation-intensive task. However, most genomics applications can be partitioned into many independent tasks that can be scheduled on high performance computers or Grids. The problem addressed by the chapter is the Grid-based analysis of Gene Ontology data and its associations with the gene products of any kind of organism, in order to find gene products with similar functionalities. The chapter presents a system that partitions the computation of the full search into a large number of jobs and submits these jobs to the Grid infrastructure until all of them have been processed successfully, guaranteeing an analysis of the data without missing any information.
Chapter XI:
Management and Analysis of Mass Spectrometry Proteomics Data on the Grid
Mario Cannataro, University Magna Graecia of Catanzaro, Italy
Pietro Hiram Guzzi, University Magna Graecia of Catanzaro, Italy
Giuseppe Tradigo, University Magna Graecia of Catanzaro, Italy
Pierangelo Veltri, University Magna Graecia of Catanzaro, Italy
Recent advances in high-throughput technologies such as mass spectrometry have enabled researchers to collect huge amounts of data when analysing biological samples. Computational Proteomics concerns the computational methods for analyzing spectra data in qualitative proteomics (i.e. peptide/protein identification in tandem mass spectrometry) and quantitative proteomics (i.e. protein expression measurement in samples), as well as in biomarker discovery (i.e. the identification of a molecular signature of a disease directly from spectra). This chapter presents the main standards, tools, and technologies for building scalable, reusable, and portable applications in this field. The chapter surveys available solutions for computational proteomics and describes MS-Analyzer, a Grid-based software platform for the integrated management and analysis of spectra data. MS-Analyzer provides efficient spectra management through a specialized spectra database, and supports the semantic composition of pre-processing and data mining services to analyze spectra on the Grid.
Chapter XII:
High-Throughput Data Analysis of Proteomic Mass Spectra on the SwissBioGrid
Andreas Quandt, Swiss Institute of Bioinformatics, Switzerland
Sergio Maffioletti, ETH Zurich (Swiss National Supercomputing Centre), Switzerland
Cesare Pautasso, University of Lugano, Switzerland
Heinz Stockinger, Swiss Institute of Bioinformatics, Switzerland
Frederique Lisacek, Swiss Institute of Bioinformatics, Switzerland
Peter Kunszt, ETH Zurich (Swiss National Supercomputing Centre), Switzerland
Proteomics is currently one of the most promising fields in bioinformatics, as it provides important insights into the function of the proteins of organisms. Mass spectrometry is one of the techniques used to study the proteome, and several software tools exist for this purpose. The chapter first introduces the protein identification problem, an important application in proteomics, and then presents a meta-computing approach for solving this problem by combining different existing predictors. The chapter presents an extendable software platform called swissPIT that combines different existing tools and exploits Grid infrastructures to speed up the data analysis process for the proteomics pipeline.
Chapter XIII:
Data Mining in Proteomics Using Grid Computing
Fotis E. Psomopoulos, Aristotle University of Thessaloniki, Greece
Pericles A. Mitkas, Aristotle University of Thessaloniki, Greece
The goal of this chapter is the presentation of Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics issues (such as data retrieval and system complexity), and the opportunities and constraints found in a Grid environment. The chapter discusses how new and potentially useful knowledge can be extracted from proteomics data utilizing Grid resources in a transparent way. As a case study, the problem of protein classification is considered. After introducing an overview of Data Mining algorithms with emphasis on the specific needs of protein classification, the chapter presents a unified methodology for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks in each case. Finally, the methodology is validated through real-world case studies, deployed over the EGEE grid environment.
Section III:
Grid-Based Bioinformatics Environments
This section discusses Grid-based software environments for bioinformatics. Grid-based Problem Solving Environments specifically devoted to bioinformatics, as well as Grid-based implementations of relevant bioinformatics applications, such as docking, bio-molecular simulation and molecular structure determination, are presented.
Chapter XIV:
ProGenGrid: A Grid Problem Solving Environment for Bioinformatics
Maria Mirto, University of Salento, Lecce, Italy & SPACI Consortium, Italy
Italo Epicoco, University of Salento, Lecce, Italy & SPACI Consortium, Italy
Massimo Cafaro, University of Salento, Lecce, Italy & SPACI Consortium, Italy
Sandro Fiore, University of Salento, Lecce, Italy & SPACI Consortium, Italy
Marco Passante, University of Salento, Lecce, Italy & SPACI Consortium, Italy
Alessandro Negro, University of Salento, Lecce, Italy & SPACI Consortium, Italy
Giovanni Aloisio, University of Salento, Lecce, Italy & SPACI Consortium, Italy
This chapter describes ProGenGrid (Proteomics and Genomics Grid), a Grid Problem Solving Environment specialized for the Bioinformatics domain. The presented system provides an integrated environment for composing, scheduling and monitoring biological applications in a Grid. The main feature offered by this environment is an easy-to-use web interface for composing workflow jobs that are scheduled on different grid middlewares.
Chapter XV:
A Graphical Workflow Modeler for Docking Process in Drug Discovery
Qiang Wang, Harbin Institute of Technology, China
Yunming Ye, Harbin Institute of Technology, China
Kunqian Yu, Chinese Academy of Science (CAS), China
Joshua Zhexue Huang, University of Hong Kong, China
A drug discovery process aims to find, from a large set of molecules, the candidate leads that interact strongly with the target proteins. The process of drug discovery is characterized by its complexity in data and computation. Thus, a combination of complex algorithms and data management functions is necessary for domain scientists to build proper drug discovery procedures. This chapter presents a graphical workflow modeler for domain scientists to perform drug discovery tasks on high performance grid computing platforms. A client/server system is described as the platform for implementation of the graphical workflow modeler. A case study on drug discovery for avian influenza virus is presented to demonstrate the use of this tool in drug discovery research.
Chapter XVI:
BioSimGrid Biomolecular Simulation Database
Kaihsu Tai, University of Oxford, UK
Mark S. P. Sansom, University of Oxford, UK
The chapter presents BioSimGrid, a distributed biomolecular simulation database. It is a general-purpose database for trajectories from molecular dynamics simulations. The presentation of BioSimGrid explains how to install the system, and how to deposit, query, and analyze trajectories with real Python code examples for each step. The chapter also presents the underlying concepts in the implementation of BioSimGrid: relational database, distributed computing, and the input/output (deposit and analysis) modules. The chapter concludes by discussing emerging trends in biomolecular simulations and concerns in the further development of BioSimGrid and similar biological databases.
Chapter XVII:
Molecular Structure Determination on the Grid
Russ Miller, Hauptman-Woodward Medical Research Institute, USA & SUNY-Buffalo, USA
Charles M. Weeks, Hauptman-Woodward Medical Research Institute, USA
The New York State Grid (NYS Grid) is an integrated computational and data grid that gives users from around the world access to a wide variety of resources. This Grid can be accessed via a Web portal, where users have access to their data sets and applications but do not need to be aware of the details of the data storage or computational devices that are specifically employed in solving their problems. The chapter presents Grid-enabled versions of the SnB and BnP programs, which respectively implement the Shake-and-Bake method of molecular structure (SnB) and substructure (BnP) determination. The programs were run on the NYS Grid and their performance evaluation is presented. In particular, SnB has been run simultaneously through the Grid Portal on all the computational resources of the NYS Grid as well as on more than 1100 of the over 3000 processors available through the Open Science Grid.
Section IV:
Grids for Medical Informatics
This Section focuses on Grid-based medical informatics applications. Since a core aspect of medical applications is the acquisition and analysis of biomedical images, the Section first introduces different aspects of biomedical image visualization on the Grid. Then, as significant applications in the field, Grid analysis of radiological data and 3D electron microscopy reconstruction are discussed. Finally, as an example of biomedical instrumentation made available on the Grid, a hybrid mock circulatory system to test cardiovascular prostheses is presented.
Chapter XVIII:
Aspects of Visualization and the Grid in a Biomedical Context
Ian Greenshields, University of Connecticut, USA
Gamal El-Sayed, University of Connecticut, USA
Visualization, the art and science of representing data visually, is now recognized as an equal partner in the conduct of science via the simulation and modeling paradigm. Different aspects of visualization, such as image preprocessing/analysis, and some computational geometric and computational topological processes are amenable to deployment over compute Grids, but there has been equal focus on the collaborative aspect of Grid computing which is driving collaboration-based visualization systems. The chapter introduces some aspects of visualization and the grid. The chapter first surveys some of the roles of visualization as they relate to the role of Grid computing within a biomedical context. Then it examines certain scheduling strategies that are believed to have value in terms of the distribution of visualization tasks over Grid fabrics.
Volume II
Chapter XIX:
Grid Analysis of Radiological Data
Cécile Germain-Renaud, Université Paris-Sud, CNRS, France
Vincent Breton, IN2P3, CNRS, Clermont-Ferrand, France
Patrick Clarysse, CNRS; Inserm; INSA-Lyon; Université Lyon 1, France
Bertrand Delhay, CNRS; Inserm; INSA-Lyon; Université Lyon 1, France
Yann Gaudeau, Université Strasbourg; CNRS, France
Tristan Glatard, Université de Lyon; CREATIS-LRMN, France
Emmanuel Jeannot, INRIA; CNRS; Université Henri Poincaré, France
Yannick Legré, CNRS; Université Blaise Pascal, France
Johan Montagnat, CNRS; Université Nice Sophia-Antipolis, France
Jean Marie Moureaux, CNRS; Université Henri Poincaré, France
Angel Osorio, CNRS, France
Xavier Pennec, INRIA Sophia-Antipolis, France
Joël Schaerer, CNRS; Inserm; INSA-Lyon; Université Lyon 1, France
Romain Texier, CNRS; Université Nice Sophia-Antipolis, France
Grid technologies and infrastructures can help bring the full power of computer-aided image analysis to clinical research and practice. A main challenge in medical image analysis is to bridge the gap between the grid middleware and the requirements of clinical applications, which can be summarized as large volumes of data, sensitivity of medical information, and complexity of medical datasets. This chapter reports on the goals, achievements and lessons learned from the AGIR (Grid Analysis of Radiological Data) project. AGIR addresses these challenges by providing some core grid medical services (data management, responsiveness, compression, and workflows) supporting medical data processing, and by grid-enabling a panel of applications ranging from algorithmic research to clinical use cases.
Chapter XX:
Grid Computing in 3D Electron Microscopy Reconstruction
J.R. Bilbao Castro, University of Almería, Spain
I. García Fernández, University of Almería, Spain
J.J. Fernández, University of Almería, Spain
Three-dimensional electron microscopy allows scientists to study biological specimens and to understand how they behave and interact with each other depending on their structural conformation. Electron microscopy projections of the specimens are taken from different angles and are processed to obtain a virtual three-dimensional reconstruction for further studies. Nevertheless, the whole reconstruction process is neither straightforward nor cheap in terms of computational costs, so different computing paradigms have been applied in order to overcome such high costs. The chapter explores the main tasks in a typical three-dimensional electron microscopy reconstruction process, showing parallelization and gridification issues. In addition, important aspects such as fault tolerance are covered extensively, given that the distributed nature of a grid infrastructure makes it inherently unstable and difficult to predict. Issues of Automatic Grid Job Management and Automatic File Replication on the EGEE Grid are also described.
Chapter XXI:
Hybrid Mock Circulatory System to Test Cardiovascular Prostheses on the Grid
Francesco Maria Colacino, University “Magna Græcia” of Catanzaro, Italy & University of Calabria, Italy
Maurizio Arabia, University of Calabria, Italy
Gionata Fragomeni, University “Magna Græcia” of Catanzaro, Italy
In recent decades cardiovascular diseases have greatly increased worldwide, and bioengineering has provided new technologies and cardiovascular prostheses to medical doctors and surgeons. The design of active and passive devices has attracted considerable interest, becoming more and more challenging as well as crucial. In this framework, it is important to faithfully reproduce the interaction between the prostheses and the cardiovascular system when in-vitro experiments are performed. For this reason, new and improved kinds of test benches have become necessary. Purely hydraulic mock circulatory systems have shown low flexibility in testing different cardiovascular devices and low precision when a reference mathematical model must be reproduced. In this chapter a new bench is described. It combines the computer model of the cardiovascular system and its real-time interaction with the device to be tested. A possible architecture for deploying the adopted solution in a Grid environment to allow remote experimentation is presented.
Section V:
Collaborative Grids for Healthcare and Clinical Applications
Healthcare applications at the population level as well as applications used in the clinical context are the focus of this Section. These applications are the most difficult to port to the Grid, due to the high number of patients involved, the partitioning of data among different health centers, the privacy and security problems and the need for collaboration among various operators. The use of the Grid as a support for collaboration in epidemiology, as well as the integration and analysis of epidemiology data, are fully described. Two important HealthGrids for paediatrics and brain injury are presented, showing that the Grid is a mature technology for the clinical setting as well.
Chapter XXII:
Grid Technologies in Epidemiology
Ignacio Blanquer, Universidad Politécnica de Valencia, Spain
Vicente Hernández, Universidad Politécnica de Valencia, Spain
Epidemiology constitutes one relevant use case for the adoption of grids for health. It combines challenges that have been traditionally addressed by grid technologies, such as managing large amounts of distributed and heterogeneous data, large scale computing and the need for integration and collaboration tools, but introduces new challenges traditionally addressed from the e-health area. Although grid technology has been applied to epidemiology, e.g. for data federation and for evaluating statistical epidemiological models, epidemiology presents important constraints that remain unsolved. The chapter presents the most important problems of epidemiology, such as the semantic integration of data, the effective management of security and privacy, the lack of exploitation models for the use of infrastructures, the instability of Quality of Service and the seamless integration of the technology into the epidemiology environment. Then it presents an analysis of how these issues are being considered in state-of-the-art research.
Chapter XXIII:
IntegraEPI: Epidemiologic Surveillance on the Grid
Fabrício Alves Barbosa da Silva, Universidade de Lisboa, Portugal
Henrique Fabrício Gagliardi, Instituto de Ensino Superior COC, Brazil
Eduardo Gallo, APRAESPI, Brazil
Maria Antónia Madope, Ford Foundation Alumni Association, Mozambique
Virgílio Cavicchioli Neto, Universidade Federal de São Paulo, Brazil
Ivan Torres Pisa, Universidade Federal de São Paulo, Brazil
Domingos Alves, Universidade de São Paulo, Brazil
The chapter presents IntegraEPI, a Grid-based system for space-time visualization, monitoring, modeling and analysis of epidemic data. The system integrates data from heterogeneous epidemic databases and provides analytical and computational methods to increase the predictive capability of the public health system when dealing with epidemic outbreaks and prevention. By using IntegraEPI, health authorities will be able to decide among a set of possible actions that have first been tested on a virtual population interacting within an urban infrastructure, taking its environmental factors into account, and finally to compare the simulated data with consolidated data on real epidemic dynamics.
Chapter XXIV:
Gridifying Biomedical Applications in the Health-e-Child Project
David Manset, maat Gknowledge, France
Frederic Pourraz, maat Gknowledge, France
Alexey Tsymbal, Siemens AG, Germany
Jerome Revillard, maat Gknowledge, France
Konstantin Skaburskas, CERN, Switzerland
Richard McClatchey, University of the West of England, UK
Ashiq Anjum, University of the West of England, UK
Alfonso Rios, maat Gknowledge, Spain
Martin Huber, Siemens AG, Germany
The Health-e-Child project is developing a Grid-based healthcare platform for European paediatrics and providing seamless integration of traditional and emerging sources of biomedical information. It aims to provide data integration across heterogeneous biomedical information in order to facilitate improved clinical practice, scientific research and personalized healthcare. The goal of this chapter is to share experiences, and present major issues faced, solutions found and a roadmap for future work in developing the Grid infrastructure for interactive biomedical applications. After describing the Grid architecture used in the project, the chapter illustrates a concrete example of one integrated key application, the Health-e-Child CaseReasoner, which is intended for biomedical decision support over the Grid, and is based on similarity search and advanced data visualization techniques.
Chapter XXV:
e-Infrastructures Fostering Multi-Centre Collaborative Research into the Intensive Care Management of Patients with Brain Injury
Richard O. Sinnott, University of Glasgow, UK
Ian Piper, Southern General Hospital, Glasgow, UK
Clinical research is becoming ever more collaborative through multi-centre trials. A research field where collaboration among centres and secure access to data are particularly important is the brain injury domain, due to the complicated multi-trauma nature of the disease with its related collation of time-series data. Although many IT-based multi-centre e-Infrastructures such as the Brain Monitoring with Information Technology group and the Cooperative Study on Brain Injury Depolarisations have been formed, a serious impediment to the effective implementation of these networks is access to the know-how and experience needed to install, deploy and manage security-oriented middleware systems that provide secure access to distributed hospital-based datasets and especially the linkage of these data sets across sites. This chapter describes the problems inherent in data collection within the brain injury medical domain, the current IT-based solutions designed to address these problems and how they perform in practice. The chapter describes a Grid-based prototype solution which ultimately formed the basis for the AVERT-IT project. The design of the underlying Grid infrastructure for AVERT-IT and how it will be used to produce novel approaches to data collection, data validation and clinical trial design is also presented.
Section VI:
Grid-Based Virtual Laboratories for Bioinformatics and e-Science
This Section discusses concepts and properties of Virtual Laboratories, an abstraction for cooperative data analysis and distributed collaboration among scientists, and their applications in Life Sciences. After describing the foundations of modern virtual laboratories, such as formalisms for representing domain knowledge, data integration, semantic annotations and shared vocabularies, the Section describes some emergent virtual laboratories that focus on distributed collaboration and use of provenance data, transparent use of the Grid and support for specific domains like bioinformatics.
Chapter XXVI:
Semantic Integration for Research Environments
Tomasz Gubala, University of Amsterdam, The Netherlands & ACC CYFRONET AGH, Poland
Marian Bubak, Institute of Computer Science AGH, Poland & University of Amsterdam, The Netherlands
Peter M.A. Sloot, Informatics Institute, University of Amsterdam, The Netherlands
Research environments bring together users with varying levels of expertise and roles, along with multitudes of data sources and processing units. The high level of required integration contrasts with the loosely-coupled nature of environments which are appropriate for research. A main problem is to support integration of dynamic service-based infrastructures with data sources, tools and users in a way that conserves ubiquity, extensibility and usability. This chapter presents the basic concepts of semantics-based collaborative environments, including semantic data, semantic metadata, annotations, and services integration. Then the authors demonstrate that using semantics as an integration mechanism, i.e. combining formal representations of domain knowledge with techniques like data integration, semantic annotations and shared vocabularies, enables the development of systems for modern e-Science (collaborative laboratories named “collaboratories”). As a case study, the chapter presents how several semantically-augmented experiments are modeled in the ViroLab virtual laboratory for virology.
Chapter XXVII:
Virtual Laboratory for Collaborative Applications
Marian Bubak, Institute of Computer Science AGH, Poland & University of Amsterdam, The Netherlands
Maciej Malawski, Institute of Computer Science AGH, Poland
Tomasz Gubala, ACC CYFRONET AGH, Poland & University of Amsterdam, The Netherlands
Marek Kasztelnik, ACC CYFRONET AGH, Poland
Piotr Nowakowski, ACC CYFRONET AGH, Poland
Daniel Harezlak, ACC CYFRONET AGH, Poland
Tomasz Bartynski, University of Amsterdam, The Netherlands & ACC CYFRONET AGH, Poland
Joanna Kocot, ACC CYFRONET AGH, Poland
Eryk Ciepiela, ACC CYFRONET AGH, Poland
Wlodzimierz Funika, Institute of Computer Science AGH, Poland
Dariusz Krol, ACC CYFRONET AGH, Poland
Bartosz Balis, Institute of Computer Science AGH, Poland
Matthias Assel, University of Stuttgart, Germany
Alfredo Tirado-Ramos, University of Amsterdam, The Netherlands
Advanced research in Life Sciences requires new information technology solutions to support complex computer simulations, collaborative result analysis and annotation, as well as software reuse. This chapter presents the ViroLab virtual laboratory, which is an integrated system of dedicated tools and services, providing a common space for planning, building, improving and performing in-silico experiments by different groups of users. Within the virtual laboratory, collaborative applications are built as experiment plans, using a notation based on the Ruby scripting language. During experiment execution, provenance data is created and stored. The virtual laboratory enables access to distributed computational and data resources in Grid systems, clusters and standalone computers. The process of application development as well as the architecture and functionality of the virtual laboratory are demonstrated using a real-life example from the HIV treatment domain.
Chapter XXVIII:
Leveraging the Power of the Grid with Opal
Sriram Krishnan, University of California San Diego, USA
Luca Clementi, University of California San Diego, USA
Zhaohui Ding, Jilin University, China
Wilfred Li, University of California San Diego, USA
Grid systems provide mechanisms for single sign-on, and uniform APIs for job submission and data transfer, in order to allow the coupling of distributed resources in a seamless manner. However, new users face a daunting barrier to entry due to the high cost of deployment and maintenance. They are often required to learn complex concepts related to Grid infrastructures (credential management, scheduling systems, data staging, etc.). Most scientific users want to run their applications with minimal changes and get results faster, without having to know much about how the resources are used. Hence, a higher level of abstraction must be provided for the underlying infrastructure to be used effectively. The chapter describes the Opal toolkit, a framework for exposing applications on Grid resources as simple Web services. Opal provides a basic set of Application Programming Interfaces (APIs) that allows users to execute their deployed applications, query job status, and retrieve results. Opal also provides a mechanism to define command-line arguments and automatically generates user interfaces for the Web services. In addition, Opal services can be hooked up to a metascheduler such as CSF4 to leverage a distributed set of resources, and accessed via a multitude of interfaces such as Web browsers, rich desktop environments, workflow tools, and command-line clients.
Chapter XXIX:
The LIBI Grid Platform for Bioinformatics
The LIBI Grid Platform Developers, Italy
The LIBI project (International Laboratory of BioInformatics) aims to develop an advanced bioinformatics and computational biology laboratory, focusing on basic and applied research in modern biology and biotechnologies. The chapter presents the core part of the system, a Grid Problem Solving Environment built on top of the EGEE, DEISA and SPACI infrastructures, which allows the submission and monitoring of jobs mapped to complex experiments in bioinformatics. Several case studies on different bioinformatics applications, together with the results obtained using the LIBI platform, are also reported.
Section VII:
Building and Deploying HealthGrids
This Section describes the main infrastructures, middleware, and tools for building and deploying HealthGrids. After introducing the main requirements that biomedical applications place on Grid middleware and how these are satisfied by a well-known Grid middleware, the Section describes a Grid-based portal that aims to promote collaboration and cooperation among scientists and healthcare research groups on the Grid. The management and deployment of Grid repositories for biomedical digital content such as mammograms is also discussed.
Chapter XXX:
UNICORE: A Middleware for Life Sciences Grids
Piotr Bala, ICM University of Warsaw & N. Copernicus University, Poland
Kim Baldridge, University of Zurich, Switzerland
Emilio Benfenati, Istituto Mario Negri, Italy
Mosè Casalegno, Istituto Mario Negri, Italy
Uko Maran, University of Tartu, Estonia
Lukasz Miroslaw, University of Zürich, Switzerland
Vitaliy Ostropytskyy, University of Ulster, UK
Katharina Rasch, Technische Universität Dresden, Germany
Sulev Sild, University of Tartu, Estonia
Robert Schöne, Technische Universität Dresden, Germany
Bernd Schuller, Research Centre Juelich, Germany
Nadya Williams, University of Zürich, Switzerland
This chapter provides an overview of Grid middleware and applications related to biomedical and Life Sciences disciplines. Various technologies, including web-based solutions, are presented. One of the solutions, the UNICORE framework, in its recent version implements key grid standards and specifications. The system architecture and capabilities, such as security, workflow and data management are described. Special attention is given to the idea of a ‘gridbean’, which expands the UNICORE use for different applications. Examples of gridbeans are provided and the capabilities of UNICORE are illustrated through specific examples built using this grid middleware. In particular, the Chemomentum workbench and its use for in-silico design and modeling in chemistry and Life Sciences are both described.
Chapter XXXI:
A Grid Paradigm for e-Science Applications
Livia Torterolo, University of Genoa, Italy
Luca Corradi, University of Genoa, Italy
Barbara Canesi, University of Genoa, Italy
Marco Fato, University of Genoa, Italy
Roberto Barbera, University of Catania, Italy & INFN-Catania, Italy
Salvatore Scifo, Consorzio Cometa of Catania, Italy
Antonio Calanducci, INFN-Catania, Italy
Diego Scardaci, INFN (National Institute of Nuclear Physics), Italy
Giordano Scuderi, Unico Informatica s.r.l, Catania, Italy
Many biomedical studies now deal with large, distributed, and heterogeneous repositories as well as with computationally demanding analyses, and complex integration techniques are increasingly required to handle this complexity. This chapter describes the Bio Med Portal, a Grid-oriented platform that aims to promote collaboration and cooperation among scientists and healthcare research groups, enabling the remote use of resources integrated into complex software platform services that form a virtual laboratory. The Bio Med Portal is designed to host several medical services and is able to deploy several analysis algorithms. The aim of this chapter is both to present a Grid application with its own medical use case and to emphasize the benefit that a new Grid-based design paradigm could provide to research groups spread across geographically distributed sites.
Chapter XXXII:
gLibrary/DRI: A Grid-Based Platform to Host Multiple Repositories for Digital Content
Roberto Barbera, University of Catania, Italy & INFN-Catania, Italy
Antonio Calanducci, INFN-Catania, Italy
Juan Manuel González Martín, maat Gknowledge, Spain
Francisco Prieto Castrillo, CETA-CIEMAT, Spain
Raúl Ramos Pollán, CETA-CIEMAT, Spain
Manuel Rubio del Solar, CETA-CIEMAT, Spain
Dorin Tcaci, maat Gknowledge, Spain
Repositories are digital stores that manage data and metadata and provide users with access to them, offering an easy-to-use service and a powerful system to handle digital assets. This chapter presents the gLibrary/DRI (Digital Repositories Infrastructure) platform, a Grid-based digital repository that takes advantage of Grid features such as VO authentication, file catalogues, and metadata services. The main goal of the platform is to reduce the time and effort that a repository provider spends to get its repository deployed. This is achieved by providing a common infrastructure and a set of mechanisms that repository providers use to define the data model, the access to the content and the storage model. Two use cases are also presented: a mammogram repository that provides clinicians with a tool that eases the diagnostic process, and an algorithmic repository based on the Poincare Surface Section.
Section VIII:
Selected Readings
This Section is a short collection of suggested readings by different authors, aiming to enrich this book with others' knowledge, experience, thought and insight. After introducing Cloud Computing and describing the porting of applications to Grids and Clouds, the Section describes a bio-inspired approach for the construction of a self-organizing Grid information system, and concludes by discussing a Grid-based implementation of phylogenetic analysis.
Chapter XXXIII:
Porting Applications to Grids and Clouds
Wolfgang Gentzsch, Duke University, USA
Cloud Computing is an emerging style of computing in which dynamically scalable resources are provided as a service. The chapter describes the main stages of implementing applications on Grid and Cloud infrastructures. As a case study, the chapter presents the Distributed European Infrastructure for Supercomputing Applications (DEISA) and describes the DEISA Extreme Computing Initiative (DECI) for porting and running scientific grand challenge applications on the DEISA Grid. The chapter concludes by proposing the top ten rules for building a sustainable Grid.
Chapter XXXIV:
Evaluating a Bio-Inspired Approach for the Design of a Grid Information System: The SO-Grid Portal
Agostino Forestiero, Institute of High Performance Computing and Networking CNR-ICAR, Italy
Carlo Mastroianni, Institute of High Performance Computing and Networking CNR-ICAR, Italy
Fausto Pupo, Institute of High Performance Computing and Networking CNR-ICAR, Italy
Giandomenico Spezzano, Institute of High Performance Computing and Networking CNR-ICAR, Italy
The chapter describes a bio-inspired approach for the construction of a self-organizing Grid information system, with the aim of fostering the use of swarm intelligence, multi-agent and bio-inspired paradigms in the field of distributed computing. The chapter also describes the SO-Grid Portal, a simulation portal through which registered users can simulate and analyze the ant-based protocols and perform “parameter sweep” studies.
Chapter XXXV:
Large-Scale Co-Phylogenetic Analysis on the Grid
Heinz Stockinger, Swiss Institute of Bioinformatics, Switzerland
Alexander F. Auch, University of Tübingen, Germany
Markus Göker, University of Tübingen, Germany
Jan Meier-Kolthoff, University of Tübingen, Germany
Alexandros Stamatakis, Ludwig-Maximilians-University Munich, Germany
Phylogenetic data analysis represents an extremely compute-intensive area of Bioinformatics and thus requires high-performance technologies. In particular, the chapter focuses on host-parasite co-phylogenetic analysis, another compute- and memory-intensive problem. After introducing some tools for conducting such co-phylogenetic studies, the chapter describes their enhanced Grid-based implementations. Since the computational core of the problem is embarrassingly parallel, the chapter shows how the parallel implementation is well suited to a computational Grid and reduces the response time of large-scale analyses.
ISBN: 978-1-60566-374-6
Hard Cover
Publisher: Medical Information Science Reference
Release Date: May 2009
Pages: 1050
List Price: $495.00
Perpetual Access: $745.00
Print + Perpetual Access: $990.00