Abstract
The Health-e-Child project started in January 2006 with the aim of developing a Grid-based healthcare platform for European paediatrics and providing seamless integration of traditional and emerging sources of biomedical information. The objective of this chapter is to share experiences, and present major issues faced, solutions found and a roadmap for future work in developing the Grid infrastructure for interactive biomedical applications in the project, as Health-e-Child approaches its final phases. This chapter starts with a brief introduction of the project itself, followed by a description of its architecture on the Grid. It then illustrates the approach with the description of a concrete example of one integrated key application, the Health-e-Child CaseReasoner, which is intended for biomedical decision support over the Grid, and is based on similarity search and advanced data visualization techniques.
1. Introduction
In recent times, demand has risen for more holistic views of patients’ health, so that healthcare can be delivered at the appropriate time, by the appropriate clinician, with the appropriate means, at the level of individual patients. The Health-e-Child (HeC, pronounced “healthy child”) project (“Health-e-Child,” 2008) aims to provide data integration across heterogeneous biomedical information in order to facilitate improved clinical practice, scientific research and, ultimately, such personalised healthcare. As one of the largest integrated projects of the 6th Framework Programme of the European Commission, HeC brings together three major paediatric medical centres with several European companies, university groups and research institutions specialised in Grid-based biomedical information integration and related technologies.
The main objectives of the HeC project are:
- To gain a comprehensive view of a child’s health by vertically integrating biomedical data, information and knowledge that spans the entire spectrum from the genetic through clinical to the epidemiological;
- To develop a biomedical information platform, supported by sophisticated search, optimisation and matching techniques for heterogeneous information, empowered by Grid technology;
- To build enabling tools and services on top of the HeC platform, that will lead to innovative and better healthcare solutions in Europe, based on:
  - Integrated disease models exploiting all available information levels.
  - Database-guided biomedical decision support systems provisioning novel clinical practices and personalized healthcare for children.
  - Large-scale, cross-modality, and longitudinal information fusion and data mining for biomedical knowledge discovery.
The realization of these project goals requires an infrastructure that is highly dependable and reliable. Indeed, physicians demand guarantees that the system will always be available and that the processes which integrate and manipulate patient data will be reliable, even in the case of failures. The infrastructure will have to allow for transparent access to distributed data, provide a high degree of scalability, and efficiently schedule access to computationally intensive services by applying sophisticated load-balancing strategies. Consider a scenario where a similarity search across the entire HeC patient population is needed to make a better decision over a critical case. To support such a search on demand, intensive query processing, feature extraction and distributed similarity calculations have to take place. All these steps require significant computing power, storage capacity and an acceptable quality of service (QoS) over the infrastructure resources.
Consequently, one of the primary objectives of the HeC project is the delivery of a complete suite of Grid-based and cost-efficient tools for individualised disease prevention, screening, early diagnosis, therapy and associated follow-up for paediatric diseases across three different domains: cardiology (e.g. Right Ventricle Overload caused by Atrial Septal Defect or the Tetralogy of Fallot), rheumatology (Juvenile Idiopathic Arthritis), and neuro-oncology (e.g. Pilocytic Astrocytoma). To facilitate this, the project has started building a gLite-enabled European network linking leading clinical centres, enabling them to share and annotate biomedical data, to validate systems clinically and to disseminate clinical excellence across Europe by establishing new technologies, clinical workflows and standards in the domain.
The project brings together three heterogeneous communities, in a well-balanced configuration, which can be described as three equally important cornerstones:
- The collaboration of clinicians and healthcare workers from the cardiology, rheumatology and neuro-oncology domains, bringing together the expertise that is crucial to identifying relevant clinical research directions;
- The cooperation between medical imaging and health IT experts, who are able to bridge the clinical and IT worlds; and
- The marriage of the Grid and distributed computing technologies, where experts harness the power of the Grid to solve requests coming from the other two communities.
Key Terms in this Chapter
Sun Grid Engine (SGE): Policy-based workload management and dynamic provisioning of application workloads [http://www.sun.com/software/gridwar].
File Catalog (FC): offers a hierarchical view of, and a UNIX-like interface to, files stored on the Grid. The FC provides Logical File Name (LFN) to Storage URL (SURL) mappings and authorization for file access.
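To make the LFN-to-SURL idea concrete, the following is a minimal in-memory sketch of what a file catalog does conceptually. It is not the gLite catalog API: the class `FileCatalog`, its methods, the example paths and the `example.org` storage hosts are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileCatalog:
    """Toy catalog (hypothetical, not the gLite API).

    Maps each Logical File Name (LFN) to one or more Storage URLs (SURLs)
    and keeps a simple per-file set of users allowed to read it.
    """
    replicas: dict = field(default_factory=dict)  # LFN -> list of SURLs
    readers: dict = field(default_factory=dict)   # LFN -> set of user names

    def register(self, lfn, surl, allowed_users):
        """Record one physical replica of a logical file."""
        self.replicas.setdefault(lfn, []).append(surl)
        self.readers.setdefault(lfn, set()).update(allowed_users)

    def resolve(self, lfn, user):
        """Return all SURLs for an LFN, if the user is authorized."""
        if lfn not in self.replicas:
            raise KeyError(lfn)
        if user not in self.readers[lfn]:
            raise PermissionError(f"{user} may not read {lfn}")
        return list(self.replicas[lfn])

    def listdir(self, directory):
        """UNIX-like listing: names directly below a logical directory."""
        prefix = directory.rstrip("/") + "/"
        children = set()
        for lfn in self.replicas:
            if lfn.startswith(prefix):
                children.add(lfn[len(prefix):].split("/", 1)[0])
        return sorted(children)


fc = FileCatalog()
fc.register("/grid/hec/cardio/p001.dcm", "srm://se1.example.org/data/a1", {"alice"})
fc.register("/grid/hec/cardio/p001.dcm", "srm://se2.example.org/data/b7", {"alice"})
fc.register("/grid/hec/neuro/p002.dcm", "srm://se1.example.org/data/c3", {"bob"})
print(fc.listdir("/grid/hec"))
print(fc.resolve("/grid/hec/cardio/p001.dcm", "alice"))
```

The point of the indirection is that users and applications refer only to the logical name, while the catalog decides which physical replica on which Storage Element is actually read.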
Data Visualization and Visual Data Mining: Data visualization is a field of study that concentrates on the use of computer-supported tools to explore and represent large amounts of data. Data visualization focuses on the creation of approaches for conveying abstract information in intuitive ways. Visual representations take advantage of the human eye’s broad bandwidth pathway into the mind. Practical application of data visualization in decision support involves selecting, transforming and representing data in a form that facilitates data exploration and understanding, often called knowledge discovery or visual data mining. Important aspects of data visualization are the interactivity and dynamics of visual representation. For example, CaseReasoner implements treemaps, relative neighbourhood graphs (RNGs) and heatmaps as techniques for the visualization of inter-patient similarity in clinical decision support, assisting also in clinical knowledge discovery. The importance of data visualization and visual data mining for knowledge discovery and decision support has long been underestimated; the field holds great promise of significant advances in the near future and of successful applications in a variety of subject domains.
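As an illustration of one of the techniques named above, the sketch below builds a relative neighbourhood graph using the standard definition: two points are joined unless some third point is closer to both of them than they are to each other. This is a generic brute-force construction, not the CaseReasoner implementation; the function name and the toy feature vectors are assumptions.

```python
import math
from itertools import combinations


def relative_neighbourhood_graph(points, distance=math.dist):
    """Return RNG edges as index pairs (i, j) with i < j.

    An edge (i, j) is kept unless a third point r satisfies
    max(d(i, r), d(j, r)) < d(i, j). Brute force, O(n^3).
    """
    n = len(points)
    d = [[distance(points[i], points[j]) for j in range(n)] for i in range(n)]
    edges = []
    for i, j in combinations(range(n), 2):
        blocked = any(
            max(d[i][r], d[j][r]) < d[i][j]
            for r in range(n)
            if r not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Three hypothetical patients described by two numeric features each.
patients = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(relative_neighbourhood_graph(patients))
```

In a similarity view, each node would be a patient and each edge a pair of patients with no "in-between" case, which keeps the graph sparse enough to read.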
HealthGrid: A healthgrid has been defined as ‘an environment in which data of medical interest can be stored and made easily available to different actors in healthcare systems such as physicians, healthcare centres, patients and citizens’. Healthgrids focus equally on the sharing of data (and the associated issues of privacy and ethics) and on distributed health analysis across the biomedical spectrum from public health to patient care and from tissue/organ data to cellular and genomic information. For individualised healthcare, healthgrids are envisaged to facilitate access to biomedical information and ultimately knowledge, no matter where the requestor of that information may reside or where the relevant data is stored: biomedical information on demand. Much research activity continues in the field of healthgrids.
Clinical Decision Support Systems: Clinical decision support systems (CDSSs) are interactive computer programs intended to assist physicians or other healthcare professionals with decision making. CDSSs link clinical or health observations with accumulated facts and/or knowledge to influence clinicians’ decision choices for improved personalised healthcare. A CDSS usually contains a (medical) knowledge base and a reasoning mechanism (often a set of rules derived from experts and from the principles of evidence-based medicine), used to derive conclusions from future observations. Typical examples of CDSSs include rule-based expert systems, like the CDSS pioneer MYCIN, and Computer-Assisted Detection (CAD) systems that assist radiologists in analyzing and evaluating medical images comprehensively in a short time, for example by detecting a likely tumour. The objective and the main challenge of the HeC project is to develop a CDSS that makes use of (distributed) data of different modalities (clinical, imaging and genetic) and of different vertical levels (molecule, tissue, organ, individual, and population), as a part of the European paediatric platform being implemented.
Virtual Organisation Membership Service (VOMS): holds information about Virtual Organisation (VO) members and their membership of the groups to which they are assigned.
Storage Element (SE): Service which provides storage resources and manages requests for storage space and files. The storage space managed can be disk space, tape space or a combination of the two.
Case-Based Reasoning Systems: Case-Based Reasoning (CBR) systems are an important sub-class of decision support systems that use reasoning based on similarity as the central element of decision support. The basic components of a CBR system include a Case Base (serving as the Knowledge Base and including previously observed cases with known decisions), a Similarity Assessment element used for case retrieval, and a Solution Adaptation block for the ultimate decision making based on the decisions of similar cases stored in the case base. For example, HeC CaseReasoner can be regarded as a typical CBR decision support system where decisions are made based on inter-patient similarity. The main benefit of CBR systems is the inherent transparency of the case retrieval and decision support process, which raises the trust of the user in suggested decisions; this is especially important in clinical decision support.
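The three CBR components described above can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions, not the CaseReasoner algorithm: cases are hypothetical numeric feature vectors, similarity is plain Euclidean distance, and "adaptation" is a majority vote over the retrieved cases.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(case_base, query, k=3, distance=euclidean):
    """Similarity assessment: return the k cases closest to the query.

    case_base is a list of (features, decision) pairs.
    """
    ranked = sorted(case_base, key=lambda case: distance(case[0], query))
    return ranked[:k]


def adapt(retrieved):
    """Solution adaptation (simplified): majority decision of retrieved cases."""
    votes = Counter(decision for _, decision in retrieved)
    return votes.most_common(1)[0][0]


# Case base: previously observed cases with known decisions (toy data).
case_base = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((5.0, 5.0), "B"),
    ((5.1, 4.8), "B"),
    ((0.9, 1.1), "A"),
]

similar = retrieve(case_base, query=(1.0, 1.05), k=3)
print(similar)         # the retrieved cases can be shown to the user
print(adapt(similar))  # suggested decision
```

Returning the retrieved cases alongside the suggestion reflects the transparency argument: the user can inspect which earlier cases led to the proposed decision.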
Computing Element (CE): service representing a computing resource. Its main functionality is job management (job submission, job control, etc.).
Condor: specialized workload management system for compute-intensive jobs [http://www.cs.wisc.edu/condor/].
High Throughput Computing: a computer science term describing the use of many computing resources over long periods of time to accomplish a computational task.
Metadata Catalog: A catalog to store any sort of “data about data”.