Introduction
The experimental and computational facilities available today allow large-scale scientific projects to produce unprecedented amounts of experimental and simulation data. This wealth of data needs to be structured and managed in a way that readily makes sense to scientists, so that relevant knowledge can be extracted to support the scientific investigation process. Current data management technologies are clearly unable to cope with scientists' requirements (Stonebraker et al., 2009), despite the effort the community has dedicated to the area. That effort can be measured by the community's support for an international conference on scientific and statistical database management (SSDBM), which has been running for almost 20 years, by various workshops on associated themes, and by important projects such as POSTGRES at Berkeley (Stonebraker and Rowe, 1986). All these initiatives have contributed considerably to extending database technology towards the support of scientific data management.
Given such a panorama, one may ask what could still be missing in the support for scientific applications from a database viewpoint. In this paper, we investigate this question from the perspective of data management support for the complete scientific life-cycle, from hypothesis formulation to experiment validation. As it turns out, efforts in this area have been steered towards supporting the in-silico experimental phase of the scientific life-cycle (Mattoso et al., 2010), which involves the execution of scientific workflows and the management of the associated data and metadata. The complete scientific life-cycle extends beyond that and includes the studied phenomenon, the formulated hypotheses, and the computational models. The lack of support for these elements in current in-silico approaches leaves extremely important information out of reach of the scientific community.
This paper helps fill this gap by introducing a conceptual model for scientific hypotheses. In this model, the starting point of a scientific investigation is the description of the natural phenomenon. The studied phenomenon occurs in nature in some space-time frame, in which selected physical quantities are observed. Scientific hypotheses conceptually represent the scientific models a scientist conceives to explain the observed phenomenon. Testing hypotheses in-silico involves running experiments that represent the scientific models and confronting the simulated data with the collected observations.
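As a rough illustration of how the entities named above might relate to one another, the following Python sketch links a phenomenon, a hypothesis, and an in-silico experiment, and confronts simulated values with observations. It is not the conceptual model defined in this paper: every class, field, and function name (`Phenomenon`, `Hypothesis`, `Experiment`, `confront`, `rmse`) is hypothetical, and the use of root-mean-square error as the comparison measure is an assumption made only for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Phenomenon:
    """A natural phenomenon observed in some space-time frame (hypothetical schema)."""
    name: str
    space_time_frame: str      # free-text description; an assumption of this sketch
    quantities: list[str]      # names of the observed physical quantities


@dataclass
class Hypothesis:
    """A proposed explanation of one phenomenon, represented by a scientific model."""
    statement: str
    phenomenon: Phenomenon
    model_name: str


@dataclass
class Experiment:
    """One in-silico run of the model that represents a hypothesis."""
    hypothesis: Hypothesis
    parameters: dict[str, float]
    simulated: dict[str, list[float]] = field(default_factory=dict)


def rmse(simulated: list[float], observed: list[float]) -> float:
    """Root-mean-square error between two equally long, non-empty series."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("series must be non-empty and of equal length")
    return sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(simulated))


def confront(experiment: Experiment, observations: dict[str, list[float]]) -> dict[str, float]:
    """Compare simulated data with observations, one error value per quantity.

    Quantities missing from either the simulation or the observations are skipped.
    """
    quantities = experiment.hypothesis.phenomenon.quantities
    return {
        q: rmse(experiment.simulated[q], observations[q])
        for q in quantities
        if q in experiment.simulated and q in observations
    }


if __name__ == "__main__":
    # Toy values chosen only to exercise the code; they are not real measurements.
    flow = Phenomenon("blood flow", "one cycle, one artery segment", ["pressure"])
    hyp = Hypothesis("a 1D model explains the pressure wave", flow, "toy-1d-model")
    run = Experiment(hyp, {"viscosity": 1.0}, {"pressure": [1.0, 2.0, 3.0]})
    print(confront(run, {"pressure": [1.0, 2.0, 5.0]}))
```

Keeping the hypothesis as a separate record that both the phenomenon and the experiment point to is what would let a later query relate model parameters and simulated results back to the explanation being tested.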
The proposed conceptual model is the basis for registering the complete scientific exploration life-cycle. This approach brings the following benefits:
- Extends in-silico support beyond the experimental phase and towards the complete scientific life-cycle;
- Supports provenance information regarding the evolution of scientific hypotheses;
- Facilitates communication among scientists in a research group (by exposing their mental models);
- Supports the reproducibility of experiments (by enhancing the experiment metadata with hypotheses and models);
- Supports model steering (by investigating model evolution);
- Supports the analysis of experiment results (by relating models, model parameters and simulated results).
To illustrate the use of the proposed conceptual model, we discuss a case study based on models of the human cardiovascular system. The phenomenon is simulated by a complex, data-intensive numerical simulation that runs for days on a cluster with 1200 nodes to compute a single blood cycle. The analysis of the simulated results is supported by SciDB (Cudre-Mauroux et al., 2009), a multi-dimensional array database system.
The remainder of this paper is structured as follows. We first discuss related work. The next section describes a use case concerning the simulation of the human cardiovascular system. The following section presents the Hypothesis Conceptual Model, which integrates scientific hypotheses with the in-silico experiment entities. This model is the basis for a database prototype, developed using SciDB, that supports the cardiovascular scientific hypothesis and is described in the section after that. Finally, we conclude the paper with suggestions for future work.