An Approach to Configuration Management of Scientific Workflows


Tassio Ferenzini Martins Sirqueira (Federal University of Juiz de Fora, Juiz de Fora, Brazil & Vianna Junior Institute, Juiz de Fora, Brazil), Regina Braga (Federal University of Juiz de Fora, Juiz de Fora, Brazil), Marco Antônio P. Araújo (Federal University of Juiz de Fora, Juiz de Fora, Brazil & Federal Institute of Southeast Minas Gerais, Juiz de Fora, Brazil), José Maria N. David (Federal University of Juiz de Fora, Juiz de Fora, Brazil), Fernanda Campos (Federal University of Juiz de Fora, Juiz de Fora, Brazil) and Victor Ströele (Federal University of Juiz de Fora, Juiz de Fora, Brazil)
Copyright: © 2017 | Pages: 27
DOI: 10.4018/IJWP.2017070102


A scientific software ecosystem aims to integrate all stages of an experiment and its related workflows in order to solve complex problems. To ensure the proper execution of the experiment, any modification that occurs must be propagated to the associated workflows, which must be maintained and evolved for the research to be conducted successfully. One way to ensure this control is through configuration management using data provenance. In this work, the authors use data provenance concepts and models, together with ontologies, to provide an architecture for storing and querying scientific experiment information. Based on this architecture, a proof of concept was conducted using workflows extracted from the myExperiment repository. The results are presented throughout the paper.

1. Introduction

A scientific experiment is defined as a series of interconnected operations (Goble et al., 2010), which can be executed using one or more workflows. A scientific workflow is a model or template that represents a sequence of scientific activities implemented by tools in order to reach a certain objective (Deelman et al., 2009). The wide adoption of scientific workflows as a mechanism to aggregate existing services has revolutionized the way scientists conduct their experiments, since workflows allow them to gather evidence for or against a hypothesis, or to demonstrate a known fact (Belhajjame et al., 2011).

According to Nardi (2009), users of scientific workflows usually work in a specific field of research and do not always have adequate training in computer science. Often, they begin an application by copying an existing workflow and then adjusting it to their needs. Another important issue is the loss of the researcher's knowledge about the experiment (Marinho et al., 2012), caused by delegating tasks to computers that usually perform isolated actions without documentation. Thus, to represent and support the development of a scientific experiment, it is necessary to register the associated workflows and their variations, since they can be modified during the research (Mattoso et al., 2010).

One way of storing this data is to use provenance models (Buneman et al., 2001), which record the data produced by scientific workflows (Sirqueira et al., 2016). The use of provenance data allows the scientist to compose new workflows by reusing data from previous ones. However, provenance data used in isolation does not allow adequate control of the experiment and its associated workflows, which makes it difficult to manage the experiment as a whole. According to Hasan et al. (2007), independent tools are needed to manage the experiment and analyze its data, since Scientific Workflow Management Systems (SWMS) do not have this functionality. An SWMS considers only the researcher responsible for the workflow (Pereira et al., 2009) and provides no support for collaboration, distribution, or reuse. The additional data, i.e., workflow versions, associated workflows, related experiments, and results, is important for the publication of the experiment.

In this context, the objective of this work is to address the configuration management of scientific workflows throughout the experiment life cycle, based on the maintenance, evolution, and reuse of experiment data, in order to improve the experimentation process and its use in other related contexts. Since each phase of the scientific experiment cycle involves specific tasks, and each modification to the execution of a task generates a new version of the workflow (Sirqueira et al., 2016), we consider this control essential for the proper execution of a scientific experiment. This article details the E-SECO ProVersion approach, which extends the E-SECO ecosystem (Freitas et al., 2015) to control and manage the scientific workflows related to a given experiment, using provenance data and ontologies. The research question can therefore be stated as: Is the E-SECO ProVersion architecture capable of deriving maintenance and evolution information from experiments and their related workflows?
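As a purely illustrative aid, and not the E-SECO ProVersion implementation, the idea that each modification to a task yields a new workflow version can be sketched as a small version history in which every version records the version it was derived from. All names here (`WorkflowVersion`, `VersionHistory`, `record_modification`, `lineage`, and the task names) are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WorkflowVersion:
    """One version of a workflow (hypothetical structure for illustration)."""
    workflow: str
    number: int
    changed_task: Optional[str]   # task whose modification produced this version
    derived_from: Optional[int]   # number of the parent version, None for the first


class VersionHistory:
    """Keeps every version of one workflow and the derivation links between them."""

    def __init__(self, workflow: str) -> None:
        self.workflow = workflow
        # Version 1 is the original workflow; it has no parent.
        self._versions: Dict[int, WorkflowVersion] = {
            1: WorkflowVersion(workflow, 1, None, None)
        }

    def latest(self) -> WorkflowVersion:
        return self._versions[max(self._versions)]

    def record_modification(
        self, changed_task: str, base: Optional[int] = None
    ) -> WorkflowVersion:
        """Register a new version derived from `base` (default: latest version)."""
        parent = self.latest().number if base is None else base
        if parent not in self._versions:
            raise KeyError(f"unknown base version {parent}")
        number = max(self._versions) + 1
        version = WorkflowVersion(self.workflow, number, changed_task, parent)
        self._versions[number] = version
        return version

    def lineage(self, number: int) -> List[int]:
        """Return version numbers from `number` back to the original version."""
        chain: List[int] = []
        current: Optional[int] = number
        while current is not None:
            chain.append(current)
            current = self._versions[current].derived_from
        return chain


history = VersionHistory("alignment")
v2 = history.record_modification("filter_reads")
v3 = history.record_modification("normalize", base=1)  # a variation of version 1
```

Following the derivation links backwards (for example, `history.lineage(3)`) is the kind of evolution query that the research question refers to; the approach described in this article answers such queries with provenance models and ontologies rather than an in-memory structure like this one.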

Figure 1 details the experimentation life cycle of the E-SECO ProVersion approach. Configuration management is performed by the “Configuration Management” module, which encompasses the whole process.

Figure 1. Experiment life cycle in the E-SECO ProVersion approach
