Data Provenance in Scientific Workflows

Khalid Belhajjame (University of Manchester, UK), Paolo Missier (University of Manchester, UK) and Carole Goble (University of Manchester, UK)
DOI: 10.4018/978-1-60566-374-6.ch003

Data provenance is key to understanding and interpreting the results of scientific experiments. This chapter introduces and characterises data provenance in scientific workflows using illustrative examples taken from real-world workflows. The characterisation takes the form of a taxonomy that is used for comparing and analysing provenance capabilities supplied by existing scientific workflow systems.
Chapter Preview


In this section, we formally define what a scientific workflow is. We then present examples of workflow provenance queries taken from the domain of bioinformatics.

Key Terms in this Chapter

Web Service: A web service can be defined as a software program that provides an API through which it can be invoked over the Internet using standard XML-based protocols.

Semantic Annotations: Semantic annotations are specifications that define the meaning and the form of an object. They are often encoded as relationships that link the object being described to concepts from ontologies.

Workflow: A workflow is the computerised representation of a process (e.g., software construction, or student registration at a university). It specifies the activities of the process and the order in which they have to be executed, the flow of data between activities, and the collaborating agents that execute the activities to fulfil a common objective.

In silico Experiment: An in silico experiment is a routine that employs computational analysis tools to verify, amongst other things, the validity of a scientific hypothesis, or to demonstrate a known fact.

Workflow System: A workflow system, a.k.a. workflow management system, is software responsible for enacting workflows. It instantiates the activities that compose a workflow, assigns their execution to agents and coordinates these executions.

Semantic Annotation of Web Services: Semantic annotations that describe the task performed by a web service, the form and domain of the parameters it takes as input, and the form and domain of the results it delivers when executed.

Data Provenance: Data provenance can be thought of as the process by which the data instances and the analysis operations that were used to derive a given data instance are identified.
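The definition above suggests a simple backward traversal over a record of what a workflow run did. The Python sketch below is illustrative only: it assumes a hypothetical trace format in which the workflow system logs, for each executed activity, the data instances it consumed and produced, and that each data instance has at most one producing run. The record type `ProcessRun`, the function `lineage`, and the bioinformatics-flavoured operation names are assumptions made for this example, not the chapter's provenance model or the API of any particular workflow system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRun:
    """One recorded execution of a workflow activity (hypothetical record format)."""
    run_id: str
    operation: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def lineage(trace: list[ProcessRun], target: str) -> tuple[set[str], set[str]]:
    """Return the data instances and run ids from which `target` was derived."""
    # Map each data instance to the run that produced it
    # (assumes at most one producer per data instance).
    producer = {out: run for run in trace for out in run.outputs}

    data_used: set[str] = set()
    runs_used: set[str] = set()
    frontier = [target]
    while frontier:
        item = frontier.pop()
        run = producer.get(item)
        # Skip workflow inputs (no producer) and runs already visited.
        if run is None or run.run_id in runs_used:
            continue
        runs_used.add(run.run_id)
        for source in run.inputs:
            if source not in data_used:
                data_used.add(source)
                frontier.append(source)
    return data_used, runs_used


# Hypothetical trace of a small sequence-analysis workflow run.
trace = [
    ProcessRun("r1", "fetch_sequence", ("accession_id",), ("protein_seq",)),
    ProcessRun("r2", "blast_search", ("protein_seq",), ("blast_report",)),
    ProcessRun("r3", "extract_hits", ("blast_report",), ("hit_ids",)),
    ProcessRun("r4", "fetch_annotation", ("accession_id",), ("go_terms",)),
    ProcessRun("r5", "merge", ("hit_ids", "go_terms"), ("summary",)),
]

data, runs = lineage(trace, "hit_ids")
operations = sorted(r.operation for r in trace if r.run_id in runs)
print(sorted(data))   # ['accession_id', 'blast_report', 'protein_seq']
print(operations)     # ['blast_search', 'extract_hits', 'fetch_sequence']
```

Walking back from `hit_ids` identifies the BLAST report, the protein sequence and the accession identifier, together with the three operations that produced them; the `fetch_annotation` and `merge` runs are excluded because `hit_ids` does not depend on them. A real provenance store would also need to handle collections, iterations and repeated runs, which this sketch ignores.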
