Towards Next Generation Provenance Systems for E-Science

Fakhri Alam Khan (University of Vienna, Austria), Sardar Hussain (University of Glasgow, UK), Ivan Janciak (University of Vienna, Austria) and Peter Brezany (University of Vienna, Austria)
DOI: 10.4018/978-1-4666-4161-7.ch003


e-Science helps scientists to automate scientific discovery processes and experiments, and to promote collaboration across organizational boundaries and disciplines. These experiments involve data discovery, knowledge discovery, integration, linking, and analysis through different software tools and activities. The scientific workflow is one technique through which such activities and processes can be interlinked, automated, and ultimately shared among collaborating scientists. Workflows are realized by a workflow enactment engine, which interprets the process definition and interacts with the workflow participants. Since workflows are typically executed on a shared and distributed infrastructure, information on the workflow activities, the data processed, and the results generated (also known as provenance) needs to be recorded so that experiments can be reproduced and reused. A range of solutions and techniques has been suggested for the provenance of data collection and analysis; however, most of them depend on a particular workflow enactment engine and domain. This paper presents a taxonomy of existing provenance techniques and a novel solution for e-Science provenance collection named VePS (the Vienna e-Science Provenance System).

Concepts And Terminology

e-Science is a science or research theme that exploits Grid- or Cloud-based solutions, more often called e-Infrastructure. The term e-Infrastructure refers to the technology that supports such research, comprising distributed and on-demand computing software. e-Science provides researchers with shared access to large data collections, advanced ICT tools for data analysis, large-scale computing resources, and high-performance visualization, among other things. According to Greenwood et al. (2003):
