A Distributed System for The Management of Fine-grained Provenance

Salmin Sultana (Purdue University, West Lafayette, IN, USA) and Elisa Bertino (Purdue University, West Lafayette, IN, USA)
Copyright: © 2015 | Pages: 16
DOI: 10.4018/JDM.2015040103

Existing provenance systems operate at a single layer of abstraction (workflow, process, or OS), at which they record and store provenance. However, provenance captured from different layers is most useful when it is integrated through a unified provenance framework. The first step toward such a framework is a comprehensive provenance model able to represent the provenance of data objects with various semantics and granularity. In this paper, the authors propose a provenance model able to represent the provenance of any data object captured at any abstraction layer, and present an abstract schema of the model. The expressive nature of the model enables a wide range of provenance queries. The authors also illustrate the utility of their model in real-world data processing systems. In addition, they introduce a distributed middleware system for data provenance, composed of several components and services that capture provenance according to the model and securely store it in a central repository. As part of the middleware, the authors present a thin stackable file system, called FiPS, for capturing local provenance in a portable manner. FiPS is able to capture provenance at various degrees of granularity, transform provenance records into secure information, and direct the resulting provenance data to various persistent storage systems.
Article Preview

Requirements of a Provenance Model

In order to provide a generic provenance structure for all kinds of data objects, the provenance model must meet the following requirements:

Unified Framework: The model must be able to represent metadata provided by the various provenance systems. Although a number of system-call based provenance architectures (Frew, Metzger, & Slaughter, 2008; Muniswamy-Reddy, Holland, Braun, & Seltzer, 2006) have been proposed to capture file provenance, there is no well-defined model to represent and organize such low-level metadata. One important goal for any comprehensive provenance model is to bridge this gap and provide a unified model able to represent provenance for any kind of data at any abstraction layer. To this end, it is crucial to identify a comprehensive set of features that characterize the existing provenance systems and systematize provenance management.

Provenance Granularity: Provenance may be fine-grained, e.g. provenance of data tuples in a database (Woodruff & Stonebraker, 1997), or coarse-grained, such as for a file in a provenance-aware file system (Muniswamy-Reddy, Holland, Braun, & Seltzer, 2006) or for collections of files generated by an ensemble experiment run (Plale, Gannon, Reed, Droegemeier, Wilhelmson, & Ramamurthy, 2005). The usefulness of provenance in a certain domain is highly related to the granularity at which it is recorded (Simmhan, Plale, & Gannon, 2005). Thus, the provenance model should be flexible enough to encapsulate various subjects and details of provenance based on user specifications.

Security: The model must support provenance security. Access control and privacy protection are primary issues in provenance security (Groth et al., 2006). The problem of access control for provenance is complicated by the fact that different access control policies, possibly from different sources, may have to be enforced. Moreover, data originators may specify personal preferences on the disclosure of particular provenance information. To meet these requirements, the provenance model must support the specification of privacy-aware, fine-grained access control policies and user preferences.
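
To make the interaction between policies from different sources concrete, the following is a minimal sketch, not the authors' mechanism. It assumes a deny-overrides combination rule with default deny, and the names Rule, can_read, visible_fields, and the role and source labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str     # who issued the rule (hypothetical labels)
    role: str       # role of the requesting user
    attribute: str  # provenance attribute the rule covers
    allow: bool


def can_read(role, attribute, rules):
    """Assumed deny-overrides policy: any matching deny wins,
    otherwise at least one matching allow is required."""
    matching = [r for r in rules if r.role == role and r.attribute == attribute]
    if any(not r.allow for r in matching):
        return False
    return any(r.allow for r in matching)


def visible_fields(record, role, rules):
    """Return only the attributes of a provenance record the role may read."""
    return {k: v for k, v in record.items() if can_read(role, k, rules)}


rules = [
    Rule("organization", "auditor", "actor", True),
    Rule("organization", "auditor", "timestamp", True),
    # Disclosure preference set by the data originator.
    Rule("originator", "auditor", "actor", False),
]

record = {"actor": "alice", "timestamp": "2015-04-01T10:00:00Z", "operation": "write"}
auditor_view = visible_fields(record, "auditor", rules)
```

Here the originator's preference hides the actor attribute even though the organizational policy allows it, and the operation attribute stays hidden because no rule covers it. Other combination rules are equally possible; deny-overrides is chosen only because it is the most conservative.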

Interoperability: A data object can be modified by and shared among multiple computing systems, and so can its provenance. To support provenance exchange, the model must support interoperability among provenance models and integration of provenance across different systems. Thus, the model must conform to the Open Provenance Model (OPM), which provides a high-level representation of provenance focusing on interoperability.
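
As an illustration of what conforming to OPM can involve, the sketch below checks that edges in a toy provenance graph use only OPM's causal dependencies with the node kinds OPM prescribes. The function to_opm and the sample identifiers are hypothetical; the paper's own mapping to OPM is not shown in this preview.

```python
# OPM node kinds are artifact, process, and agent. Each dependency type
# maps to the (effect kind, cause kind) pair that OPM allows for it.
OPM_EDGES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


def to_opm(nodes, edges):
    """Validate (effect, relation, cause) triples against OPM typing
    and return them as plain dictionaries ready for exchange."""
    out = []
    for effect, relation, cause in edges:
        expected = OPM_EDGES.get(relation)
        if expected is None:
            raise ValueError(f"not an OPM relation: {relation}")
        actual = (nodes[effect], nodes[cause])
        if actual != expected:
            raise ValueError(f"{relation} expects {expected}, got {actual}")
        out.append({"type": relation, "effect": effect, "cause": cause})
    return out


# Hypothetical example: a sort process run by a user reads one file
# and writes another.
nodes = {
    "raw.csv": "artifact",
    "sorted.csv": "artifact",
    "sort#1": "process",
    "alice": "agent",
}
edges = [
    ("sort#1", "used", "raw.csv"),
    ("sorted.csv", "wasGeneratedBy", "sort#1"),
    ("sort#1", "wasControlledBy", "alice"),
]
opm_edges = to_opm(nodes, edges)
```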

Provenance Queries and Views: The model should support various types of provenance queries. Historical dependencies as well as subsequent usages of a data object should be tracked easily.
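
The two query directions mentioned above can be sketched as graph traversals. This is a toy illustration under the assumption that provenance is stored as derivation edges between object identifiers; the class ProvenanceGraph and the file names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Toy graph in which each edge records that one object was derived from another."""

    def __init__(self):
        self.parents = defaultdict(set)   # derived object -> its sources
        self.children = defaultdict(set)  # source object -> objects derived from it

    def add_derivation(self, derived, source):
        self.parents[derived].add(source)
        self.children[source].add(derived)

    def _reach(self, start, neighbors):
        # Breadth-first search collecting every node reachable from start.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def history(self, obj):
        """Historical dependencies: everything obj was directly or indirectly derived from."""
        return self._reach(obj, self.parents)

    def usages(self, obj):
        """Subsequent usages: everything directly or indirectly derived from obj."""
        return self._reach(obj, self.children)


graph = ProvenanceGraph()
graph.add_derivation("clean.csv", "raw.csv")
graph.add_derivation("report.pdf", "clean.csv")
graph.add_derivation("report.pdf", "template.tex")

history_of_report = graph.history("report.pdf")
usages_of_raw = graph.usages("raw.csv")
```

Keeping both a forward and a backward adjacency map is one way to make each direction equally cheap to traverse, at the cost of storing every edge twice.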

If a data object is processed in multiple system domains, an administrator might want to see a high-level machine, system, or domain view of the provenance graph. In addition, to find relevant information in large provenance graphs, one should be able to filter, group, or summarize all or portions of a provenance graph and to generate tailored provenance views. Thus, the model should be able to distinguish the provenance generated by different systems and facilitate queries for constructing specialized views of provenance graphs.
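
A domain-level view of this kind can be sketched as a grouping step over the object-level graph. The example assumes that every provenance record carries the domain in which its object resides; the function domain_view and the domain names are hypothetical.

```python
from collections import Counter

# Hypothetical derivation edges as (derived object, source object) pairs.
edges = [
    ("clean.csv", "raw.csv"),
    ("stats.json", "clean.csv"),
    ("report.pdf", "stats.json"),
]

# Hypothetical mapping from each object to the domain that holds it.
domain_of = {
    "raw.csv": "sensor-gateway",
    "clean.csv": "etl-cluster",
    "stats.json": "etl-cluster",
    "report.pdf": "analyst-laptop",
}


def domain_view(edges, domain_of):
    """Collapse object-level edges into (source domain, derived domain) counts."""
    view = Counter()
    for derived, source in edges:
        src_dom, dst_dom = domain_of[source], domain_of[derived]
        if src_dom != dst_dom:  # hide derivations internal to a single domain
            view[(src_dom, dst_dom)] += 1
    return dict(view)


view = domain_view(edges, domain_of)
```

The derivation of stats.json from clean.csv happens entirely inside one domain, so it disappears from the summarized view, which is the kind of filtering an administrator-level view is meant to provide.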
