Improving the Domain Independence of Data Provenance Ontologies: A Demonstration Using Conceptual Graphs and the W7 Model


Jun Liu (College of Business and Information Systems, Dakota State University, Madison, SD, USA) and Sudha Ram (Eller College of Management, University of Arizona, Tucson, AZ, USA)
Copyright: © 2017 | Pages: 20
DOI: 10.4018/JDM.2017010104

Abstract

Provenance is becoming increasingly important as more and more people use data that they themselves did not generate. In the last decade, significant efforts have been directed toward developing generic, shared data provenance ontologies that support the interoperability of provenance across systems. One issue impeding the use of such ontologies is that a generic provenance ontology, no matter how complete, is insufficient for capturing the diverse and complex provenance requirements of different domains. In this paper, the authors propose a novel approach to adapting and extending the W7 model, a well-known generic ontology of data provenance. Relying on the knowledge expansion mechanisms provided by the Conceptual Graph formalism, the approach enables domain ontologies of provenance to be developed in a disciplined yet flexible way.

Introduction

Since the start of the new millennium, people have been sharing data on an unprecedented scale and with unprecedented richness. In scientific domains such as biology and chemistry, the trend toward “big science”, exemplified by large-scale collaborative projects such as the iPlant Collaborative (http://www.iplantcollaborative.org), demands the sharing of data across organizational boundaries and even across disciplines. For businesses, Big Data is a key driver of competition, growth, and innovation, and much of it originates outside the company that absorbs it. With the large-scale proliferation and sharing of data, questions such as “Where did this data come from?”, “Who else is using this data?”, and “Why is this piece of data here?” are becoming increasingly common (Ram & Liu, 2012). Data provenance, often referred to as the “origin”, “lineage”, “history”, or “pedigree” of data, contains the answers to these questions. When data travel beyond the specific setting in which they were generated, it is imperative that their provenance be captured to ensure their trustworthiness.

In the last decade, significant research has been conducted to standardize the semantics of data provenance and to develop a shared provenance ontology that allows unambiguous interpretation of provenance, supports the interoperability of data provenance between systems, and improves the usability of data provenance by enabling richer queries. One of the earliest efforts to standardize provenance semantics is the W7 model (Ram & Liu, 2007). The W7 model conceptualizes provenance as consisting of seven Ws: what, when, where, how, who, which, and why. It has been adopted in subsequent research (e.g., Lupelli et al., 2015; Narock, Yoon, & March, 2014; Prat & Madnick, 2008). Another widely used provenance model is the Open Provenance Model (OPM) (Moreau et al., 2011), which represents the provenance of objects as an annotated causality graph. A causality graph captures the causal dependencies among three types of nodes: artifacts, processes, and agents. Other well-known provenance ontologies include the Provenance Vocabulary (Hartig & Zhao, 2010) and the PROV-DM model (Belhajjame et al., 2012). These generic provenance ontologies are designed to be independent of domain and architecture. They support a digital representation of provenance for any “thing”, so that provenance can be exchanged between systems through a compatibility layer based on a shared provenance model (Moreau et al., 2011).
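To make the two structures described above more concrete, here is a minimal Python sketch; it is an illustration, not the representation used by the authors or by either model's specification. It assumes a flat record for the seven Ws (the W7 model itself defines richer concepts and relationships among them) and encodes an OPM-style causality graph using the five dependency types of OPM v1.1 (used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasDerivedFrom). The names `W7Provenance`, `Kind`, `EDGE_TYPES`, and `CausalityGraph`, and the example node identifiers, are hypothetical.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Optional


@dataclass
class W7Provenance:
    """Simplified, flat record of the seven Ws (an assumption: the W7 model
    itself defines richer concepts and relationships among them)."""
    what: str                    # event that affected the data
    when: Optional[str] = None   # time of the event
    where: Optional[str] = None  # location of the event
    how: Optional[str] = None    # action that led to the event
    who: Optional[str] = None    # agents involved
    which: Optional[str] = None  # instruments or software used
    why: Optional[str] = None    # reason for the event

    def answered(self) -> list[str]:
        """Names of the Ws that carry a value, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


class Kind(Enum):
    """The three OPM node types."""
    ARTIFACT = "artifact"
    PROCESS = "process"
    AGENT = "agent"


# OPM dependency types as (effect kind, cause kind); edges point from the
# effect back to its cause.
EDGE_TYPES = {
    "used": (Kind.PROCESS, Kind.ARTIFACT),
    "wasGeneratedBy": (Kind.ARTIFACT, Kind.PROCESS),
    "wasControlledBy": (Kind.PROCESS, Kind.AGENT),
    "wasTriggeredBy": (Kind.PROCESS, Kind.PROCESS),
    "wasDerivedFrom": (Kind.ARTIFACT, Kind.ARTIFACT),
}


@dataclass
class CausalityGraph:
    nodes: dict[str, Kind] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, kind: Kind) -> None:
        self.nodes[node_id] = kind

    def add_edge(self, effect: str, relation: str, cause: str) -> None:
        """Add a dependency, rejecting it if the node kinds do not match."""
        expected = EDGE_TYPES[relation]
        actual = (self.nodes[effect], self.nodes[cause])
        if actual != expected:
            raise ValueError(f"{relation} expects {expected}, got {actual}")
        self.edges.append((effect, relation, cause))

    def causes_of(self, node_id: str) -> set[str]:
        """All nodes reachable by following dependencies from node_id."""
        seen: set[str] = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for effect, _, cause in self.edges:
                if effect == current and cause not in seen:
                    seen.add(cause)
                    stack.append(cause)
        return seen


# Hypothetical example: alice runs a cleaning process over raw.csv.
g = CausalityGraph()
g.add_node("raw.csv", Kind.ARTIFACT)
g.add_node("clean", Kind.PROCESS)
g.add_node("clean.csv", Kind.ARTIFACT)
g.add_node("alice", Kind.AGENT)
g.add_edge("clean", "used", "raw.csv")
g.add_edge("clean.csv", "wasGeneratedBy", "clean")
g.add_edge("clean", "wasControlledBy", "alice")
print(sorted(g.causes_of("clean.csv")))  # ['alice', 'clean', 'raw.csv']

record = W7Provenance(what="cleaning of raw.csv", how="deduplication", who="alice")
print(record.answered())  # ['what', 'how', 'who']
```

Checking each edge against the expected pair of node kinds is one simple way to keep a graph consistent with OPM's three node types. The flat W7 record, by contrast, only captures which of the seven questions have answers; it makes no attempt to model the relationships a full W7 ontology would express.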
