Provenance Tracking and End-User Oriented Query Construction

Bartosz Balis (Institute of Computer Science AGH, Poland), Marian Bubak (Institute of Computer Science AGH, Poland and University of Amsterdam, The Netherlands), Michal Pelczar (ACC CYFRONET AGH, Poland) and Jakub Wach (ACC CYFRONET AGH, Poland)
DOI: 10.4018/978-1-60566-374-6.ch004

Abstract

Provenance tracking is an indispensable element of any e-Science infrastructure for conducting in silico experiments. However, enabling end-users who are non-IT experts to query provenance and experiment data in a meaningful way is equally important. The authors propose an ontology-based provenance model which captures the execution of in silico experiments, as well as domain-specific semantics of data and computations used in those experiments. They demonstrate how ontologies can serve as an interlingua for end-users, the provenance tracking system, and query tools. Query Translation Tools (QUaTRO), enabling end-user oriented, ontology-guided visual querying over provenance records and experiment data, are also presented. Using those tools, they also show how the ontology models enable semantic integration of provenance metadata and experiment data, making possible queries that explore the structure of provenance and the associated experiment data. Their approach is demonstrated on a Drug Resistance application deployed in the ViroLab Project.
Chapter Preview

Introduction

The term ‘e-Science’ (Hey 2002) was coined to denote a new type of scientific research based on collaboration within a number of scientific areas, enabled by a next generation infrastructure, wherein people, computing resources, data and instruments are brought together to bring a new quality to the everyday work of researchers. The infrastructure in question is usually identified with Grid systems, which offer at least two benefits important for loosely-coupled cross-institution research and collaboration: virtualization and sharing of resources, and building of virtual organizations. Recently, the increasing importance of semantics and knowledge for a future e-Science infrastructure has been emphasized (De Roure 2005). The term Semantic Grid has been used to denote a Grid infrastructure wherein information services are enhanced with well-defined meaning, enabling better cooperation between computers and people (Goble, De Roure 2004). The key enablers for achieving the vision of the Semantic Grid are Semantic Web technologies, such as ontologies, which are a standard, highly sharable, and machine-processable way to represent vocabularies and semantic relationships in a given domain (Goble, Corcho 2006).

Scientific experiment results are neither reliable nor reusable without their provenance, i.e. the information on the origin and history of these results (Groth 2006). Though provenance in computer science originated in the database community, the difference between database provenance and provenance in e-Science has been pointed out (Tan 2007). Both types of provenance describe the origin of a piece of data, i.e. they answer the question of what other pieces of data contributed to a given result. However, while in the case of a database the piece of data is a result of a ‘whitebox’ database query, in e-Science it is a result of a ‘blackbox’ process. Hence different provenance models and different methods to compute provenance are needed in the two communities (Ibid.).

The importance of provenance tracking in e-Science environments has been pointed out many times and numerous provenance approaches have been proposed. However, providing adequate end-user support for provenance querying is also a challenge. It has been recognized that the need for provenance queries goes beyond the lineage of a single data item, and that searching or mining over many provenance records might be useful (Moreau 2007). Nevertheless, there are technical and conceptual barriers that prevent end-users of e-Science environments, such as domain researchers and specialists, from constructing complex queries in query languages such as XQuery, SQL or SPARQL, or at least make it difficult. Therefore, there is a need for provenance query support which would enable end-users to construct powerful queries in an easy way.
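To make the barrier concrete, the following minimal Python sketch shows how a selection expressed in domain terms might be turned into SPARQL text. It is an illustration only and not the QUaTRO implementation: the `ex:` namespace, the property names, the term mapping and the `build_sparql` function are all hypothetical.

```python
# Hypothetical mapping from end-user terms to ontology properties.
# Neither the namespace nor the property names come from the chapter.
TERM_TO_PROPERTY = {
    "experiment name": "ex:experimentName",
    "executed by": "ex:executedBy",
}


def build_sparql(criteria):
    """Build a SPARQL SELECT query from {domain term: required value} pairs."""
    lines = [
        "PREFIX ex: <http://example.org/provenance#>",
        "SELECT ?experiment WHERE {",
        "  ?experiment a ex:Experiment .",
    ]
    for term, value in criteria.items():
        prop = TERM_TO_PROPERTY[term]
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?experiment {prop} "{escaped}" .')
    lines.append("}")
    return "\n".join(lines)


query = build_sparql({"executed by": "alice"})
print(query)
```

The end-user only supplies the pair "executed by" and "alice"; the prefix declaration, the triple patterns and the literal escaping are what a translation layer would have to generate on their behalf.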

Key Terms in this Chapter

End-user oriented querying: is a query construction methodology which enables non-IT experts to construct powerful queries to repositories of experiment data and provenance using domain-specific terms instead of those of underlying data models and query languages.

Provenance: of a piece of data is metadata which describes the derivation history of this piece of data (Simmhan 2005). It can be expressed either as a process that led to that piece of data or its data dependency graph. Provenance has several important uses in e-Science including estimation of quality and reliability of a scientific result, a means to audit a piece of data, or a prerequisite to repeat an experiment.

Virtual Laboratory: is a set of integrated components that, used together, form a distributed and collaborative space for science.

E-Science: is a new type of scientific research based on the collaboration within a number of scientific areas, enabled by a next generation infrastructure, wherein people, computing resources, data and instruments are brought together to bring a new quality to the everyday work of researchers.

Semantic Grid: is an extension of the Grid “in which information and services are given well-defined meaning, better enabling computers and people to work in cooperation” (Goble 2004).

Experiment plan: is a recipe that describes how a certain experiment is executed in the environment of the virtual laboratory.

Ontology: is a specification of a conceptualization which provides a shared vocabulary for a given domain, as well as relationships between and constraints imposed upon concepts.

Experiment: is a process that combines data with a set of activities that act on that data in order to yield experiment results.
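The definition of provenance above mentions that it can be expressed as a data dependency graph. A minimal Python sketch of that idea follows, assuming provenance is stored as a mapping from each derived item to the items it was computed from. The item names and the `lineage` function are hypothetical and are not taken from the chapter or from the ViroLab system.

```python
from collections import deque

# Hypothetical provenance records: each derived item maps to the
# items it was directly computed from.
derived_from = {
    "drug_ranking": ["mutation_list", "rule_set"],
    "mutation_list": ["aligned_sequence"],
    "aligned_sequence": ["raw_sequence"],
}


def lineage(item, derived_from):
    """Return every item that directly or indirectly contributed to `item`."""
    seen = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(lineage("drug_ranking", derived_from)))
# prints ['aligned_sequence', 'mutation_list', 'raw_sequence', 'rule_set']
```

A breadth-first traversal is used here only for simplicity; it answers the basic lineage question of which pieces of data contributed to a given result.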
