In recent years, our sensing capability has increased manifold. Developments in sensor technology, telecommunication, computer networking and distributed computing have created strong grounds for building sensor networks that are now reaching global scales (Balazinska et al., 2007). As data sources multiply, the task of processing and analysis has gone beyond the capabilities of conventional desktop data processing tools. For a long time, data was assumed to be available on a single user's desktop, and handling, processing and analysis were carried out by a single person. With the proliferation of streaming data sources and near-real-time applications, it has become important to provide for automated identification and attribution of data-sets derived from such diverse sources. For such data-sets to be shared and reused, information about the source of the data, its ownership, time-stamps, accuracy, and the processes and transformations applied to it has become essential. Data that provides such information about a given data-set is known as metadata. The need to create and handle metadata as an integrated part of large-scale systems is now recognized. In response to the information requirements of the scientific and research community, efforts towards building a global data commons have come into existence (Onsrud & Campbell, 2007). A special type of service is required to address issues such as the explication of licensing and intellectual property rights, standards-based automated generation of metadata, data provenance, archival and peer review. While each of these is being addressed as an individual research topic, the present article focuses only on Data Provenance.
Provenance has long been used in art, science, technology and many other fields. As database management systems (DBMS) have evolved to serve increasingly complex usage scenarios, the importance of data provenance has become evident. In its simplest form, data provenance in a DBMS can be used to hold information about how, when and why views are created. Sensor networks, enterprise systems and collaborative applications require more complex data provenance approaches to handle large data stores (Ledlie, Ng, & Holland, 2005).
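The simplest DBMS case mentioned above, recording how, when and why a view was created, can be illustrated with a minimal sketch. It assumes an in-memory SQLite database accessed through Python's standard sqlite3 module. The table view_provenance, the helper create_view_with_provenance, and the sample readings table are hypothetical names chosen for illustration only; they are not part of any DBMS or of the approaches cited in this article.

```python
import sqlite3
from datetime import datetime, timezone


def create_view_with_provenance(conn, view_name, select_sql, reason):
    """Create a view and store a provenance record for it (illustrative helper)."""
    # The view name and query are interpolated directly into the statement,
    # so this sketch must only be used with trusted input.
    conn.execute(f"CREATE VIEW {view_name} AS {select_sql}")
    conn.execute(
        "INSERT INTO view_provenance (view_name, how, created_at, why) "
        "VALUES (?, ?, ?, ?)",
        (
            view_name,
            select_sql,                              # how: the defining query
            datetime.now(timezone.utc).isoformat(),  # when: creation time (UTC)
            reason,                                  # why: stated purpose
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")

# Hypothetical source data: a few sensor readings.
conn.execute("CREATE TABLE readings (sensor_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 20.5), ("s1", 21.5), ("s2", 18.0)],
)

# Hypothetical provenance table: one row per view.
conn.execute(
    "CREATE TABLE view_provenance ("
    "view_name TEXT PRIMARY KEY, how TEXT, created_at TEXT, why TEXT)"
)

create_view_with_provenance(
    conn,
    "avg_reading",
    "SELECT sensor_id, AVG(value) AS avg_value FROM readings GROUP BY sensor_id",
    "per-sensor summary for reporting",
)

# Look up the provenance record for the view.
record = conn.execute(
    "SELECT view_name, how, created_at, why FROM view_provenance WHERE view_name = ?",
    ("avg_reading",),
).fetchone()
print(record)
```

Keeping the record in an ordinary table means the provenance can be queried with the same tools as the data itself. The more complex settings named above, such as sensor networks and collaborative applications, would need considerably richer models than this single-table sketch.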
The purpose of this article is to introduce only the important terms and approaches involved in Data Provenance to the Data Mining and Data Warehousing community. For a detailed account of the subject, including its historical development, taxonomy of approaches, recent research and application trends, readers are advised to refer to Bose & Frew (2005) and Simmhan, Plale, & Gannon (2005).