From Conventional to Multiversion Data Warehouse: Practical Issues

From Conventional to Multiversion Data Warehouse: Practical Issues

Khurram Shahzad
DOI: 10.4018/978-1-60566-816-1.ch003
OnDemand:
(Individual Chapters)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

The Data warehouse is not an autonomous data store, because it depends upon its operational source(s) for data population. Due to changes in real-world scenarios, operational sources may evolve, but the conventional data warehouse is not developed to handle the modifications in evolved operational sources. Therefore, instance and schema changes in operational sources cannot be adapted in the conventional data warehouse without loss of information. Multiversion data warehouses are proposed as an alternative to handle these problems of evolution. In this chapter we discuss and illustrate how versioning is implemented and how it can be used in practical data warehouse lifecycle. It is designed as a tutorial for users to collect and understand the concepts behind a versioning solution. Therefore, the purpose of this chapter is to collect and integrate the concepts, issues and solutions of multiversion data warehouses in a tutorial-like approach, to provide a unified source for users that need to understand version functionality and mechanisms.
Chapter Preview
Top

Introduction

Online transaction process systems (OLTP) are used to meet day-to-day requirements of an enterprise. But OLTPs’ are unable to meet decision support requirements of an enterprise, because a) their schemas are not optimized to support decision-support queries and b) they are not made to support decision making (Paulraj, 2001; Kimball, 2002).

Data is extracted from OLTP(s), transformed and loaded in data warehouse after removing inconsistencies. Therefore, the data warehouse is an integrated and materialized view of data, which is optimized to support decision making (Chaudhuri, 1997). The data warehouse works as a data source for various types of applications e.g. analytical processing, decision making OLAP, data mining tools, dashboards etc.

Multidimensional models with central fact and surrounding dimension relations are typically used for designing a data warehouse, with two-fold benefits: on one hand they are close to the way of thinking of decision makers analyzing the data, therefore helping those users in understanding the underlying data; on the other hand, they allow designers to predict users’ intentions (Rizzi, 2007).

For data population the data warehouse depends upon its operational sources (also called OLTPs). Therefore, changes in operational sources may lead to derivation of inconsistent outputs from data warehouse (Bebel, 2004). These can be divided into two types: ‘i) schema changes, i.e. insert/update/delete records, ii) content changes, i.e. add/modify/ drop an attribute or a table’ (Wrembel, 2004; Rundensteiner, 2000).

Inconsistent outputs, generated due to changes in operational sources, can be handled in two ways (Wrembel, 2005): ‘i) evolution approach, ii) versioning approach’. According to the evolution approach, changes are made to the data warehouse and data is transformed to the changed data warehouse, after which the previous one is removed (Blaschka, 1999). But, shortcomings of the approach are identified by a number of authors [see (Bebel, 2004; Golfarelli, 2004; Golfarelli, 2006, Wrembel, 2005) for details]. Whereas, according to the versioning approach, a new version of the data warehouse is created, changes are made to the new version, data is populated in the new version and both versions are maintained (Ravat, 2006).

Complete Chapter List

Search this Book:
Reset