Schema Evolution in Multiversion Data Warehouses

Schema Evolution in Multiversion Data Warehouses

Waqas Ahmed, Esteban Zimányi, Alejandro A. Vaisman, Robert Wrembel
Copyright: © 2021 |Pages: 28
DOI: 10.4018/IJDWM.2021100101
OnDemand:
(Individual Articles)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

Data warehouses (DWs) evolve in both their content and schema due to changes of user requirements, business processes, or external sources to name a few. Although multiple approaches using temporal and/or multiversion DWs have been proposed to handle these changes, an efficient solution for this problem is still lacking. The authors' approach is to separate concerns and use temporal DWs to deal with content changes, and multiversion DWs to deal with schema changes. To address the former, previously, they have proposed a temporal multidimensional (MD) model. In this paper, they propose a multiversion MD model for schema evolution to tackle the latter problem. The two models complement each other and allow managing both content and schema evolution. In this paper, the semantics of schema modification operators (SMOs) to derive various schema versions are given. It is also shown how online analytical processing (OLAP) operations like roll-up work on the model. Finally, the mapping from the multiversion MD model to a relational schema is given along with OLAP operations in standard SQL.
Article Preview
Top

Introduction

A Data Warehouse (DW) integrates data coming from different sources (Vaisman & Zimányi, 2014) and supply it to various systems, including Business Intelligence (BI) applications. Data warehouses change in their content and schema. Typically, content changes are due to routine business operations or the correction of existing data. An example of a content change is a modification in the price of a product.

In practice, changes to a DW schema result from (1) the evolution of external data sources, (2) changes of the real-world represented by a DW, (3) new user requirements, and (4) the creation of simulation environments to list the most common causes. An example of a schema change is a change in the geographical hierarchy of a sales network. As reported in (Moon, Curino, Deutsch, Hou, & Zaniolo, 2008; Qiu, Li, & Su, 2013; Sjøberg, 1993; Vassiliadis, Zarras, & Skoulis, 2017), the schemas of data sources change frequently. For example, the Wikipedia schema changed on average every 9-10 days during the 4.5 years of its lifetime. As a result of the changes in data sources, the content and schema of the related DWs must also change. Real-world examples of scenarios leading to changes in DWs can be found in (Eder, Koncilia, & Kogler, 2002; Rundensteiner, Koeller, & Zhang, 2000).

A DW should keep track of the evolution of its content and schema to reconstruct the state of the world under consideration at any instant without losing data. A temporal data warehouse (TDW) (Golfarelli & Rizzi, 2009) keeps track of the evolution of its contents whereas, a multiversion data warehouse (MVDW), based on multischema data management principles (Roddick, 1995; Herrmann, Voigt, Pedersen, & Lehner, 2018), handles content and schema changes by creating multiple and persistent DW versions.

Even though version management in databases has been researched for over 30 years (in the context of object databases, relational databases, data warehouses, XML databases), it is still an active research field. This research is regaining its importance in the context of NoSQL storage and data lakes. Support for version management was explicitly stated as a requirement for data lakes management systems (Nargesian, Zhu, J. Miller, Pu, & Patricia C., 2019).

The temporal DW-based approaches profit from the support of various temporal query languages to analyze changing data and from the existence of multiple index structures. Such approaches are suitable for representing historical data versions and not for representing and managing schema changes. On the other hand, the MV approaches allow managing both data and schema changes. However, the implementation of such approaches is more complicated. Moreover, they possess the limited capabilities of querying DW versions.

Complete Article List

Search this Journal:
Reset
Volume 20: 1 Issue (2024)
Volume 19: 6 Issues (2023)
Volume 18: 4 Issues (2022): 2 Released, 2 Forthcoming
Volume 17: 4 Issues (2021)
Volume 16: 4 Issues (2020)
Volume 15: 4 Issues (2019)
Volume 14: 4 Issues (2018)
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing