Maintaining Dimension's History in Data Warehouses Effectively

Maintaining Dimension's History in Data Warehouses Effectively

Canan Eren Atay (Dokuz Eylül University, Izmir, Turkey) and Georgia Garani (University of Thessaly, Volos, Greece)
Copyright: © 2019 |Pages: 17
DOI: 10.4018/IJDWM.2019070103

Abstract

A data warehouse is considered a key aspect of success for any decision support system. Research on temporal databases have produced important results in this field, and data warehouses, which store historical data, can clearly benefit from such studies. A slowly changing dimension is a dimension in which any of its attributes in a data warehouse can change infrequently over time. Although different solutions have been proposed, each has its own particular disadvantages. The authors propose the Object-Relational Temporal Data Warehouse (O-RTDW) model for the slowly changing dimensions in this research work. Using this approach, it is possible to keep track of the whole history of an object in a data warehouse efficiently. The proposed model has been implemented on a real data set and tested successfully. Several limitations implied in other solutions, such as redundancy, surrogate keys, incomplete historical data, and creation of additional tables are not present in our solution.
Article Preview

Introduction

The remarkable increase in the amount of data collected by organizations these days for the purpose of extracting business information requires special consideration by the database research community. Companies collect data to generate decision-making strategies. However, multiple decision support environments, which operate independently, require data to be heavily redundant. Because of this issue, two IBM researchers, Barry Devlin and Paul Murphy, introduced the concept of a “business data warehouse” in the late 1980s. Since then, data warehousing has been an active research field. The main purpose of data warehousing is to access historical data that cannot be altered. According to Bill Inmon (2002), a data warehouse (DW) is “a collection of subject-oriented, integrated, non-volatile and time-variant data to support management’s decisions.” The non-volatile and time-variant data features of data warehousing suggest that it should allow changes to the data values without overwriting the existing values.

The types of data maintained by a DW are used in management reports, various business queries, decision support systems, management information systems, and data mining applications by linking the various data that are distributed throughout an organization. The increasing demands of financial analyses in healthcare (Silver et al., 2001), clinical data mining (Lyman et al., 2008), laboratory test data analyses (Allard, 2003), disease control (Wisniewski et al., 2003), adverse drug event control (Einbinder & Scully, 2002), the clinical decision process (Banek et al., 2006), and information feedback for hospital practice management (Grant et al., 2006) in the field of healthcare have given rise to the research and development of clinical DWs. Undoubtedly, the user of a DW needs to be assured that the data being stored are timely, accurate, and complete.

The problem of time support in a database management system has also been an active field of research for many years. Temporal databases can capture either the history of objects and their attributes (known as valid time), or the history of the database activities (known as transaction time). A valid time is necessary to model the time-varying states of an object and its attributes. These states may be in the past, present, or even future. A transaction time denotes the timestamp of any change as it is recorded in a database. This time is system generated and may not extend beyond the current time.

DWs store historical data and can therefore clearly benefit from research on temporal databases. To manage time-varying multidimensional data, temporal data warehouses (TDWs) have been proposed by combining the research fields of temporal databases and DWs. DWs commonly include a time dimension indicating the timeframe for measures. However, the time dimension cannot be used to keep track of changes in other dimensions. In many cases, changes in the dimensional data and the time at which they occur are important requirements for analysis. Such dimensions, in which attribute values may change are called slowly changing dimensions (SCDs). Kimball (2002a) proposed several implementation solutions for this problem; Type 1 overwrites a value, Type 2 adds a new row in the dimension with the updated attribute values and Type 3 adds a new attribute in the dimension to preserve the old attribute value. He also proposed other techniques where he combined hybrid approaches. Nonetheless, these solutions are not adequate because they do not preserve the entire data history or are not easy to implement. Furthermore, such solutions do not consider research related to managing the time-varying information in a temporal database.

Complete Article List

Search this Journal:
Reset
Open Access Articles
Volume 16: 4 Issues (2020): Forthcoming, Available for Pre-Order
Volume 15: 4 Issues (2019): 3 Released, 1 Forthcoming
Volume 14: 4 Issues (2018)
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing