Article Preview
TopIntroduction
The remarkable increase in the amount of data collected by organizations these days for the purpose of extracting business information requires special consideration by the database research community. Companies collect data to generate decision-making strategies. However, multiple decision support environments, which operate independently, require data to be heavily redundant. Because of this issue, two IBM researchers, Barry Devlin and Paul Murphy, introduced the concept of a “business data warehouse” in the late 1980s. Since then, data warehousing has been an active research field. The main purpose of data warehousing is to access historical data that cannot be altered. According to Bill Inmon (2002), a data warehouse (DW) is “a collection of subject-oriented, integrated, non-volatile and time-variant data to support management’s decisions.” The non-volatile and time-variant data features of data warehousing suggest that it should allow changes to the data values without overwriting the existing values.
The types of data maintained by a DW are used in management reports, various business queries, decision support systems, management information systems, and data mining applications by linking the various data that are distributed throughout an organization. The increasing demands of financial analyses in healthcare (Silver et al., 2001), clinical data mining (Lyman et al., 2008), laboratory test data analyses (Allard, 2003), disease control (Wisniewski et al., 2003), adverse drug event control (Einbinder & Scully, 2002), the clinical decision process (Banek et al., 2006), and information feedback for hospital practice management (Grant et al., 2006) in the field of healthcare have given rise to the research and development of clinical DWs. Undoubtedly, the user of a DW needs to be assured that the data being stored are timely, accurate, and complete.
The problem of time support in a database management system has also been an active field of research for many years. Temporal databases can capture either the history of objects and their attributes (known as valid time), or the history of the database activities (known as transaction time). A valid time is necessary to model the time-varying states of an object and its attributes. These states may be in the past, present, or even future. A transaction time denotes the timestamp of any change as it is recorded in a database. This time is system generated and may not extend beyond the current time.
DWs store historical data and can therefore clearly benefit from research on temporal databases. To manage time-varying multidimensional data, temporal data warehouses (TDWs) have been proposed by combining the research fields of temporal databases and DWs. DWs commonly include a time dimension indicating the timeframe for measures. However, the time dimension cannot be used to keep track of changes in other dimensions. In many cases, changes in the dimensional data and the time at which they occur are important requirements for analysis. Such dimensions, in which attribute values may change are called slowly changing dimensions (SCDs). Kimball (2002a) proposed several implementation solutions for this problem; Type 1 overwrites a value, Type 2 adds a new row in the dimension with the updated attribute values and Type 3 adds a new attribute in the dimension to preserve the old attribute value. He also proposed other techniques where he combined hybrid approaches. Nonetheless, these solutions are not adequate because they do not preserve the entire data history or are not easy to implement. Furthermore, such solutions do not consider research related to managing the time-varying information in a temporal database.