Encountering Incomplete Temporal Information in Clinical Data Warehouses

Georgia Garani, Canan Eren Atay
DOI: 10.4018/IJARPHM.2020010103

Abstract

A clinical data warehouse (CDW) can be an important tool for analysis and critical decision making in the medical field. Such a data repository integrates heterogeneous health data, including clinical, treatment, and diagnostic data and laboratory test results, from a variety of sources. Accurate data need to be stored and processed in a CDW with adequate computational capabilities, so time is a crucial factor. A slowly changing dimension (SCD) is a dimension that changes slowly over time, either gradually or intermittently. This article introduces a new SCD type, Type BTA, in which both valid time and transaction time are supported in order to provide a complete history of the dimensional data. With Type BTA, the history of an object can be captured through the changes as reflected in the CDW. Consequently, for the first time, the full history of retroactive and post-active changes can be preserved in a CDW. Type BTA is implemented for a CDW using real cancer data, and the advantages of the methodology are demonstrated.

Introduction

A data warehouse (DW) is arguably one of the most important intelligence tools that can be utilized by any commercial company or other type of organization that wants to benefit from timely decisions based on solid data. Therefore, one of the main prerequisites in today’s competitive business environment is to implement a DW that can support that level of quality analysis and decision making.

In the domain of healthcare information systems in particular, data warehousing is generally acknowledged to be a promising strategy. The Clinical Data Warehouse (CDW) is a type of DW designed specifically to accommodate the needs of users in the medical world, where clinical data from patient care processes are gathered and analyzed for clinical quality management, medical research, and similar decision making. Since healthcare data are complex, heterogeneous, and scattered, integrating them reliably is usually difficult, time consuming, and laborious. The CDW provides a powerful solution for data integration by making a variety of analyses and queries possible on the data. Patient medical records, lab tests, diagnoses, and treatments evolve over time, and as a result a CDW has to store this kind of historical data in order to maintain past, present, and future versions of the data.

Two orthogonal timelines have been identified in databases: valid time and transaction time. Valid time represents when a fact is true in the modeled reality, while transaction time records when the fact was stored in the database. DWs need to incorporate both time dimensions; in other words, they require a bitemporal data representation. A bitemporal DW is therefore needed to support analysis based on historical data aggregation while also keeping track of the history of all modifications. Many applications can benefit from the support of both valid and transaction times in dimensions, including those of the insurance, manufacturing, distribution, and banking industries. CDWs in particular stand to gain significantly from such bitemporal support, i.e., both “as-was” and “as-is” reporting.
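
To make the two timelines concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that each dimension row carries a half-open valid-time interval and a half-open transaction-time interval. The class `BitemporalRow`, the function `as_of`, the `FOREVER` marker, and the sample patient data are all hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Marker for an open-ended interval (an assumption of this sketch).
FOREVER = date.max


@dataclass(frozen=True)
class BitemporalRow:
    """One version of a dimension member (hypothetical layout)."""
    patient_id: int
    stage: str
    valid_from: date   # when the fact became true in reality
    valid_to: date     # exclusive end of valid time
    tx_from: date      # when this row was recorded in the warehouse
    tx_to: date        # exclusive end: when this row was superseded


def as_of(rows: List[BitemporalRow], patient_id: int,
          valid_on: date, known_on: date) -> Optional[str]:
    """Return the stage valid on `valid_on`, as the warehouse knew it on `known_on`."""
    for r in rows:
        if (r.patient_id == patient_id
                and r.valid_from <= valid_on < r.valid_to
                and r.tx_from <= known_on < r.tx_to):
            return r.stage
    return None


# Illustrative data only: a stage first recorded as "II", later corrected to "III".
rows = [
    BitemporalRow(1, "II", date(2020, 1, 5), FOREVER,
                  date(2020, 1, 10), date(2020, 3, 1)),
    BitemporalRow(1, "III", date(2020, 1, 5), FOREVER,
                  date(2020, 3, 1), FOREVER),
]

# What the warehouse reported in February, before the correction.
as_was = as_of(rows, 1, valid_on=date(2020, 2, 1), known_on=date(2020, 2, 1))
# What the warehouse reports about the same day after the correction.
as_is = as_of(rows, 1, valid_on=date(2020, 2, 1), known_on=date(2020, 4, 1))
print(as_was, as_is)
```

In this sketch the correction does not overwrite the earlier row. Instead it closes that row's transaction-time interval, so a query can ask either what is currently believed about a past date or what was believed at some earlier point. Reading the first query as "as-was" and the second as "as-is" is this sketch's interpretation of those terms.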

Kimball introduced the notion of the slowly changing dimension (SCD), a DW dimension that changes occasionally over time. Since then, numerous SCD types have been proposed for dealing with such changes. However, all of the approaches presented up to now, such as those of Kimball and Ross (2013) and Faisal and Sarwar (2014), support only the valid time dimension. In these methodologies the history of an object may be captured, yet the history of retroactive and post-active changes is not, and as a result it is impossible to reproduce the full history of the dimensional data. Only when both time dimensions, valid time and transaction time, are stored in the DW will decision making be accurate and valuable enough to be used by decision support systems (Johnston, 2014).
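
The following short Python sketch illustrates why a valid-time-only dimension cannot distinguish these kinds of change. It assumes the usual temporal-database reading of the terms, which the preview text does not define: a change is retroactive when it is recorded after it took effect, and post-active when it is recorded before it takes effect. The function name `classify_change` and the label "on-time" are hypothetical.

```python
from datetime import date


def classify_change(valid_from: date, recorded_on: date) -> str:
    """Classify a change by comparing its valid-time start with its transaction time.

    The definitions used here are an assumption of this sketch.
    """
    if valid_from < recorded_on:
        return "retroactive"   # took effect before it was recorded
    if valid_from > recorded_on:
        return "post-active"   # recorded before it takes effect
    return "on-time"           # recorded on the day it takes effect


# Illustrative dates only.
late_entry = classify_change(valid_from=date(2020, 1, 5), recorded_on=date(2020, 3, 1))
scheduled = classify_change(valid_from=date(2020, 6, 1), recorded_on=date(2020, 3, 1))
print(late_entry, scheduled)
```

The classification needs `recorded_on`, which is a transaction-time value. A dimension that stores only valid-time columns has no such value to compare against, so under these assumptions the retroactive or post-active nature of a change cannot be recovered from it.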

This is well illustrated by the ongoing worldwide battle against cancer. According to the Global Cancer Observatory (gco.iarc.fr), a total of 14.1 million new cases of cancer and 8.2 million cancer-related deaths were reported worldwide in 2012 alone. The most commonly diagnosed cancers were lung at 13%, breast at 11.9%, and colon at 9.7%. The most common causes of cancer death were cancer of the lung at 19.4%, liver at 9.1%, and stomach at 8.8%. If cancer incidence continues to grow at this pace, then by 2025 there will be a total of 19.3 million new cases, due to the expected increase in world population and the aging of that population (and possibly even more from the proliferation of manmade chemicals and other environmental degradation). Cancer is a complicated disease, affected by many different types of variables and capable of unforeseen fluctuation within populations and individual patients. Additionally, there are many factors that should not be disregarded during diagnosis, follow-up, and treatment. Fortunately, the future use of bitemporal data could provide better quality services and yield more positive results in understanding and arresting the progression of the disease.
