Optimization of Multidimensional Aggregates in Data Warehouses

Optimization of Multidimensional Aggregates in Data Warehouses

Russel Pears (Auckland University of Technology, New Zealand) and Bryan Houliston (Auckland University of Technology, New Zealand)
DOI: 10.4018/978-1-60566-058-5.ch143
OnDemand PDF Download:
$37.50

Abstract

The computation of multidimensional aggregates is a common operation in OLAP applications. The major bottleneck is the large volume of data that needs to be processed which leads to prohibitively expensive query execution times. On the other hand, data analysts are primarily concerned with discerning trends in the data and thus a system that provides approximate answers in a timely fashion would suit their requirements better. In this article we present the prime factor scheme, a novel method for compressing data in a warehouse. Our data compression method is based on aggregating data on each dimension of the data warehouse. We used both real world and synthetic data to compare our scheme against the Haar wavelet and our experiments on range-sum queries show that it outperforms the latter scheme with respect to both decoding time and error rate, while maintaining comparable compression ratios. One encouraging feature is the stability of the error rate when compared to the Haar wavelet. Although wavelets have been shown to be effective at compressing data, the approximate answers they provide varies widely, even for identical types of queries on nearly identical values in distinct parts of the data. This problem has been attributed to the thresholding technique used to reduce the size of the encoded data and is an integral part of the wavelet compression scheme. In contrast the prime factor scheme does not rely on thresholding but keeps a smaller version of every data element from the original data and is thus able to achieve a much higher degree of error stability which is important from a Data Analysts point of view.

Complete Chapter List

Search this Book:
Reset