Accelerating Multi Dimensional Queries in Data Warehouses

Russel Pears (Auckland University of Technology, New Zealand)
DOI: 10.4018/978-1-60566-172-8.ch011

Data warehouses are widely used to support decision making, and On-Line Analytical Processing (OLAP) is the main vehicle for querying them. OLAP operations commonly involve the computation of multidimensional aggregates. The major bottleneck in computing these aggregates is the large volume of data that must be processed, which leads to prohibitively expensive query execution times. Data analysts, on the other hand, are primarily concerned with discerning trends in the data, so a system that provides approximate answers in a timely fashion would suit their requirements better. In this chapter we present the Prime Factor scheme, a novel method for compressing data in a warehouse. Our data compression method is based on aggregating data on each dimension of the data warehouse. Extensive experimentation on both real-world and synthetic data has shown that it outperforms the Haar Wavelet scheme with respect to both decoding time and error rate, while maintaining comparable compression ratios (Pears and Houliston, 2007). One encouraging feature is the stability of its error rate compared with that of the Haar Wavelet. Although wavelets have been shown to be effective at compressing data, the approximate answers they provide vary widely, even for identical types of queries on nearly identical values in distinct parts of the data. This problem has been attributed to the thresholding technique that is used to reduce the size of the encoded data and is an integral part of the Wavelet compression scheme. In contrast, the Prime Factor scheme does not rely on thresholding but keeps a smaller version of every data element from the original data, and is thus able to achieve a much higher degree of error stability, which is important from a data analyst's point of view.
Chapter Preview


Previous research has tended to concentrate on computing exact answers to OLAP queries (Ho and Agrawal, 1997; Wang, 2002). Ho describes a method that pre-processes a data cube to give a prefix sum cube. The prefix sum cube is computed by applying the transformation P[Ai] = C[Ai] + P[Ai-1] along each dimension of the data cube, where P denotes the prefix sum cube, C the original data cube, Ai an element in the cube, and i an index in the range 1..Di, with Di the size of dimension i. Because P holds one cell for every cell of C, the prefix sum cube requires the same storage space as the original data cube.
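The transformation above can be illustrated with a minimal sketch. It assumes a dense cube held as a dictionary keyed by 0-based index tuples; the function name `prefix_sum_cube` and this storage layout are illustrative choices, not taken from Ho's paper.

```python
from itertools import product


def prefix_sum_cube(cube, shape):
    """Return the prefix sum cube P of a data cube C.

    cube  : dict mapping 0-based index tuples to numbers
            (cells that are absent are treated as 0)
    shape : tuple giving the size of each dimension
    """
    cells = list(product(*(range(n) for n in shape)))
    # P needs an entry for every cell, so start from a dense copy of C.
    prefix = {idx: cube.get(idx, 0) for idx in cells}
    # Sweep one dimension at a time, applying P[i] = C[i] + P[i-1] along it.
    # Cells are visited in lexicographic order, so the predecessor along
    # the current axis has always been updated before it is read.
    for axis in range(len(shape)):
        for idx in cells:
            if idx[axis] > 0:
                prev = idx[:axis] + (idx[axis] - 1,) + idx[axis + 1:]
                prefix[idx] += prefix[prev]
    return prefix


# A 2 x 3 cube with rows [1, 2, 3] and [4, 5, 6].
C = {(0, 0): 1, (0, 1): 2, (0, 2): 3,
     (1, 0): 4, (1, 1): 5, (1, 2): 6}
P = prefix_sum_cube(C, (2, 3))
```

In this example the last cell, `P[(1, 2)]`, holds the sum of the whole cube, and `P` has exactly as many entries as the dense cube, which matches the storage observation above.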

The above approach is efficient for low-dimensional data cubes. For high-dimensional environments, two major problems exist. Firstly, the number of accesses required is 2^d (Ho et al., 1997), which can be prohibitive for large values of d (where d denotes the number of dimensions). Secondly, the storage required for the prefix sum cube can be excessive. In a typical OLAP environment the data tends to be massive and yet sparse at the same time. The degree of sparsity increases with the number of dimensions (OLAP), and thus the number of non-zero cells may be a very small fraction of the prefix sum cube, which by its nature has to be dense for its query processing algorithms to work correctly.
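The access cost can be made concrete with a short sketch of a range-sum query over a prefix sum cube. It uses standard inclusion-exclusion over the corners of the query box, of which there are 2^d in d dimensions. The function name `range_sum`, the dictionary layout, and the hard-coded example cube are assumptions made for illustration only.

```python
from itertools import product


def range_sum(prefix, lo, hi):
    """Sum all cells with lo[k] <= index[k] <= hi[k] in every dimension k.

    prefix : dense prefix sum cube, a dict keyed by 0-based index tuples
    lo, hi : inclusive lower and upper corners of the query box
    Returns (total, corners), where corners is the number of corner
    combinations examined, i.e. 2 ** d.
    """
    d = len(lo)
    total = 0
    corners = 0
    for choice in product((0, 1), repeat=d):
        # choice[k] == 0 selects hi[k]; choice[k] == 1 selects lo[k] - 1.
        corner = tuple(hi[k] if c == 0 else lo[k] - 1
                       for k, c in enumerate(choice))
        corners += 1
        if min(corner) < 0:
            continue  # corner lies outside the cube and contributes 0
        # Inclusion-exclusion: subtract when an odd number of lower
        # bounds were selected, add otherwise.
        sign = -1 if sum(choice) % 2 else 1
        total += sign * prefix[corner]
    return total, corners


# Prefix sum cube of the 2 x 3 data cube with rows [1, 2, 3] and [4, 5, 6].
P = {(0, 0): 1, (0, 1): 3, (0, 2): 6,
     (1, 0): 5, (1, 1): 12, (1, 2): 21}

# Sum of the cells (1, 1) and (1, 2) of the original cube, i.e. 5 + 6.
total, corners = range_sum(P, lo=(1, 1), hi=(1, 2))
```

Because the corner count doubles with every added dimension, a 20-dimensional query would already examine more than a million corners, regardless of how few cells of the original cube are non-zero.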
