A Dynamic Grid File for High-Dimensional Data Cube Storage and Range-Sum Querying

A Dynamic Grid File for High-Dimensional Data Cube Storage and Range-Sum Querying

Wen-Chi Hou (Southern Illinois University, USA), Xiaoguang Yu (Southern Illinois University, USA), Chih-Fang Wang (Southern Illinois University, USA), Cheng Luo (Coppin State University, USA) and Michael Wainer (Southern Illinois University, USA)
Copyright: © 2009 |Pages: 18
DOI: 10.4018/jdm.2009062503
OnDemand PDF Download:
No Current Special Offers


In this article; the authors propose to use the grid file to store multi-dimensional data cubes and answer angesum queries. The grid file is enhanced with a dynamic splitting mechanism to accommodate insertions of data. It overcomes the drawback of the traditional grid file in storing uneven data while enjoying its advantages of simplicity and efficiency. The space requirement grows linearly with the dimension of the data cube; compared with the exponential growth of conventional methods that store pre-computed aggregate values for range-sum queries. The update cost is O (1); much faster than the pre-computed data cube approaches; which generally have exponential update cost. The grid file structure can also respond to range queries quickly. They compare it with an approach that uses the R*-tree structure to store the data cube. The experimental results show that the proposed method performs favorably in file size; update speed; construction time; and query response time for both evenly and unevenly distributed data.
Article Preview


A data warehouse is a large collection of integrated data, built to assist knowledge workers, such as executives, managers, analysts, etc., to make better and faster decisions. It is often required that data be summarized at various levels of detail and on various attributes to allow knowledge workers to analyze the data through a variety of views in on-line analytical processing (OLAP). Typical OLAP applications include product performance and profitability, effectiveness of sales programs or marketing campaigns, sales forecasting, capacity planning, etc. Data warehousing and OLAP have increasingly become a focus of the database industry.

OLAP systems generally support a multidimensional data model, which is also known as the data cube (Gray, 1997). Construction of a data cube is based on the set of selected attributes of the database. Certain attributes are chosen to be the measure attributes, i.e., attributes whose values are of interest, while some others are selected as dimension or functional attributes (Geffner, 1999). The values of the measure attributes are often aggregated according to the dimension attributes for analysis. The size of a data cube can be huge when the number of combinations of dimension attribute values is large.

The storage of data cubes is essential to OLAP. Much research (Agarwal, 1996; Beyer, 1999; Han, 2001; Morfonios, 2006; Xin, 2003; Zhao, 1997) has focused on the materialization of data cubes, that is, to pre-compute and store all possible combinations of multi-dimensional aggregates for fast multi-dimensional analysis. Some notable cube materialization algorithms proposed include ROLAP-based multi-dimensional aggregate computation (Agarwal, 1996, Morfonios, 2006), multi-way array aggregation (Beyer, 1999), BUC (Han, 2001), H-cubing (Xin, 2003), Star-cubing (Zhao, 1997), Minimal cubing (Li, 2004), etc. Since materializing data cubes are generally computationally intensive and space consuming, much effort has been devoted to reducing the computation and storage space of data cubes. These efforts include partial materialization of data cubes (Harinarayan, 1996), iceberg cube computation (Han, 2001; Xin, 2003; Zhao, 1997), computation of condensed, dwarf, and quotient cubes (Lakshmanan, 2002; Lakshmanan, 2003; Sismanis, 2002; Beyer, 1999; Wang, 2002), and computation of approximate cubes (Barbara, 1997; Cuzzocrea, 2006; Shanmugasundaram, 1999). While these pre-computed data cubes can be used to answer queries quickly, tremendous overhead is incurred in maintaining these pre-computed aggregate values as updates can propagate to a large number of relevant cells.

Complete Article List

Search this Journal:
Open Access Articles
Volume 32: 4 Issues (2021): Forthcoming, Available for Pre-Order
Volume 31: 4 Issues (2020)
Volume 30: 4 Issues (2019)
Volume 29: 4 Issues (2018)
Volume 28: 4 Issues (2017)
Volume 27: 4 Issues (2016)
Volume 26: 4 Issues (2015)
Volume 25: 4 Issues (2014)
Volume 24: 4 Issues (2013)
Volume 23: 4 Issues (2012)
Volume 22: 4 Issues (2011)
Volume 21: 4 Issues (2010)
Volume 20: 4 Issues (2009)
Volume 19: 4 Issues (2008)
Volume 18: 4 Issues (2007)
Volume 17: 4 Issues (2006)
Volume 16: 4 Issues (2005)
Volume 15: 4 Issues (2004)
Volume 14: 4 Issues (2003)
Volume 13: 4 Issues (2002)
Volume 12: 4 Issues (2001)
Volume 11: 4 Issues (2000)
Volume 10: 4 Issues (1999)
Volume 9: 4 Issues (1998)
Volume 8: 4 Issues (1997)
Volume 7: 4 Issues (1996)
Volume 6: 4 Issues (1995)
Volume 5: 4 Issues (1994)
Volume 4: 4 Issues (1993)
Volume 3: 4 Issues (1992)
Volume 2: 4 Issues (1991)
Volume 1: 2 Issues (1990)
View Complete Journal Contents Listing