Introduction
A data warehouse is a large collection of integrated data built to help knowledge workers, such as executives, managers, and analysts, make better and faster decisions. These users often need the data summarized at various levels of detail and along various attributes so that they can analyze it through a variety of views in on-line analytical processing (OLAP). Typical OLAP applications include analyzing product performance and profitability, measuring the effectiveness of sales programs or marketing campaigns, sales forecasting, and capacity planning. Data warehousing and OLAP have increasingly become a focus of the database industry.
OLAP systems generally support a multidimensional data model, also known as the data cube (Gray, 1997). A data cube is constructed from a set of attributes selected from the database. Some attributes are chosen as measure attributes, i.e., attributes whose values are of interest, while others are selected as dimension or functional attributes (Geffner, 1999). The values of the measure attributes are aggregated according to the dimension attributes for analysis. A data cube can be very large: with d dimension attributes there are 2^d possible group-bys (cuboids), and each cuboid can hold as many cells as there are distinct combinations of values of its dimensions.
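To make the aggregation concrete, the following Python sketch computes a full data cube naively by grouping a small fact table on every subset of its dimension attributes and summing the measure. The table, the attribute names (product, region, quarter, sales), and the use of "*" to mark an aggregated-away dimension are illustrative assumptions; the one-pass-per-cuboid strategy is deliberately simple and is not one of the specialized materialization algorithms from the literature.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away


def compute_cube(rows, dimensions, measure):
    """Naively compute every group-by (cuboid) with SUM as the aggregate.

    Returns a dict mapping each group-by (a tuple of dimension names)
    to its cells; each cell key keeps the grouped values and "*" for
    every other dimension.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, k):
            cells = defaultdict(int)
            for row in rows:
                key = tuple(row[d] if d in group_by else ALL for d in dimensions)
                cells[key] += row[measure]
            cube[group_by] = dict(cells)
    return cube


# Hypothetical fact table: three dimension attributes and one measure
DIMENSIONS = ("product", "region", "quarter")
rows = [
    {"product": "A", "region": "East", "quarter": "Q1", "sales": 10},
    {"product": "A", "region": "West", "quarter": "Q1", "sales": 5},
    {"product": "B", "region": "East", "quarter": "Q2", "sales": 7},
    {"product": "B", "region": "East", "quarter": "Q1", "sales": 3},
]

cube = compute_cube(rows, DIMENSIONS, "sales")
total_cells = sum(len(cells) for cells in cube.values())

print(len(cube))          # 8 cuboids = 2**3
print(cube[()])           # {('*', '*', '*'): 25}
print(cube[("region",)])  # {('*', 'East', '*'): 20, ('*', 'West', '*'): 5}
print(total_cells)        # 20
```

With three dimensions the sketch yields 2^3 = 8 cuboids, and only four base rows already produce 20 cells in total, which hints at how quickly a cube grows as dimensions and distinct values are added.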
The storage of data cubes is essential to OLAP. Much research (Agarwal, 1996; Beyer, 1999; Han, 2001; Morfonios, 2006; Xin, 2003; Zhao, 1997) has focused on the materialization of data cubes, that is, pre-computing and storing all possible combinations of multidimensional aggregates to enable fast multidimensional analysis. Notable cube materialization algorithms include ROLAP-based multidimensional aggregate computation (Agarwal, 1996; Morfonios, 2006), multi-way array aggregation (Zhao, 1997), BUC (Beyer, 1999), H-cubing (Han, 2001), Star-cubing (Xin, 2003), and Minimal cubing (Li, 2004). Because materializing data cubes is generally computationally intensive and space consuming, much effort has been devoted to reducing the computation time and storage space of data cubes. These efforts include partial materialization of data cubes (Harinarayan, 1996), iceberg cube computation (Beyer, 1999; Han, 2001; Xin, 2003), computation of condensed, dwarf, and quotient cubes (Lakshmanan, 2002; Lakshmanan, 2003; Sismanis, 2002; Beyer, 1999; Wang, 2002), and computation of approximate cubes (Barbara, 1997; Cuzzocrea, 2006; Shanmugasundaram, 1999). Although these pre-computed data cubes can answer queries quickly, maintaining the pre-computed aggregate values incurs tremendous overhead, because a single update to the base data can propagate to a large number of relevant cells.
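The maintenance overhead can be illustrated with a minimal Python sketch on hypothetical data. It assumes a fully materialized cube with SUM as the aggregate, stored as one flat dictionary keyed by cell (with "*" for an aggregated-away dimension, assuming no real attribute value equals "*"), and applies one inserted base tuple incrementally. Every base tuple contributes to exactly one cell in each of the 2^d cuboids, so a single insertion touches 2^d cells. The function names are illustrative, not taken from any cited system.

```python
from itertools import combinations

ALL = "*"  # marks a dimension aggregated away (assumes no real value equals "*")


def cells_for(row, dimensions):
    """Yield the one cell in each of the 2**d cuboids that a base tuple feeds."""
    for k in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, k):
            yield tuple(row[d] if d in group_by else ALL for d in dimensions)


def insert(cube, row, dimensions, measure):
    """Incrementally maintain a SUM cube for one inserted tuple.

    Returns the list of cells that had to be updated.
    """
    touched = []
    for cell in cells_for(row, dimensions):
        cube[cell] = cube.get(cell, 0) + row[measure]
        touched.append(cell)
    return touched


def materialize(rows, dimensions, measure):
    """Build a fully materialized SUM cube by inserting every base tuple."""
    cube = {}
    for row in rows:
        insert(cube, row, dimensions, measure)
    return cube


# Hypothetical base data
DIMENSIONS = ("product", "region", "quarter")
rows = [
    {"product": "A", "region": "East", "quarter": "Q1", "sales": 10},
    {"product": "A", "region": "West", "quarter": "Q1", "sales": 5},
    {"product": "B", "region": "East", "quarter": "Q2", "sales": 7},
    {"product": "B", "region": "East", "quarter": "Q1", "sales": 3},
]
cube = materialize(rows, DIMENSIONS, "sales")
print(len(cube))  # 20 cells across 8 cuboids

new_row = {"product": "C", "region": "North", "quarter": "Q3", "sales": 4}
touched = insert(cube, new_row, DIMENSIONS, "sales")
print(len(touched))           # 8 = 2**3 cells updated for one insertion
print(cube[(ALL, ALL, ALL)])  # 29

# With 10 dimensions, one insertion would touch 2**10 cells
dims10 = tuple(f"d{i}" for i in range(10))
print(sum(1 for _ in cells_for({d: "v" for d in dims10}, dims10)))  # 1024
```

SUM is used here because it is distributive, so an insertion only needs to add the new value to each affected cell. For aggregates such as MIN or MAX, a deletion can force affected cells to be recomputed from the base data, which makes maintenance even more expensive.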