Summarizing Datacubes: Semantic and Syntactic Approaches

Summarizing Datacubes: Semantic and Syntactic Approaches

Rosine Cicchetti (Aix-Marseille Universités, France), Lotfi Lakhal (Aix-Marseille Universités, France), Sébastien Nedjar (Aix-Marseille Universités, France), Noël Novelli (Aix-Marseille Universités, France) and Alain Casali (Aix-Marseille Universités, France)
DOI: 10.4018/978-1-60960-537-7.ch002


Datacubes are especially useful for answering efficiently queries on data warehouses. Nevertheless the amount of generated aggregated data is huge with respect to the initial data which is itself very large. Recent research work has addressed the issue of summarizing Datacubes in order to reduce their size. In this chapter, we present three different approaches. They propose structures which make it possible to reduce the size of the data cube representation. The two former, the closed cube and the quotient cube, are said semantic and discard the redundancies captured within data cubes. The size of the underlying representations is especially reduced but the counterpart is an additional response time when answering the OLAP queries. The latter approach is rather syntactic since it enforces an optimization at the logical level. It is called Partition Cube and based on the concept of partition. We also give an algorithm to compute it. We propose a Relational Partition Cube, a novel R-Olap cubing solution for managing Partition Cubes using the relational technology. An analytical evaluation shows that the storage space of Partition Cubes is smaller than Datacubes. In order to confirm analytical comparison, experiments are performed in order to compare our approach with Datacubes and with two of the best reduction methods, the Quotient Cube and the Closed Cube.
Chapter Preview


In order to efficiently answer Olap queries (Chaudhuri and Dayal, 1997), a widely adopted solution is to compute and materialize Datacubes (Gray et al., 1997). For example, given a relation r over the schema 978-1-60960-537-7.ch002.m01, a set of dimensions978-1-60960-537-7.ch002.m02, 978-1-60960-537-7.ch002.m03, a measure 978-1-60960-537-7.ch002.m04, an aggregate function f, the cube operator is expressed as follows:

Select D_1, D_2, D_3, f(M) 
From r 
Group By Cube(D_1, D_2, D_3) 

Dimensions are also called categorical attributes and r a categorical database relation. The given query achieves all the possible group-by according to any attribute combination belonging to the power set of 978-1-60960-537-7.ch002.m05. It results in what is called a Datacube, and each sub-query performing a single group-by yields a cuboid. Computing Datacubes is exponential in the number of dimensions (the dimension powerset lattice must be explored), and the problem worsens when very large data sets are to be aggregated.

Datacubes are considerably larger than the input relation. Ross and Srivastava (1997) exemplify the problem by achieving a full Datacube encompassing more than 210 millions of tuples from an input relation having 1 million of tuples. The problem is originated by a twofold reason: on one hand the exponential number of dimensional combinations to be dealt, and on the other hand the cardinality of dimensions. The larger dimension domains are, the more aggregated results there are (according to each real value combination). Unfortunately, it is widely recognized that in OLAP databases, data can be very sparse (Ross and Srivastava, 1997; Beyer and Ramakrishnan, 1999) thus scarce value combinations are likely to be numerous and, when computing entirely the Datacubes (full Datacubes), each exception must be preserved. In such a context, (1) approaches favor the efficiency of Olap queries to the detriment of storage space or (2) they favor an optimal representation of cubes but Olap query performances are likely to be debased (Morfonios et al., 2007).

Complete Chapter List

Search this Book: