Efficient Compression and Storage of XML OLAP Cubes

Doulkifli Boukraa, Mohammed Amin Bouchoukh, Omar Boussaid
Copyright: © 2015 | Pages: 25
DOI: 10.4018/IJDWM.2015070101

Abstract

In this paper, the authors present an approach to efficiently compress XML OLAP cubes. They propose a multidimensional snowflake schema of the cube as the basic physical configuration. The cube is then composed of one XML fact document and as many XML documents as there are dimension hierarchy members. The basic configuration is reorganized in two ways by deliberately adding data redundancy, in order to achieve a better compression ratio on the one hand and to improve query response time on the other. In the second configuration, all the documents of the cube are merged into a single XML document. In the third configuration, each reference between the fact and the dimensions, or between the members of a dimension hierarchy, is replaced by the whole referenced XML fragment. To the three physical configurations of the cube, the authors apply a new compression technique named XCC. They demonstrate the efficiency of the third configuration before and after compression, and they also show the efficiency of their compression technique when applied to XML OLAP cubes.

Introduction

Data warehouses support business decisions by collecting, consolidating, and organizing data in a multidimensional way for reporting and analysis with tools such as Online Analytical Processing (OLAP) (Chaudhuri & Dayal, 1997) or data mining (Tjioe & Taniar, 2005). From a modeling viewpoint, a data warehouse relies on a multidimensional view of data. Multidimensional modeling is now widely adopted for decision support because it organizes data to meet the users' analytical needs. A traditional multidimensional model organizes data around one or more facts, described by a set of measures, which are analyzed along observation axes (dimensions). A data warehouse can be implemented using one of two systems: ROLAP or MOLAP. In ROLAP, relational tables are used to model the fact and each dimension. The fact table is linked to each dimension using a foreign key. A MOLAP system, however, records the multidimensional data in an array composed of cells, which are accessed by dimension indices. Each cell contains a value for each measure. In the sequel, we will refer to the multidimensional data using the metaphorical representation of a data cube, or cube for short. Regarding ROLAP cubes, there are two prevalent logical models that map straightforwardly onto the relational model: the star schema and the snowflake schema (Kimball, 1996). In the latter, the fact is represented by a relational table and each dimension is composed of as many tables as there are hierarchy levels. Each dimension is said to be normalized according to the relational third normal form. In a star schema, however, each dimension is de-normalized into a single table. Consequently, the snowflake schema is characterized by less data redundancy than the star schema. Nevertheless, the star schema is more efficient than the snowflake schema with regard to join queries. Regarding MOLAP cubes, the scope of data redundancy is limited to the set of measure values.
However, a MOLAP cube is usually very sparse. Thus, in both systems, the size of the cube tends to be huge, due to redundancy in ROLAP systems and to sparsity in MOLAP ones. In this context, both ROLAP and MOLAP cubes can benefit from compression. Data compression is “the process of converting an input data stream (the source stream or the original raw data) into another data stream (the output, the bitstream, or the compressed stream) that has a smaller size” (Salomon, 2007). The benefits of compression include reducing storage, increasing the data transfer rate, enhancing data security, and increasing system performance (Bassiouni, 1985). In the context of data warehouses and cubes, compression aims at optimizing query response time, first by reducing the size of the cubes and then by providing approximate answers to queries.
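
The link between MOLAP sparsity and compression can be illustrated with a minimal Python sketch. It assumes a hypothetical cube of 50 products, 40 stores, and 30 days with a single measure, stored densely so that each cell is reached through its dimension indices. The standard-library zlib codec is used only as a stand-in for lossless compression in general; it is not the XCC technique, and the dimension sizes, the populated cells, and the helper name `cell_index` are all assumptions made for illustration.

```python
import zlib
from array import array

# Hypothetical cube: 50 products x 40 stores x 30 days, one measure per cell.
N_PRODUCTS, N_STORES, N_DAYS = 50, 40, 30


def cell_index(product: int, store: int, day: int) -> int:
    """Map dimension indices to a position in the flat cell array."""
    return (product * N_STORES + store) * N_DAYS + day


# Dense MOLAP-style storage: every cell exists, even when no fact was recorded.
cells = array("d", [0.0]) * (N_PRODUCTS * N_STORES * N_DAYS)

# Populate one cell in every hundred with made-up, deterministic amounts.
populated = 0
for i in range(0, len(cells), 100):
    cells[i] = 10.0 + (i % 7)
    populated += 1

raw = cells.tobytes()
compressed = zlib.compress(raw, 9)  # generic lossless codec, not XCC

# Lossless: decompression restores exactly the same cell values.
restored = array("d")
restored.frombytes(zlib.decompress(compressed))

sparsity = 1 - populated / len(cells)
print(f"cells={len(cells)}, populated={populated}, sparsity={sparsity:.0%}")
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes")
print("cell (0, 0, 0) =", cells[cell_index(0, 0, 0)])
```

Because the empty cells form long runs of identical bytes, the compressed stream is much smaller than the raw array, which is the sense in which a sparse cube benefits from compression.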

On the other hand, the XML format has become the de facto format for data exchange over the Web. It is also a modeling meta-language. In the context of data warehouses, XML is a good choice for warehousing complex data (Boussaid, Messaoud, Choquet, & Anthoard, 2006). According to Ravat, Teste, Tournier, and Zurfluh (2010), there are two main approaches to XML warehousing: an XML data warehouse (XDW) and an XML document warehouse. In our work, we are interested in the former. XML data warehouses have attracted much attention with regard to many aspects, such as their construction (Rusu, Rahayu, & Taniar, 2004; Rusu, Rahayu, & Taniar, 2005). From the modeling standpoint, an XML data warehouse relies on the same concepts as a traditional data warehouse, i.e., fact, measure, dimension, and hierarchy. However, due to the semi-structured nature of XML data, MOLAP and ROLAP systems cannot be used straightforwardly for XDWs. Therefore, there have been many attempts to define suitable models for XDWs at the conceptual, logical, and physical levels.
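
To make the fact, dimension, and hierarchy concepts concrete in XML, the following Python sketch builds the same small cube in two layouts: one in which each fact refers to a dimension member by identifier, and one in which each reference is replaced by the referenced fragment, in the spirit of the third configuration described in the abstract. The element names (`facts`, `fact`, `cityRef`, `city`, `country`, `amount`), the city-to-country hierarchy, and all values are hypothetical; this is not the authors' schema. For brevity, the dimension documents are held in plain Python dictionaries.

```python
import xml.etree.ElementTree as ET

# Hypothetical dimension hierarchy: city -> country (two snowflake levels).
countries = {"c1": "Algeria", "c2": "France"}
cities = {"t1": ("Oran", "c1"), "t2": ("Lyon", "c2")}
# Facts as (city reference, sales amount); all values are made up.
facts = [("t1", 120.0), ("t2", 80.0), ("t1", 45.5)]


def fact_with_references() -> ET.Element:
    """Reference-based layout: each fact points to a city by identifier."""
    root = ET.Element("facts")
    for city_id, amount in facts:
        fact = ET.SubElement(root, "fact")
        ET.SubElement(fact, "cityRef").text = city_id
        ET.SubElement(fact, "amount").text = str(amount)
    return root


def fact_with_embedded_fragments() -> ET.Element:
    """Redundant layout: each reference is replaced by the referenced fragment."""
    root = ET.Element("facts")
    for city_id, amount in facts:
        city_name, country_id = cities[city_id]
        fact = ET.SubElement(root, "fact")
        city = ET.SubElement(fact, "city", name=city_name)
        ET.SubElement(city, "country", name=countries[country_id])
        ET.SubElement(fact, "amount").text = str(amount)
    return root


def sales_by_country(embedded_root: ET.Element) -> dict:
    """Aggregate with no lookup: the hierarchy is already inside each fact."""
    totals = {}
    for fact in embedded_root.findall("fact"):
        country = fact.find("city/country").get("name")
        totals[country] = totals.get(country, 0.0) + float(fact.find("amount").text)
    return totals


referenced = ET.tostring(fact_with_references(), encoding="unicode")
embedded = ET.tostring(fact_with_embedded_fragments(), encoding="unicode")
print(len(referenced), "characters with references")
print(len(embedded), "characters with embedded fragments")
print(sales_by_country(fact_with_embedded_fragments()))
```

The embedded layout is larger before compression because the same city and country fragments are repeated in every fact, but a roll-up to the country level can be answered from the fact document alone, without resolving any reference.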
