Selecting and Allocating Cubes in Multi-Node OLAP Systems: An Evolutionary Approach

Selecting and Allocating Cubes in Multi-Node OLAP Systems: An Evolutionary Approach

Jorge Loureiro (Instituto Politécnico de Viseu, Portugal) and Orlando Belo (Universidade do Minho, Portugal)
DOI: 10.4018/978-1-60566-232-9.ch006
OnDemand PDF Download:
$37.50

Abstract

OLAP queries are characterized by short answering times. Materialized cube views, a pre-aggregation and storage of group-by values, are one of the possible answers to that condition. However, if all possible views were computed and stored, the amount of necessary materializing time and storage space would be huge. Selecting the most beneficial set, based on the profile of the queries and observing some constraints as materializing space and maintenance time, a problem denoted as cube views selection problem, is the condition for an effective OLAP system, with a variety of solutions for centralized approaches. When a distributed OLAP architecture is considered, the problem gets bigger, as we must deal with another dimension—space. Besides the problem of the selection of multidimensional structures, there’s now a node allocation one; both are a condition for performance. This chapter focuses on distributed OLAP systems, recently introduced, proposing evolutionary algorithms for the selection and allocation of the distributed OLAP Cube, using a distributed linear cost model. This model uses an extended aggregation lattice as framework to capture the distributed semantics, and introduces processing nodes’ power and real communication costs parameters, allowing the estimation of query and maintenance costs in time units. Moreover, as we have an OLAP environment, whit several nodes, we will have parallel processing and then, the evaluation of the fitness of evolutionary solutions is based on cost estimation algorithms that simulate the execution of parallel tasks, using time units as cost metric.
Chapter Preview
Top

The proper allocation of data cubes was heavily investigated for the centralized approach, where many algorithms were proposed, using mainly two kinds of techniques: greedy heuristics and genetic algorithms. The distributed approaches only recently came to stage.

All solutions have in common: 1) a cost model that allows the estimation of query (and maintenance) costs; 2) an algorithm that tries to minimize the costs: query costs or query and maintenance costs; and 3) constraints that may be applied to the optimizing process: a) maximal materializing space, and b) maximal maintenance window size that corresponds to an imposed limit of maintenance costs.

Complete Chapter List

Search this Book:
Reset