Deploying Data Warehouses in Grids with Efficiency and Availability

Rogério Luís de Carvalho Costa (University of Coimbra, Portugal) and Pedro Furtado (University of Coimbra, Portugal)
DOI: 10.4018/978-1-60566-748-5.ch009

Abstract

Many global organizations generate huge volumes of data, which are stored in highly distributed databases. These databases can be combined through a Grid infrastructure to form a large virtual data warehouse, which is physically distributed but can be transparently queried by Grid participants. In Grids, however, the system environment is heterogeneous and resource availability may vary over time, which may lead to performance degradation and data unavailability. In this chapter, the authors present Grid-NPDW, which uses specialized data placement and job scheduling strategies to construct a Grid-based data warehouse with high performance and availability.
Chapter Preview

From Parallel To Grid-Based Data Warehouses

In this section we review concepts and previous work on parallel and distributed data warehouses, as well as on grid-based resource management systems, both of which are central to our grid-enabled data warehouse approach.

Data warehouses (DW) are huge databases that store historical data and are mainly used for decision support purposes. They are commonly organized as star schemas (Chaudhuri & Dayal, 1997), which means they have one or more large fact tables and some smaller dimension tables. A sample star schema is represented in Figure 1. In this example, table Revenue is a fact table and the other relations are dimension tables linked to the fact table by foreign keys.

Figure 1. Sample Star Schema Model
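
To make the star schema organization concrete, the following minimal sketch builds a tiny in-memory example with Python's standard sqlite3 module. Only the fact table name Revenue comes from the text; the dimension tables (customer, product), all column names, and the sample rows are assumptions made for illustration and are not taken from Figure 1. The query shows the typical decision-support pattern of joining the fact table to a dimension through a foreign key and aggregating a measure.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and two hypothetical dimension tables."""
    conn.executescript(
        """
        -- Dimension tables: small, descriptive (names are assumptions).
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, category TEXT);

        -- Fact table: large, references each dimension by foreign key.
        CREATE TABLE revenue (
            customer_id INTEGER REFERENCES customer(customer_id),
            product_id  INTEGER REFERENCES product(product_id),
            amount      REAL
        );
        """
    )


def load_sample(conn):
    """Insert a few made-up rows so the query below has data to aggregate."""
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "EU"), (2, "US")])
    conn.executemany("INSERT INTO product VALUES (?, ?)",
                     [(10, "books"), (11, "music")])
    conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 11, 50.0), (2, 10, 70.0)])


def revenue_by_region(conn):
    """Join the fact table to one dimension and aggregate the measure."""
    return conn.execute(
        """
        SELECT c.region, SUM(r.amount)
        FROM revenue AS r
        JOIN customer AS c ON r.customer_id = c.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample(conn)
print(revenue_by_region(conn))
```

In a real warehouse the fact table would hold many orders of magnitude more rows than the dimensions, which is why the placement of the fact table matters most when such a schema is spread across several sites.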
