Federated Data Warehouses

Stefan Berger (University of Linz, Data & Knowledge Engineering Group, Austria) and Michael Schrefl (University of Linz, Data & Knowledge Engineering Group, Austria)
DOI: 10.4018/978-1-60566-748-5.ch005

Abstract

A federated data warehouse is a collection of autonomous and often heterogeneous data marts (DMs). When attempting to integrate autonomous DMs, data designers commonly face numerous conflicts that must be repaired. This chapter systematically analyzes and classifies the schema-level and instance-level conflicts among dimensions and cubes, based on a formal data model. It shows the dependencies between dimension and cube integration and presents a methodological DM integration approach. A running example demonstrates how to repair the various heterogeneities. Moreover, the chapter introduces a federated DW reference architecture that enables tightly coupled integration of autonomous DMs.

Introduction

Data Warehouses (DWs) are sophisticated, highly specialized database systems optimized for analytical workloads (strategic decision making) rather than transactional data processing. An organization’s DW collects and consolidates data from disparate sources on all subject areas that are helpful for decision making. In that sense, the DW represents the “corporate memory” in which historical business data is collected and reconciled at a fine-grained level of detail. Typical DWs host huge amounts of data, up to several terabytes (Inmon, 2005).

Data Marts (DMs) are specific repositories, designed on top of the DW to deliver a subset of the data to a particular group of users, e.g. the managers of the sales division. Sometimes the detail level of a DM is coarser than that of the underlying DW data (e.g. sales by product group vs. sales transactions by customer receipt) to reduce storage requirements. Nevertheless, the amount of data in DMs is still very large. Both the DW and the DMs typically conform to the multi-dimensional data model, which arranges the items of interest (“measures”) as data cubes, i.e. within an analysis space having several axes (“dimensions”) that represent different business perspectives (Inmon, 2005).
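The relation between a fine-grained DW and a coarser DM can be illustrated with a minimal Python sketch. It is not taken from the chapter: the sales records, attribute names, and the roll_up helper are hypothetical and serve only to show how a measure (amount) is aggregated along chosen dimension attributes.

```python
from collections import defaultdict

# Hypothetical fine-grained facts, one row per receipt line (illustration only).
SALES = [
    {"receipt": "r1", "product": "aspirin", "product_group": "pharma", "month": "2009-01", "amount": 12.0},
    {"receipt": "r1", "product": "bandage", "product_group": "first_aid", "month": "2009-01", "amount": 3.5},
    {"receipt": "r2", "product": "aspirin", "product_group": "pharma", "month": "2009-01", "amount": 12.0},
    {"receipt": "r3", "product": "ibuprofen", "product_group": "pharma", "month": "2009-02", "amount": 8.0},
]


def roll_up(facts, dims, measure="amount"):
    """Sum the measure over all facts that share the same values for dims."""
    cube = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dims)  # cell coordinates in the cube
        cube[key] += fact[measure]
    return dict(cube)


# Coarser, data-mart-style view: sales by product group and month.
by_group_and_month = roll_up(SALES, ("product_group", "month"))
```

Here the four receipt-level rows collapse into three cells keyed by (product_group, month). That is the kind of reduction in detail a DM may apply to save storage.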

DWs and OLAP systems are widely used by both public and private organizations to enable better strategic business decisions. Data analysts use spreadsheet-based reports, visualization graphs, OLAP operations, and similar tools to drill into the collected data and analyze the performance of their business processes. Traditionally, DWs have been designed as stand-alone systems operated by the organization’s centralized IT department.

Nowadays, medium-sized to large organizations commonly integrate their business activities by means of strategic cooperations or mergers and acquisitions. Data integration across autonomous organizations is a necessary prerequisite for any business cooperation. The traditional approach of database systems integration has been researched for several decades. Halevy, Rajaraman, and Ordille (2006) survey recent progress achieved by the data integration community and list future challenges. Federated Database Systems (Sheth & Larson, 1990) are a prominent example of systems applying these techniques.

Lately, as processing power and network capacity have grown, the integration of data stemming from independent DWs has become increasingly interesting and important. The integration of autonomous DWs allows the cooperating organizations to mutually share their “corporate memories”. Successful DW integration offers additional opportunities for decision makers, since it opens up a larger pool of information across all participating organizations and thus broadens the knowledge base.

Without appropriate methodology and tool support, however, DW/DM integration is a tedious and error-prone task. As an illustrative example of the practical difficulties, consider the conceptual multi-dimensional schema of a fictitious health insurance organization that consists of independent sub-organizations within several Federal States, governed by a federal association. For simplicity, our scenario considers only two sub-organizations, each of which autonomously operates a Data Mart, as depicted in Figure 1. The schema is instantiated at two distinct nodes, named dwh1 and dwh2.

Figure 1. Health insurance conceptual schemas of local Data Marts

The schemas in Figure 1 are specified in the “Dimensional Fact Model” (DFM) proposed by Golfarelli, Maio, and Rizzi (1998). DFM is a graphical notation for conceptual multi-dimensional models. It visualizes the facts and dimensions of data cubes as well as the dependencies within a cube. Notice that the DFM allows dimensions to be “reused” or “shared” among multiple facts (e.g. date_time).
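The idea of a dimension shared among several facts can be sketched in Python as follows. Only the dimension name date_time comes from the text above. The fact names, measures, aggregation levels, and the patient dimension are hypothetical placeholders, not a transcription of Figure 1.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    name: str
    levels: tuple  # aggregation levels, finest first (assumed for illustration)


@dataclass
class Fact:
    name: str
    measures: tuple
    dimensions: list = field(default_factory=list)


# One dimension object, referenced by two facts rather than copied.
date_time = Dimension("date_time", ("day", "month", "year"))  # levels are assumed
patient = Dimension("patient", ("patient",))                  # hypothetical dimension

treatment = Fact("treatment", ("cost",), [date_time, patient])  # hypothetical fact
medication = Fact("medication", ("quantity",), [date_time])     # hypothetical fact


def shared_dimensions(a, b):
    """Return the dimensions of fact a that fact b references as the same object."""
    return [d for d in a.dimensions if any(d is e for e in b.dimensions)]


shared = shared_dimensions(treatment, medication)
```

Because both facts point to the same Dimension object, a change to the shared dimension is visible to every cube that uses it. This makes shared dimensions a natural starting point when comparing schemas.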
