Materialized View Selection for Data Warehouse Design
Dimitri Theodoratos (New Jersey Institute of Technology, USA), Wugang Xu (New Jersey Institute of Technology, USA) and Alkis Simitsis (National Technical University of Athens, Greece)
Copyright: © 2009
A Data Warehouse (DW) is a repository of information retrieved from multiple, possibly heterogeneous, autonomous, distributed databases and other information sources for the purpose of complex querying, analysis and decision support. Data in the DW are selectively collected from the sources, processed in order to resolve inconsistencies, and integrated in advance (at design time) before data loading. DW data are usually organized multidimensionally to support On-Line Analytical Processing (OLAP). A DW can be abstractly seen as a set of materialized views defined over the source relations. During the initial design of a DW, the DW designer faces the problem of deciding which views to materialize in the DW. This problem has been addressed in the literature for different classes of queries and views and with different design goals.
Figure 1 shows a simplified DW architecture. The DW contains a set of materialized views. The users address their queries to the DW. The materialized views are used partially or completely for evaluating the user queries. This is achieved through partial or complete rewritings of the queries using the materialized views.
Figure 1. A simplified DW architecture
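As a rough illustration of a complete rewriting, the sketch below answers a query from a materialized view instead of from the source relation. The relation layout (region, month, amount), the function names, and the sample rows are all assumptions made for this example and are not taken from the chapter.

```python
# Hypothetical source relation: rows of (region, month, amount).
SALES = [
    ("east", "jan", 10),
    ("east", "feb", 7),
    ("west", "jan", 5),
    ("east", "jan", 3),
]

def query_from_sources(sales):
    """Original query: total amount per region, evaluated on the source relation."""
    totals = {}
    for region, _month, amount in sales:
        totals[region] = totals.get(region, 0) + amount
    return totals

def materialize_view(sales):
    """Materialized view kept in the DW: total amount per (region, month)."""
    view = {}
    for region, month, amount in sales:
        key = (region, month)
        view[key] = view.get(key, 0) + amount
    return view

def query_from_view(view):
    """Complete rewriting: the same query answered by rolling up the view only."""
    totals = {}
    for (region, _month), subtotal in view.items():
        totals[region] = totals.get(region, 0) + subtotal
    return totals

view = materialize_view(SALES)
# The rewriting must return the same answer as the original query.
assert query_from_view(view) == query_from_sources(SALES)
```

In this sketch the view is finer grained than the query, so the rewriting never touches the source relation. A partial rewriting would combine the view with additional access to the sources.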
When the source relations change, the materialized views need to be updated. The materialized views are usually maintained using an incremental strategy. With such a strategy, the changes to the source relations are propagated to the DW. The changes to the materialized views are computed using the changes of the source relations, and are eventually applied to the materialized views. The expressions used to compute the view changes involve the changes of the source relations, and are called maintenance expressions. Maintenance expressions are issued by the DW against the data sources and the answers are sent back to the DW. When the source relation changes affect more than one materialized view, multiple maintenance expressions need to be evaluated. The techniques of multiquery optimization can be used to detect "common subexpressions" among maintenance expressions in order to derive an efficient global evaluation plan for all the maintenance expressions.
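The incremental strategy can be sketched for one simple case: a view that stores a sum per group. The function names, the (region, amount) row layout, and the choice to keep a row count next to each sum are assumptions of this example. The count is kept so that a group can be dropped once its last source row is deleted.

```python
def full_recompute(sales):
    """Recompute the view from scratch: (total, row count) per region."""
    view = {}
    for region, amount in sales:
        total, count = view.get(region, (0, 0))
        view[region] = (total + amount, count + 1)
    return view

def apply_delta(view, inserted, deleted):
    """Maintain the view using only the source changes, not the full source.

    This plays the role of a maintenance expression for this view: the view
    changes are computed from the inserted and deleted source rows.
    """
    updated = dict(view)
    for sign, rows in ((1, inserted), (-1, deleted)):
        for region, amount in rows:
            total, count = updated.get(region, (0, 0))
            total, count = total + sign * amount, count + sign
            if count == 0:
                updated.pop(region, None)  # no source rows left in this group
            else:
                updated[region] = (total, count)
    return updated

# Hypothetical source state and changes.
old_sales = [("east", 10), ("west", 5), ("east", 7)]
inserted = [("north", 3), ("east", 1)]
deleted = [("west", 5)]
new_sales = [("east", 10), ("east", 7), ("north", 3), ("east", 1)]

view = full_recompute(old_sales)
maintained = apply_delta(view, inserted, deleted)
# Incremental maintenance must agree with recomputation on the new source state.
assert maintained == full_recompute(new_sales)
```

The sketch assumes that every deleted row actually exists in the source. Views defined with joins need more involved maintenance expressions, which is where shared subexpressions across views become worth detecting.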
Main Thrust Of The Chapter
When selecting views to materialize in a DW, one attempts to satisfy one or more design goals. A design goal is either the minimization of a cost function or a constraint. A constraint can be classified as user oriented or system oriented. Attempting to satisfy the constraints can result in no feasible solution to the view selection problem. The design goals determine the design of the algorithms that select views to materialize from the space of alternative view sets.
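To make the interplay of cost function and constraints concrete, here is a minimal sketch that searches all subsets of a small candidate set exhaustively. Everything in it is assumed for illustration: the candidate views and their numbers, the additive cost model (each view removes a fixed amount of query cost and adds a fixed maintenance cost), and the two constraints, a space limit and an upper bound on query cost. It is not one of the algorithms from the literature, and exhaustive search is only practical for a handful of candidates.

```python
from itertools import combinations

# Hypothetical candidates: name -> (size, query cost saved, maintenance cost).
CANDIDATES = {
    "v1": (4, 50, 10),
    "v2": (3, 30, 5),
    "v3": (5, 20, 30),
}

def select_views(candidates, base_query_cost, space_limit, max_query_cost=None):
    """Return (total_cost, views) minimizing query plus maintenance cost,
    or None when no view set satisfies the constraints."""
    best = None
    names = sorted(candidates)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            size = sum(candidates[v][0] for v in subset)
            query_cost = base_query_cost - sum(candidates[v][1] for v in subset)
            maintenance = sum(candidates[v][2] for v in subset)
            if size > space_limit:
                continue  # violates the assumed space constraint
            if max_query_cost is not None and query_cost > max_query_cost:
                continue  # violates the assumed query cost constraint
            total = query_cost + maintenance
            if best is None or total < best[0]:
                best = (total, subset)
    return best

best = select_views(CANDIDATES, base_query_cost=100, space_limit=8)
infeasible = select_views(CANDIDATES, base_query_cost=100, space_limit=8,
                          max_query_cost=10)
```

With these numbers, `best` is the pair v1 and v2, while `infeasible` is `None`: the only view sets that bring the query cost down to 10 need more space than the limit allows, so the constraints conflict.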