View Management Techniques and Their Application to Data Stream Management

Christoph Quix, Xiang Li, David Kensche, Sandra Geisler
DOI: 10.4018/978-1-60566-816-1.ch005

Abstract

Data streams are continuous, rapid, time-varying, and transient streams of data that provide new opportunities for the analysis of timely information. Processing data streams raises challenges similar to those of view management in data warehousing: continuous query processing is related to view maintenance, and multi-query optimization for continuous queries is closely related to view selection in conventional relational DBMS and data warehouses. In this chapter, we give an overview of view maintenance and view selection methods, explain the fundamental issues of data stream management, and discuss how view management techniques from data warehousing relate to data stream management. We also give directions for future research in view management, data streams, and data warehousing.

Introduction

The management of views is a fundamental problem in the design and maintenance of data warehouse systems. Materialized views speed up query processing, but they require additional storage and must be maintained whenever the base data are updated. To balance the efficiency of query processing against the cost of view maintenance, view selection techniques have been proposed that choose a set of views whose combined query processing and maintenance cost approximates the optimum.
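
To make this trade-off concrete, the sketch below implements one simple greedy heuristic for view selection. It is an illustration under stated assumptions, not one of the specific methods surveyed in this chapter: the cost of a query is assumed to equal the number of rows it scans, each candidate view carries a hypothetical maintenance cost, and the names `Candidate` and `greedy_select` are invented for this example. In each round the heuristic picks the view with the highest net benefit (query savings minus maintenance cost) per unit of storage that still fits the budget, and it stops when no remaining view pays for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate view; all fields are illustrative assumptions."""
    name: str
    size: float          # storage the view would occupy, e.g. in rows
    maintenance: float   # hypothetical cost of keeping it up to date
    answers: frozenset   # names of the queries it can answer


def greedy_select(candidates, base_cost, freq, budget):
    """Pick views greedily by net benefit per unit of storage (a sketch).

    Assumed cost model: a query costs as many rows as it scans, i.e. the
    size of the smallest selected view that answers it, or its base-table
    cost if no selected view does.
    """
    current = dict(base_cost)  # current cost of each query
    chosen, used = [], 0.0
    remaining = list(candidates)
    while True:
        best, best_score = None, 0.0
        for v in remaining:
            if used + v.size > budget:
                continue  # would exceed the storage budget
            saving = sum(freq[q] * max(0.0, current[q] - v.size)
                         for q in v.answers)
            net = saving - v.maintenance  # query savings minus upkeep
            if net > 0 and net / v.size > best_score:
                best, best_score = v, net / v.size
        if best is None:
            return chosen  # no remaining view pays for itself
        chosen.append(best.name)
        used += best.size
        remaining.remove(best)
        for q in best.answers:
            current[q] = min(current[q], best.size)


# Made-up numbers, for illustration only.
candidates = [
    Candidate("A", size=10.0, maintenance=5.0, answers=frozenset({"q1"})),
    Candidate("B", size=50.0, maintenance=5.0, answers=frozenset({"q1", "q2"})),
    Candidate("C", size=90.0, maintenance=50.0, answers=frozenset({"q2"})),
]
base_cost = {"q1": 100.0, "q2": 100.0}
freq = {"q1": 1, "q2": 1}
print(greedy_select(candidates, base_cost, freq, budget=60.0))  # -> ['A', 'B']
```

With these made-up numbers, A wins the first round because it offers the largest net saving per unit of storage, and B is added afterwards because it still speeds up q2. If B's maintenance cost were raised to 60, its net benefit after selecting A would be negative and the heuristic would stop with A alone. Greedy choices of this kind are not guaranteed to be optimal.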

Data warehouses rely heavily on the analysis of up-to-date information to support decision makers. A new class of data management applications, namely data stream management systems (DSMS), provides new opportunities for the analysis of timely information. A data stream is a continuous, rapid, time-varying, and transient stream of data. DSMS and view management are closely connected: continuous query processing is related to view maintenance in data warehousing, and multi-query optimization for continuous queries is closely related to view selection in conventional relational DBMS and data warehouses. In this chapter, we give an overview of view maintenance and view selection methods, explain the fundamental issues of data stream management, and discuss how view management techniques from data warehousing relate to data stream management.
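
The link between continuous queries and view maintenance can be illustrated with a sliding-window aggregate. The sketch below is a minimal example under simplifying assumptions (a single query, a count-based window over numeric values; the class name `SlidingWindowSum` is hypothetical): the query result is kept like a materialized view and updated with one delta for each arriving tuple and one for each tuple that expires from the window, instead of being recomputed from scratch.

```python
from collections import deque


class SlidingWindowSum:
    """Continuous SUM over the last `size` stream elements (count-based window).

    The answer is maintained incrementally, much like a materialized view:
    add the contribution of the new tuple, subtract that of the expired one.
    """

    def __init__(self, size):
        if size <= 0:
            raise ValueError("window size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0

    def push(self, value):
        """Process one arriving element and return the updated answer."""
        self.window.append(value)
        self.total += value                      # delta for the insertion
        if len(self.window) > self.size:
            self.total -= self.window.popleft()  # delta for the expiration
        return self.total


q = SlidingWindowSum(size=3)
print([q.push(x) for x in [5, 1, 4, 2, 7]])  # -> [5, 6, 10, 7, 13]
```

Each arrival costs constant work regardless of the window size, whereas re-evaluating the query would scan the whole window every time.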

The chapter is structured as follows: Section 2 briefly explains the roles of views in data warehouses. Section 3 gives an overview of view maintenance methods and classifies them according to various criteria. Section 4 then explains the view selection problem and presents a taxonomy of existing view selection techniques. Section 5 discusses issues and challenges in data stream management and summarizes recent research results on data streams. Section 6 discusses the relationship of view management techniques to data stream management, covering similarities, differences, and possible connections between the two. Finally, Section 7 summarizes the chapter and points out directions for future research in view management, data streams, and data warehousing.

Views in Data Warehousing

A view can select or restructure data so that an application can use the data more efficiently. Unlike On-Line Transaction Processing (OLTP) systems, which focus on managing routine data operations, data warehouses aim at supporting data analysis (i.e., On-Line Analytical Processing, OLAP) and are known for their large data volumes and complex queries. If queries are evaluated directly over the base tables, response times are usually too long for users to tolerate, because a huge amount of data has to be processed. It is therefore common practice to pre-compute summaries of the base tables in order to reduce query response time. The following example illustrates the benefit of materializing views:

Example 1. Consider the TPC-D benchmark (Serlin, 1993), which models a data cube of sales with three dimensions: part, supplier, and customer. We denote the base table as R(part, supp, cust, sales). Users pose the following query:

Q: SELECT part, SUM(sales) AS total
   FROM R
   GROUP BY part;

Both of the following materialized views could be used to answer Q:

V1: SELECT part, cust, SUM(sales) AS total
    FROM R
    GROUP BY part, cust;

V2: SELECT part, supp, SUM(sales) AS total
    FROM R
    GROUP BY part, supp;
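
Both views can answer Q because SUM is distributive: summing the per-(part, cust) or per-(part, supp) partial sums again per part yields the same totals as aggregating R directly. The following sketch checks this with Python's built-in `sqlite3` module on a tiny, made-up instance of R. SQLite does not support materialized views, so V1 and V2 are simply stored as tables created from their defining queries.

```python
import sqlite3

# Tiny, made-up instance of R(part, supp, cust, sales); real TPC-D data is far larger.
rows = [
    ("p1", "s1", "c1", 10), ("p1", "s1", "c2", 5), ("p1", "s2", "c1", 7),
    ("p2", "s1", "c1", 3), ("p2", "s2", "c2", 4),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE R (part TEXT, supp TEXT, cust TEXT, sales INTEGER)")
con.executemany("INSERT INTO R VALUES (?, ?, ?, ?)", rows)

# Store each view's contents as a table (a stand-in for materialization).
con.execute("CREATE TABLE V1 AS SELECT part, cust, SUM(sales) AS total "
            "FROM R GROUP BY part, cust")
con.execute("CREATE TABLE V2 AS SELECT part, supp, SUM(sales) AS total "
            "FROM R GROUP BY part, supp")

q_base = "SELECT part, SUM(sales) AS total FROM R GROUP BY part ORDER BY part"
# Q rewritten over a view: roll up the partial sums by summing them again.
q_v1 = "SELECT part, SUM(total) AS total FROM V1 GROUP BY part ORDER BY part"
q_v2 = "SELECT part, SUM(total) AS total FROM V2 GROUP BY part ORDER BY part"

answers = [con.execute(q).fetchall() for q in (q_base, q_v1, q_v2)]
print(answers[0])  # -> [('p1', 22), ('p2', 7)]
assert answers[0] == answers[1] == answers[2]
```

The same re-aggregation would not work for a non-distributive function such as a median, which is one reason the aggregate function matters when matching queries to views.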

Which view is better in terms of query response time or storage cost depends on the statistics of the data. For instance, the TPC-D database has the following statistics:

  • R: 6M rows

  • V1: 6M rows

  • V2: 0.8M rows

Materializing V2 clearly benefits Q: because SUM is distributive, Q can be answered from V2 by summing the partial totals per part, and V2 is much smaller to scan than the base table. V1, in contrast, is of little use, since it is about as large as the base table.
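
The choice above can be sketched as a simple view-matching rule: among the sources whose grouping attributes include those of the query, scan the one with the fewest rows. This sketch assumes that scan cost is proportional to row count and that all sources aggregate the same measure with SUM; `can_answer` and `cheapest_source` are hypothetical helper names. The row counts are those of Example 1.

```python
def can_answer(view_groups, query_groups):
    """True if the source groups by (at least) every attribute the query groups by.

    Assumes source and query aggregate the same measure with SUM, so the
    query can be answered by re-summing the source's partial totals.
    """
    return set(query_groups) <= set(view_groups)


def cheapest_source(query_groups, sources):
    """Name of the smallest source (by row count) that can answer the query."""
    usable = {name: rows for name, (groups, rows) in sources.items()
              if can_answer(groups, query_groups)}
    if not usable:
        raise ValueError("no source can answer the query")
    return min(usable, key=usable.get)


# Row counts from Example 1; R is listed with all dimensions, since it holds every attribute.
sources = {
    "R": (("part", "supp", "cust"), 6_000_000),
    "V1": (("part", "cust"), 6_000_000),
    "V2": (("part", "supp"), 800_000),
}
best = cheapest_source(("part",), sources)
print(best, sources["R"][1] / sources[best][1])  # -> V2 7.5
```

Under this cost model, answering Q from V2 scans 7.5 times fewer rows than answering it from R, while V1 offers no saving at all.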

Nonetheless, materializing views comes at a price. On the one hand, materialized views take up storage. On the other hand, once a view is materialized, it must be kept consistent with the base data (the maintenance problem, cf. Section 3). Materialized views thus improve query response time at the cost of increased storage and maintenance overhead. Interestingly, materializing too many extra views can even degrade query response time; Kotidis (2002) attributes this phenomenon to competition for memory buffers among the materialized views.
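
As an illustration of the maintenance problem, the sketch below keeps a view shaped like V2 from Example 1 up to date by applying base-table deltas (inserted and deleted rows of R) instead of recomputing the view. It is a minimal in-memory sketch that assumes every deleted row actually exists in R; the class `SumView` and the function `recompute` are invented for this example. Alongside each group's sum it keeps a row count, so that a group disappears once all of its rows are deleted, matching what a full recomputation would produce.

```python
from collections import defaultdict


class SumView:
    """In-memory copy of a view like V2: SELECT part, supp, SUM(sales) ... GROUP BY part, supp."""

    def __init__(self):
        self.total = defaultdict(int)  # (part, supp) -> SUM(sales)
        self.count = defaultdict(int)  # (part, supp) -> number of contributing rows

    def apply(self, row, sign):
        """Apply one inserted (sign=+1) or deleted (sign=-1) row of R(part, supp, cust, sales)."""
        part, supp, _cust, sales = row
        key = (part, supp)
        self.total[key] += sign * sales
        self.count[key] += sign
        if self.count[key] == 0:  # last row of this group was deleted
            del self.total[key], self.count[key]

    def rows(self):
        return dict(self.total)


def recompute(base):
    """Reference result: evaluate the view from scratch over the base rows."""
    totals = {}
    for part, supp, _cust, sales in base:
        totals[(part, supp)] = totals.get((part, supp), 0) + sales
    return totals


base = [("p1", "s1", "c1", 10), ("p1", "s2", "c1", 7)]
view = SumView()
for r in base:
    view.apply(r, +1)

# A delta on R: one row inserted, one row deleted.
inserted, deleted = ("p1", "s1", "c2", 5), ("p1", "s2", "c1", 7)
view.apply(inserted, +1)
view.apply(deleted, -1)
base.append(inserted)
base.remove(deleted)

assert view.rows() == recompute(base)
print(view.rows())  # -> {('p1', 's1'): 15}
```

The cost of each update depends only on the size of the delta, not on the size of R, which is what makes incremental maintenance attractive for large base tables.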
