One of the major challenges facing a data warehouse is to improve the query response time while keeping the maintenance cost to a minimum. Recent solutions to tackle this problem suggest to selectively materialize certain views and compute the remaining views on-the-fly, so that the cost is optimized. Unfortunately, in the case of a spatial data warehouse, both the view materialization cost and the onthe- fly computation cost are often extremely high. This is due to the fact that spatial data are larger in size and spatial operations are more complex than the traditional relational operations. In this chapter, the authors propose a new notion, called preview, for which both the materialization and on-the-fly costs are significantly smaller than those of the traditional views. Essentially, to achieve these cost savings, a preview pre-processes the non-spatial part of the query, and maintains pointers to the spatial data. In addition, it exploits the hierarchical relationships among the different views by maintaining a universal composite lattice, and mapping each view onto it. The authors present a cost model to optimally decompose a spatial query into three components, the preview part, the materialized view part and the on-the-fly computation part, so that the total cost is minimized. They demonstrate the cost savings with realistic query scenarios, and implement their method to show the optimal cost savings.
Top1 Introduction
One of the major challenges facing a data warehouse is to improve the query response time while keeping the maintenance cost to a minimum. Recently, selectively materializing certain views over source relations has become the philosophy in designing a data warehouse. While materialized views incur the space cost and view maintenance cost, views that are not materialized incur on-the-fly computation cost. One has to balance both these costs in order to materialize the optimal views that incur minimum cost. This problem is exasperated when we consider a spatial data warehouse (SDW). This is because, spatial data are typically large in size (e.g., point, line, region, raster and vector images), and the operations on spatial data are more expensive (e.g., region merge, spatial overlay and spatial range selection). As a result, often, both on-the-fly computation cost and the view materialization cost are prohibitively expensive.
In this paper, we take a novel approach to resolve this issue. In particular, we introduce an intermediary view, called preview, for which both the materialization and on-the-fly costs are significantly smaller than those of the traditional views. Essentially, the idea of a preview is to pre-process the non-spatial part of the query and materialize this part, but leave the spatial part for the on-the-fly computation and maintain pointers to the spatial data on which the spatial operation should be performed. In addition, a preview also exploits the hierarchical relationships among different views.
The process of computing a preview is as follows. We first capture the relationships among the general and specific concepts of each dimension in the data warehouse, which, in general can form a partial order. By considering all the dimensions together, one can construct a universal composite lattice (UCL). Basically, every query that can be posed to the data warehouse will be represented in the UCL. Then we decompose each query into a set of atomic spatial queries. An atomic spatial query contains only one spatial operation, but may contain many non-spatial operations. UCL is instantiated by mapping all the atomic spatial queries, which could be either materialized, computed on-the-fly or built for a preview based on certain cost conditions, such that their inter-dependent relationships are retrieved. If we decide to build a preview for a specified atomic spatial query, we extract the spatial projection of this atomic spatial query, which is nothing but the part containing the spatial operation. A preview for a spatial atomic query contains: its spatial projection, pointers to the spatial objects on which the spatial operation is performed, materialized non-spatial part, and pointers to the lower level previews or materialized views in the UCL. Essentially, by materializing previews, we are in some sense pre-executing the non-spatial part of the query and compute its spatial projection. As a result, the on-the-fly computation part of a preview or an atomic query computed on-the-fly could be done more efficiently from the existing lower level views than starting from the base tables.
Obviously, storing previews in a data warehouse introduces overhead because it requires additional storage and process efforts to maintain the data sets during updates. However, in this paper, we demonstrate that, the performance gain achieved through preview more than offsets this storage and maintenance overhead. Our ultimate goal is to optimize the total cost of a spatial data warehouse, which is the sum of the space cost of materialized views, the online computation cost of queries if not materialized, and the online computation and space cost of previews, if any.
Generally, there are two categories of modes of processing a spatial query: (1) display-mode, where the spatial objects are retrieved and simply displayed, but no real spatial operations are performed on them, and (2) operation-mode, in which certain spatial operations are performed to produce the result specified in the query. For example, retrieving the existing population maps within certain areas and certain time frames is a display-mode query, but finding the boundary of these maps is an operation-mode because there is a spatial operation boundary(maps) involved. We distinguish these two modes because the cost associated with each mode varies significantly, even within each mode, the cost varies depending on the specific operation. For a specific query, if preview cannot optimize its total cost, we either materialize it or compute it on-the-fly. Therefore, we need to select an appropriate subset of spatial queries for preview construction in order to minimize the cost of the whole spatial data warehouse.