GML is a promising model for integrating geodata within data warehouses. The resulting databases are generally large and require spatial operators to be handled. Depending on the size of the target geographical data and the number and complexity of operators in a query, the processing time may quickly become prohibitive. To optimize spatial queries over GML-encoded data, this chapter introduces a novel cache-based architecture. A new cache replacement policy is then proposed. It takes into account the containment properties of geographical data and predicates, and evicts the least relevant values from the cache. Experiments with the GeoCache prototype show the effectiveness of the proposed architecture and its associated replacement policy compared with existing work.
The increasing accumulation of geographical data and the heterogeneity of Geographical Information Systems (GISs) make efficient query processing in distributed GISs difficult. Novel architectures (Zhang, 2001) (Gupta, 1999) (Leclercq, 1999) (Chen, 2000) (Paolucci, 2001) (Corocoles, 2003) (Boucelma, 2002) (Stoimenov, 2000) (Voisard, 1999) are based on XML, which is becoming a standard for exchanging data between heterogeneous sources. Proposed by OpenGIS (OpenGIS, 2003), GML is an XML encoding for the modeling, transport, and storage of geographical information, including both the spatial and non-spatial fragments of geographical data (called features). As stressed in (Savary, 2003), we believe that GML is a promising model for mediating and warehousing geographical data.
By their nature, geographical data are large, so GML documents are often large as well. The processing time of geographical queries over such documents in a data warehouse can become prohibitive for several reasons:
The query evaluator needs to parse entire documents to find and extract query relevant data.
Spatial operators are costly to evaluate, especially if the query contains complex selections and joins on large GML documents.
Moreover, the computational costs of spatial operators are generally higher than those of standard relational operators. Thus, geographical queries on GML documents raise the problem of memory and CPU consumption. To solve this problem, we propose to exploit the properties of a semantic cache (Dar, 1996) with an optimized data structure. The proposed structure aims to reduce memory usage considerably by not storing redundant values. Furthermore, a new cache replacement policy is proposed. It keeps the most relevant data in the cache for better efficiency.
Related work generally focuses on spatial data stored in object-relational databases (Beckmann, 1990). The proposed cache organizations are better suited to tuple-oriented data structures (Brinkhoff, 2002). Most cache replacement policies are based on Least Recently Used (LRU) and its variants. Other cache replacement policies proposed in the literature (Lorenzetti, 1996) (Cao, 1997) (Arlitt, 1999) deal with relational or XML databases, but have not yet been investigated in the area of XML spatial databases.
The rest of the paper is organized as follows: Section 2 gives an overview of related work. Section 3 presents our cache architecture adapted for GML geographical data. Section 4 discusses the inference rules of spatial operators and presents an efficient replacement policy for geographical data that takes inference between spatial operators into account. Section 5 shows some results of the proposed cache implementation and replacement policy. Finally, the conclusion summarizes our contributions and points out the main advantages of the proposed GML cache-based architecture.