Web Data Warehousing Convergence: From Schematic to Systematic

D. Xuan Le (La Trobe University, Australia), J. Wenny Rahayu (La Trobe University, Australia) and David Taniar (Monash University, Australia)
Copyright: © 2009 | Pages: 16
DOI: 10.4018/978-1-60566-098-1.ch008

Abstract

This article proposes a data warehouse integration technique that combines data and documents originating from different underlying document types and database design approaches. Well-defined, structured data (relational, object-oriented, and object-relational), semi-structured data such as XML, and unstructured data such as HTML documents are integrated into a Web data warehouse system. User-specified requirements and data sources are combined to help define the hierarchical structures, which serve specific requirements and represent particular types of data semantics using object-oriented features including inheritance, aggregation, association, and collection. A conceptual integrated data warehouse model is then specified based on a combination of user requirements and data source structure, which creates the need for a logical integrated data warehouse model. A case study is then developed into a prototype in a Web-based environment to enable evaluation. The evaluation of the proposed Web data warehouse integration methodology covers verification of the correctness of the integrated data and the overall benefits of using the proposed integration technique.
Chapter Preview

Background

In this section, we review some basic terms and concepts of database theory, especially those related to query languages (see, for example, Abiteboul, Hull, and Vianu (1995) for a more comprehensive introduction). Given a database D, we distinguish the (database) schema S and the (database) instance I, which represent, respectively, the structure and the actual contents of the data stored in D. For example, in a relational database the schema consists of the relation (table) names, along with the corresponding attribute names (and possibly types), while the instance consists of the sets of tuples (records) whose structures are specified by the individual relation schemas.
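The distinction between schema and instance can be illustrated with a minimal Python sketch. The names used here (RelationSchema, Database, conforms) and the "employee" relation are hypothetical and chosen only for illustration; they do not come from the chapter, and attribute types are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationSchema:
    """Structure of one relation: its name and ordered attribute names."""
    name: str
    attributes: tuple


@dataclass
class Database:
    """A database D as a pair (schema S, instance I). Hypothetical model."""
    schema: dict    # relation name -> RelationSchema
    instance: dict  # relation name -> set of tuples


def conforms(db: Database) -> bool:
    """Check that every tuple in the instance has the arity its schema specifies."""
    for name, tuples in db.instance.items():
        if name not in db.schema:
            return False
        arity = len(db.schema[name].attributes)
        if any(len(t) != arity for t in tuples):
            return False
    return True


# Schema S: one relation with two attributes (illustrative names).
employee = RelationSchema("employee", ("name", "dept"))

# Instance I: the actual tuples stored under that schema.
db = Database(
    schema={"employee": employee},
    instance={"employee": {("Ann", "Sales"), ("Bob", "IT")}},
)
```

In this sketch the schema stays fixed while the instance may change over time; conforms(db) only checks arity, which is a deliberate simplification of what a real DBMS enforces.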

A query is a syntactic object, typically a text string, a graph, or a combination of shapes and icons, constructed from elements of the schema S (the input schema) and some language-specific symbols, according to the rules (the syntax) of a language. The query also describes the output schema (i.e., the structure of the data that will be produced by the query’s interpretation). In visual queries, the elements of the input and output schema, the operations to combine them, and the constraints that must be satisfied by the required data are represented by visual metaphors, organized according to a visual syntax.

The interpretation of a query according to a predefined semantics determines a query mapping (see Figure 1), i.e., a function from the set of possible input instances (on the input schema) of a given database to the set of possible output instances (on the output schema). Given a certain input instance, the construction of the corresponding output instance is commonly known as query evaluation. In DBMSs, efficient query evaluation is obtained using a specifically designed component called the optimizer. Query optimization is often achieved by transforming the original query into an equivalent query, possibly expressed in a more suitable language. This is very common in visual query languages (VQLs), where the original visual query is first transformed into a standard textual query (e.g., in SQL, Datalog, or XQuery) and then evaluated using a conventional text-based query engine.
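As a rough illustration of a query mapping and of the translate-then-evaluate pattern, the following Python sketch represents a toy "visual query" as a dictionary, translates it into SQL text, and evaluates it with the standard sqlite3 module. The dictionary format, the function names translate_to_sql and evaluate, and the "employee" table are assumptions made for this example only; they are not the notation of any particular VQL.

```python
import sqlite3


def translate_to_sql(visual_query):
    """Translate a toy 'visual query' (a dict) into SQL text plus parameters.

    Identifiers are interpolated directly, which is acceptable only in a toy
    sketch where the query structure is trusted.
    """
    columns = ", ".join(visual_query["select"])
    sql = f"SELECT {columns} FROM {visual_query['from']}"
    params = []
    if visual_query.get("where"):
        attribute, value = visual_query["where"]
        sql += f" WHERE {attribute} = ?"
        params.append(value)
    return sql, params


def evaluate(sql, params, employee_rows):
    """Query evaluation: build the input instance, run the query, return the output instance."""
    conn = sqlite3.connect(":memory:")
    # Input schema (hypothetical): employee(name, dept).
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", employee_rows)
    result = set(conn.execute(sql, params).fetchall())
    conn.close()
    return result


# A toy visual query: "show the names of employees whose dept is Sales".
visual_query = {"select": ["name"], "from": "employee", "where": ("dept", "Sales")}
sql, params = translate_to_sql(visual_query)

# The same query applied to two different input instances yields two
# different output instances, which is what makes it a mapping.
out_1 = evaluate(sql, params, [("Ann", "Sales"), ("Bob", "IT")])
out_2 = evaluate(sql, params, [("Cid", "Sales"), ("Dee", "Sales")])
```

Here the output schema is the single attribute name, and the query mapping is the function that takes any instance of employee to the set of matching names.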

Figure 1. Query and query mapping
