Business Information Integration from XML and Relational Databases Sources

Business Information Integration from XML and Relational Databases Sources

Ana María Fermoso Garcia (Pontifical University of Salamanca, Spain)
Copyright: © 2009 |Pages: 21
DOI: 10.4018/978-1-60566-098-1.ch020

Abstract

This chapter introduces different alternatives to store and manage jointly relational and eXtensible Markup Language (XML) data sources. Nowadays, businesses are transformed in e-business and have to manage large data volumes and from heterogeneous sources. To manage large amounts of information, Database Management Systems (DBMS) continue to be one of the most used tools, and the most extended model is the relational one. On the other side, XML has reached the de facto standard to present and exchange information between businesses on the Web. Therefore, it could be necessary to use tools as mediators to integrate these two different data to a common format like XML, since it is the main data format on the Web. First, a classification of the main tools and systems where this problem is handled is made, with their advantages and disadvantages. The objective will be to propose a new system to solve the integration business information problem.
Chapter Preview
Top

Introduction

Data warehouse technology introduced in the early 90s to support data analysis in business environments has recently reached out to non-business domains such as medicine, education, research, government, etc. End-users interact with data using advanced visual interfaces that enable intuitive navigation to the desired data subset and granularity and provide a visually enhanced presentation using a variety of visualization techniques.

Data warehouse systems adopt a multidimensional data model tackling the challenges of the online analytical processing (OLAP) (Codd, Codd, & Salley, 1993) via efficient execution of queries that aggregate over large amount of detailed data. The analysis is preceded by a highly complex ETL (extract, transform, load) process of integrating the data from multiple systems and bringing it into a consistent state.

In relational OLAP systems, multidimensional views of data, or data cubes, are structured using a star or a snowflake schema consisting of fact tables and dimension hierarchies. Fact tables contain data records (facts) such as transactions or events, which represent the focus of the analysis. Facts are composed of two types of attributes: (1) measures (i.e., the actual elements of the analysis), and 2) dimensions, which uniquely determine the measures and serve as exploration axes for aggregation. Members of a dimension are typically organized in a containment type hierarchy to support multiple granularities. In the dimension table, the attributes that form the hierarchy are called dimension levels, or categories. Other descriptive attributes belonging to a particular category are known as property attributes. Dimension levels along with parent/child relationships between them are referred to as the dimension’s intension, or schema, whereas the hierarchy of its members forms its extension.

Figure 1 shows the star schema view of a data cube storing the administrative expenditures of a university: the facts in the fact table ORDER are determined by five dimensions. In the star schema, the whole dimension hierarchy is placed into a single table, whereas the snowflake schema enforces the hierarchy to be decomposed into separate tables, one table per dimension level.

Figure 1.

Star schema view of data

The two logical design options are illustrated in Figure 2 at the example of the dimension Period. The star schema produces a single table period with all dimension levels and property attributes. Obviously, in such denormalized view it is impossible to explicitly recognize the hierarchical relationships. In the snowflake schema, however, each dimension category with its property attributes is placed into a separate table referencing its parent. The arrows correspond to the foreign keys (i.e., the roll-up relationships between the levels). The resulting schema is rather complex, but it offers the advantage of automatic extraction of the hierarchy schema with all valid aggregation paths from the foreign key constraints. Notice that reoccurring intervals such as weeks, months, quarters, etc. are presented by a two-category lattice (e.g., months → month) in order to be able to roll-up single instances to the instance’s type. For example, months instances “January 1997” and “January 1998” rollup to month instance “January.”

Figure 2.

Snowflake schema (left) vs. star schema (right) of a time hierarchy

Complete Chapter List

Search this Book:
Reset