Designing Data Marts from XML and Relational Data Sources

Yasser Hachaichi, Jamel Feki, Hanene Ben-Abdallah
DOI: 10.4018/978-1-60566-756-0.ch004
Abstract

Due to international economic competition, enterprises are constantly looking for efficient methods to build data marts and data warehouses that help them analyze large data volumes in their decision-making process. At the same time, even though the relational data model is the most commonly used model, any data mart/warehouse construction method must now deal with other data types, in particular XML documents, which are the dominant type of data exchanged between partners and retrieved from the Web. This chapter presents a data mart design method that starts from both a relational database source and XML documents compliant with a given DTD. Besides considering these two types of data structures, the originality of our method lies in three features: it is decision-maker centered, it automatically extracts loadable data mart schemas, and it is generic.

Introduction

Faced with ever-increasing economic competition, today’s enterprises increasingly need to rely on decision support systems (DSS) to help them analyze very large data volumes. In response to this need, data warehousing technologies have been proposed as a means to extract pertinent data from information systems and present it as historical snapshots used for ad hoc analytical queries and scheduled reporting. Indeed, a data warehouse (DW) is organized so that relevant data is clustered together for easy access. In addition, a DW can serve as a source for building data marts (DM) oriented to specific subjects of analysis.

Traditionally, the data loaded into a DW/DM comes mainly from the enterprise’s own operational information system. Thus, most currently proposed DW/DM construction approaches assume a single, often relational, data source; cf. (List, Bruckner, Machaczek, & Schiefer, 2002), (Golfarelli, Maio, & Rizzi, 1998), (Cabibbo & Torlone, 1998), (Moody & Kortink, 2000), (Prat, Akoka, & Comyn-Wattiau, 2006), (Zribi & Feki, 2007), (Golfarelli, Rizzi, & Vrdoljak, 2001), (Vrdoljak, Banek, & Rizzi, 2003), (Jensen, Møller, & Pedersen, 2001). However, international competition increasingly forces enterprises to enrich their own data repositories with data coming from external sources. Besides data received from partners, the Web constitutes the main external data source for enterprises. For instance, an enterprise may need to retrieve exchange rates from the Web in order to analyze how the quantities of its sold products vary with exchange rates over a period of time.

To deal with such open data sources, a DW/DM construction approach must overcome the main difficulty behind the use of multiple data sources: their structural and semantic heterogeneity. Indeed, even though the relational data model is the most commonly used model (Wikipedia encyclopedia, 2008), a DW construction approach must now deal with other data types, in particular XML documents, which represent the dominant data type on the Web. Semantic heterogeneity, on the other hand, comes into play when the internal and external data sources are complementary, e.g., in the case of transactional data exchanged between partners. This type of heterogeneity remains a challenging problem that can be treated either at the data source level or at the DW/DM level (Boufares & Hamdoun, 2005).
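
To make structural heterogeneity concrete, the following minimal Python sketch reuses the exchange-rate scenario: it flattens a small XML document into tuples so that it can be joined with rows of a relational sales table. It rests on stated assumptions: the element and attribute names (`rates`, `rate`, `date`, `currency`), the sales columns, and all values are made up for illustration, and `xml_rates_to_rows` and `join_sales_with_rates` are hypothetical helpers, not part of the chapter's method.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document of exchange rates retrieved from the Web.
# Element/attribute names and the values are illustrative assumptions.
RATES_XML = """
<rates base="EUR">
  <rate date="2008-01-02" currency="USD">1.47</rate>
  <rate date="2008-01-02" currency="TND">1.77</rate>
  <rate date="2008-01-03" currency="USD">1.48</rate>
  <rate date="2008-01-03" currency="TND">1.78</rate>
</rates>
"""

# Hypothetical rows of an internal relational sales table:
# (sale_date, product, quantity, currency)
SALES = [
    ("2008-01-02", "P1", 120, "USD"),
    ("2008-01-03", "P1", 95, "USD"),
    ("2008-01-03", "P2", 40, "TND"),
]


def xml_rates_to_rows(xml_text):
    """Flatten the nested XML tree into (date, base, currency, rate) tuples,
    i.e. the flat shape a row of a relational table would have."""
    root = ET.fromstring(xml_text.strip())
    base = root.get("base")  # attribute carried by the parent element
    return [
        (rate.get("date"), base, rate.get("currency"), float(rate.text))
        for rate in root.iter("rate")
    ]


def join_sales_with_rates(sales, rate_rows):
    """Equi-join on (date, currency) once both sources are tabular;
    sales without a matching rate are dropped (inner-join semantics)."""
    index = {(date, cur): value for date, _base, cur, value in rate_rows}
    return [
        (date, product, qty, cur, index[(date, cur)])
        for date, product, qty, cur in sales
        if (date, cur) in index
    ]


if __name__ == "__main__":
    for row in join_sales_with_rates(SALES, xml_rates_to_rows(RATES_XML)):
        print(row)
```

Hand-written flattening like this only works because the structure of the example document is known in advance; a general design method has to work from the DTD and the relational schema instead.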

This chapter deals with structural data heterogeneity when designing a data mart. More precisely, it presents a DM design method that starts from both a relational database source and XML documents compliant with a given DTD. Besides considering these two types of data structures, our method has three additional advantages. First, it supports DSS development centered on decision makers: it assists them in defining their analytical needs by proposing all the analytical subjects that can be automatically extracted from their data sources; this automatic extraction of DM schemas distinguishes our method from currently proposed ones. Second, it guarantees that the extracted subjects are loadable from the enterprise information system and/or the external data sources. Third, the method is generic: it is domain independent, since it relies on the structural properties of the data sources rather than on their semantics. It automatically applies a set of rules to extract, from the relational database and the XML documents, all possible facts with their dimensions and hierarchies.
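
Structure-driven extraction of this kind can be pictured with a small Python sketch restricted to the relational side. The rules in it are hypothetical heuristics of the kind often used in schema-driven DW design: a table with several foreign keys and at least one numeric non-key attribute is a fact candidate, the tables it references become dimensions, and chains of foreign keys become hierarchies. They are not the chapter's actual rule set, which is not listed in this excerpt, and the XML/DTD side is not covered. The `Table` type, the `min_fks=2` threshold, and the sample schema are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Type names treated as numeric: an illustrative, incomplete list.
NUMERIC_TYPES = {"INT", "INTEGER", "SMALLINT", "DECIMAL", "NUMERIC", "FLOAT", "REAL"}


@dataclass
class Table:
    name: str
    columns: dict  # column name -> SQL type name
    primary_key: set
    foreign_keys: dict = field(default_factory=dict)  # FK column -> referenced table


def measures(t):
    """Numeric attributes that are neither primary-key nor foreign-key columns."""
    return [c for c, typ in t.columns.items()
            if typ.upper() in NUMERIC_TYPES
            and c not in t.primary_key and c not in t.foreign_keys]


def is_fact_candidate(t, min_fks=2):
    """Hypothetical rule: at least `min_fks` foreign keys and at least one measure."""
    return len(t.foreign_keys) >= min_fks and bool(measures(t))


def hierarchy(schema, start):
    """Follow foreign keys from a dimension table (e.g. Product -> Category).
    Simplification: only the first FK of each table is followed; stops on cycles."""
    path, current = [start], start
    while schema[current].foreign_keys:
        nxt = next(iter(schema[current].foreign_keys.values()))
        if nxt in path:  # cycle guard
            break
        path.append(nxt)
        current = nxt
    return path


def extract_candidates(schema):
    """Map each fact candidate to its measures and dimension hierarchies."""
    return {
        t.name: {
            "measures": measures(t),
            "dimensions": [hierarchy(schema, ref) for ref in t.foreign_keys.values()],
        }
        for t in schema.values()
        if is_fact_candidate(t)
    }


# Hypothetical sample schema; table and column names are illustrative only.
SAMPLE_SCHEMA = {t.name: t for t in [
    Table("Sale", {"sale_id": "INT", "product_id": "INT", "store_id": "INT",
                   "sale_date": "DATE", "quantity": "INT", "amount": "DECIMAL"},
          {"sale_id"}, {"product_id": "Product", "store_id": "Store"}),
    Table("Product", {"product_id": "INT", "name": "VARCHAR", "category_id": "INT"},
          {"product_id"}, {"category_id": "Category"}),
    Table("Category", {"category_id": "INT", "label": "VARCHAR"}, {"category_id"}),
    Table("Store", {"store_id": "INT", "city": "VARCHAR"}, {"store_id"}),
]}

if __name__ == "__main__":
    print(extract_candidates(SAMPLE_SCHEMA))
```

On the sample schema, only `Sale` qualifies as a fact, with `quantity` and `amount` as measures and `Product -> Category` and `Store` as dimension paths. Because the rules look only at keys and column types, they are domain independent in the sense described above, which is also why their output still needs validation by the decision makers.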
