Built-In Indicators to Support Business Intelligence in OLAP Databases

Built-In Indicators to Support Business Intelligence in OLAP Databases

Jérôme Cubillé (EDF R&D, France), Christian Derquenne (EDF R&D, France), Sabine Goutier (EDF R&D, France), Françoise Guisnel (EDF R&D, France), Henri Klajnmic (EDF R&D, France) and Véronique Cariou (ENITIAA, France)
DOI: 10.4018/978-1-60566-748-5.ch006
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

This chapter is in the scope of static and dynamic discovery-driven explorations of a data cube. It presents different methods to facilitate the whole process of data exploration. Each kind of analysis (static or dynamic) is developed for either a count measure or a quantitative measure. Both are based on the calculation, on the fly, of specific statistical built-in indicators. Firstly, a global methodology is proposed to help a dynamic discovery-driven exploration. It aims at identifying the most relevant dimensions to expand. A built-in rank on dimensions is restituted interactively, at each step of the process. Secondly, to help a static discovery-driven exploration, generalized statistical criteria are detailed to detect and highlight interesting cells within a cube slice. The cell’s degree of interest is determined by the calculation of either test-value or Chi-Square contribution. Their display is done by a color-coding system. A proof of concept implementation on the ORACLE 10g system is described at the end of the chapter.
Chapter Preview
Top

Introduction

Context

Most of large companies have identified the importance and the strategic value of the information contained in their Data Warehouses. But these Data Warehouses are of interest only if the knowledge they contain, is correctly extracted, formatted, summarized, presented and shared by business analysts in order to create added value. Different technologies exist that can organize, present and make data available, from simple request or visualization tools to simulation applications or multidimensional analysis.

In particular, On-Line Analytical Processing (OLAP) applications are powerful decision support tools. They offer a multidimensional view of the data, by calculating and displaying indicators either in detail or with some level of aggregation. Given the fact table of a cube, the multidimensional model consists in dimensional attributes (dimensions) and measures attributes (measures) defined using an aggregate function like sum, count or average.

In practice, a data cube is built from a single large table which contains the most detailed data and we assume that the structure of this underlying table (called Detail) is the following:Detail(#ID, d1, d2, …, dz, M1, M2, …, Mq, W1, W2, …, Wl)where d1, d2, …, dz are called dimension attributes and form the dimensions of the data cube, M1, M2, …, Mq are called measure attributes and form the measures, and W1, W2, …, Wl are a set of weights of each tuple. Usually one of them equals 1 and is used for the count measure. Tuples of this detailed table are called statistical units.

The measures functionally depend on the set of dimensions they are associated with. In this chapter, we consider a sales data cube example1 in order to analyze customers energy consumption per year and market segmentation according to the following relation scheme:Customer(#Cust, Tariff, Dwelling, Heating, Occupation, Water Heating, Oldness Of House, Date Of Contract Subscription, Type Of Dwelling, Consumption)

Figure 1 presents, in a simplified data cube, the mean electrical consumption (per year) and the number of customers according to three characteristics.

Figure 1.

Simplified cube representation based on the three dimensions Tariff, Contract and Energy Heating

In the underlined cell on Figure 1, 3 133 corresponds to the number of customers with Tariff 2 subscribed between 1985 and 1995 and using fuel oil for their energy heating. These customers have consumed 29 560 930 KWh per year.

Figure 3.

Application snapshot at the beginning of the process © 2009 [Françoise Guisnel] Used with permission.

In many company departments, such as Marketing, Sales or Human Resources, business analysts take advantage of OLAP products for market segmentation, customers analysis or sales forecasting activities. This multidimensional view is easily understandable by business analysts who can become direct end-users of the corresponding commercial software. Exploration of the data is performed thanks to interactive navigation operators. Moreover, OLAP management systems offer reporting and graphical representations capabilities to analyze more efficiently data. Several OLAP products, for instance ORACLE 10g, also provide a web restitution for end-users.

This fact has interesting benefits:

  • •Business analysts can directly interact with data instead of relying on computer scientists,

  • •The development cost of computer-based decision systems decreases since business analysts build interactively the reports they desire.

Nevertheless, as soon as multidimensional databases become very large, the number of cells of the cube dramatically increases. Business analysts are rapidly confronted to a difficult and tedious task in order to analyze such multidimensional databases with multiple criteria. They would like to answer such type of questions:

  • •How to quickly find which cell of the multidimensional table contains an atypical value for the measure (that is which cell contains a value very different from expected)?

  • •How to rapidly discover the dimensions which are the most correlated or associated one to another?

  • •How to automatically detect that a measure with specific selection criteria has an abnormal temporal evolution compared to the others?

Commercial OLAP products do not provide such intelligent analysis operators to help business analysts in the browsing of their data. Indeed, Data Mining methods which allow to carry out complex exploration and to discover useful information in huge amount of data are generally used separately from OLAP technologies:

  • •The OLAP systems give an efficient solution to the data cubes building and exploration, but they do not offer any statistical nor algorithmic method in order to help the analysis of the presented aggregated data,

  • •The Data Mining methods are traditionally developed on “classical” individuals-variables table or on relational databases and are not extended to multidimensional databases. Even if Data Mining algorithms are more and more integrated in relational database management systems (for instance Oracle 10g or Sql Server 2005), direct implementations are only done on detailed data and not on aggregated one.

This chapter describes methods to support business intelligence in OLAP databases based on:

  • •A dynamic analysis in order to guide end-users’ navigation. The aim is to identify the most interesting dimensions to expand, given a current slice under study. The built-in indicators, we developed, help the user by proposing him or her at each step of the process the dimension to expand first.

  • •A static analysis of a multidimensional cube, coupled with the previous one, in order to help end-users’ analysis. The aim is to automatically detect interesting cells among a user selected cells. In this chapter, interesting cells mean cells containing a measure very different from the other selected ones. We propose two different algorithms depending on the nature of the measure displayed in the cells (a count measure or a quantitative one).

The originality of this work is the adaptation of Data Mining methods to multidimensional environment where detailed data are no more available and only aggregated ones can be used. Our solution is based on a tight coupling between OLAP tools and statistical methods. Methods are performed thanks to built-in indicators computed instantaneously during exploration without any pre-computation. They constitute an add-in OLAP tool providing a support to decision making.

Complete Chapter List

Search this Book:
Reset