A Multidimensional Model for Data Warehouses of Simulation Results


Hadj Mahboubi, Thierry Faure, Sandro Bimonte, Guillaume Deffuant, Jean-Pierre Chanet, François Pinet
DOI: 10.4018/978-1-4666-0333-2.ch001


This paper examines the multidimensional modeling of a data warehouse for simulation results. Environmental dynamics modeling is used to study complex scenarios such as urbanization, climate change and deforestation, allowing decision makers to understand and predict how the environment evolves in response to potential changes in the values of a large number of influence variables. In this context, exploring simulation models produces a huge volume of data, which must often be studied extensively at different levels of aggregation. There is therefore a great need for tools and methodologies specifically adapted to the storage and analysis of such complex data. Data warehousing systems provide technologies for managing simulation results from different sources, and OLAP technologies allow one to analyze and compare these results and their corresponding models. In this paper, the authors propose a generic multidimensional schema for analyzing the results of a simulation model, which can guide modelers in designing specific data warehouses, together with an adaptation of an OLAP client tool that provides an adequate visualization of the data. As an example, a data warehouse for analyzing the results produced by a savanna simulation model is implemented using a Relational OLAP (ROLAP) architecture.

1. Introduction

Data warehousing, combined with OLAP (Online Analytical Processing) technologies, provides innovative support for business intelligence (Codd et al., 1994). It has recently become a topic of great interest in the business world as well as in the research community. Typical OLAP applications concern marketing and organization monitoring (Kimball & Ross, 2002). However, new fields of application have been studied, such as medical diagnosis (Bentayeb et al., 2010; Arigon et al., 2007) and web and environmental data monitoring (Boussaid et al., 2008; Bimonte et al., 2005). The success of these technologies is due to several factors. First, they provide versioning and centralization of huge amounts of heterogeneous historical data. Second, decision-makers can easily explore, analyze and compare these data with user-friendly tools: these systems let users apply OLAP operators (which aggregate and select data) by manipulating pivot tables and graphical displays.
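To make the two kinds of OLAP operators mentioned above more concrete, the following Python sketch applies a selection (slice) and an aggregation (roll-up) to a tiny in-memory fact table. It is only an illustration under assumed data: the dimension names (`model`, `year`, `region`), the measure values and the helpers `slice_facts` and `roll_up` are hypothetical and do not come from the chapter; an actual OLAP server would run such operations over the warehouse itself.

```python
from collections import defaultdict

# Hypothetical fact rows: (model, year, region, measure value).
# Dimension names and values are illustrative only.
FACTS = [
    ("modelA", 2000, "north", 10.0),
    ("modelA", 2000, "south", 14.0),
    ("modelA", 2001, "north", 12.0),
    ("modelB", 2000, "north", 8.0),
    ("modelB", 2001, "south", 9.0),
]

DIMS = ("model", "year", "region")


def slice_facts(facts, dim, value):
    """Selection (slice): keep the rows whose dimension `dim` equals `value`."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def roll_up(facts, keep, agg=sum):
    """Aggregation (roll-up): group by the dimensions listed in `keep`
    and aggregate the measure over all remaining dimensions."""
    idx = [DIMS.index(d) for d in keep]
    groups = defaultdict(list)
    for row in facts:
        groups[tuple(row[i] for i in idx)].append(row[-1])
    return {key: agg(vals) for key, vals in sorted(groups.items())}


if __name__ == "__main__":
    # Total measure per model, summed over years and regions.
    print(roll_up(FACTS, ["model"]))
    # Per-year totals for modelA only (slice first, then roll up).
    print(roll_up(slice_facts(FACTS, "model", "modelA"), ["year"]))
```

In a pivot-table client, dragging a dimension off the table corresponds roughly to the `roll_up` call, and filtering on a dimension member corresponds to `slice_facts`.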

On the other hand, environmental dynamics modeling is extensively used to study complex phenomena and scenarios, such as urbanization, climate change and deforestation. This modeling allows researchers and stakeholders to understand and predict the evolution of the environment in response to changes in a large number of influence indicators (i.e., input data).

These models are complex because they contain several interacting sub-models. They often involve coupled models (biological, meteorological and/or hydrological models) or individual-based models, which explicitly represent the individuals of a given population (plants, animals or humans). They can output a huge volume of data because they include a large number of objects and variables, especially when simulations run over long periods of time. Moreover, many of these models include stochastic processes, which require several replications of each simulation to obtain representative results. This further increases the quantity of result data to be stored and analyzed. One way to deal with this situation is to use statistical methods of experimental design to define the most efficient settings for simulation experiments (see Saltelli, 2000, or Kleijnen, 2008, for reviews). Another possibility is to exploit large computing architectures, such as grid architectures or distributed computation infrastructures (Chuffart et al., 2008).
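Because each stochastic simulation is replicated several times, a common first condensation step is to summarize the replications. The sketch below assumes a hypothetical layout in which every replication yields one output variable per time step, and computes the mean and sample standard deviation across replications at each step. The data and the function name `summarize_replications` are illustrative only; the chapter does not prescribe this procedure.

```python
import statistics

# Hypothetical output of three replications of one stochastic simulation:
# each list holds the value of one output variable at time steps 0, 1, 2.
REPLICATIONS = {
    1: [100.0, 110.0, 125.0],
    2: [100.0, 106.0, 119.0],
    3: [100.0, 114.0, 128.0],
}


def summarize_replications(runs):
    """Return (time step, mean, sample standard deviation) across replications.

    statistics.stdev needs at least two replications.
    """
    series = list(runs.values())
    n_steps = len(series[0])
    if any(len(s) != n_steps for s in series):
        raise ValueError("all replications must cover the same time steps")
    summary = []
    for t in range(n_steps):
        values = [s[t] for s in series]
        summary.append((t, statistics.mean(values), statistics.stdev(values)))
    return summary


if __name__ == "__main__":
    for step, mean, sd in summarize_replications(REPLICATIONS):
        print(f"step {step}: mean={mean:.2f} sd={sd:.2f}")
```

Storing such per-step summaries alongside (or instead of) the raw replications is one way to trade storage volume against the ability to re-run analyses later.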

To manage huge amounts of simulation result data, scientists or “model users” need to extract condensed data, such as typical regularities or synthetic indicators. In addition, researchers and stakeholders need to perform comparative analyses of results produced by different models and different inputs. For that purpose, we need tools to extract and construct regularities and indicators as well as to analyze and compare models.

Modelers often need to compare results produced by different models or by different input sets. These needs are particularly important in the context of environmental data and models. These models can be very different, as they are created by various scientists, and they can be based on different modeling paradigms (e.g., partial differential equations, discrete events, cellular automata and other complex modeling frameworks). The data used or produced by these models can thus be very diverse: for example, some are continuous, whereas others are discrete. A storage system that accounts for all these heterogeneities must therefore be highly flexible.

Motivated by these needs, in this paper we propose a methodology for warehousing and analyzing simulation results. This methodology aims to provide important facilities for modelers, such as the ability to estimate parameter values, redo explorations and produce analysis reports. We also define and provide a general multidimensional data schema (i.e., a warehouse data model) that is derived from a general-purpose conceptual data warehouse schema. Our work takes into account the specificities of simulation results (data type heterogeneity) and can guide modelers in defining their warehouse schemas according to their analytic needs. We also propose an extension of an OLAP client that allows decision-makers to analyze, explore, validate and compare results produced by simulation models.
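As an illustration of what a multidimensional schema for simulation results might look like in a Relational OLAP setting, the sketch below builds a small star schema in an in-memory SQLite database and runs a roll-up query that compares two models. The dimension tables (`dim_model`, `dim_simulation`, `dim_time`), the fact table `fact_result`, the variable name `biomass` and all values are hypothetical assumptions; this is not the generic schema proposed in the chapter, only a sketch of the general star-schema idea.

```python
import sqlite3

# Hypothetical star schema: dimensions for models, simulations
# (input set + replication) and time steps, plus one fact table.
SCHEMA = """
CREATE TABLE dim_model (model_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_simulation (
    sim_id INTEGER PRIMARY KEY,
    model_id INTEGER REFERENCES dim_model(model_id),
    input_set TEXT,        -- label of the input-parameter setting
    replication INTEGER    -- replication number for stochastic runs
);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, step INTEGER);
CREATE TABLE fact_result (
    sim_id INTEGER REFERENCES dim_simulation(sim_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    variable TEXT,         -- name of the output variable
    value REAL             -- numeric measure
);
"""


def build_demo_warehouse():
    """Create the schema in memory and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_model VALUES (?, ?)",
                     [(1, "modelA"), (2, "modelB")])
    conn.executemany("INSERT INTO dim_simulation VALUES (?, ?, ?, ?)",
                     [(1, 1, "baseline", 1), (2, 1, "baseline", 2),
                      (3, 2, "baseline", 1)])
    conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, 0), (2, 1)])
    conn.executemany("INSERT INTO fact_result VALUES (?, ?, ?, ?)",
                     [(1, 1, "biomass", 10.0), (1, 2, "biomass", 12.0),
                      (2, 1, "biomass", 14.0), (2, 2, "biomass", 16.0),
                      (3, 1, "biomass", 9.0), (3, 2, "biomass", 11.0)])
    return conn


# Roll-up: mean of one output variable per model and time step,
# aggregated over replications (a typical ROLAP GROUP BY query).
COMPARE_MODELS = """
SELECT m.name, t.step, AVG(f.value)
FROM fact_result f
JOIN dim_simulation s ON f.sim_id = s.sim_id
JOIN dim_model m ON s.model_id = m.model_id
JOIN dim_time t ON f.time_id = t.time_id
WHERE f.variable = ?
GROUP BY m.name, t.step
ORDER BY m.name, t.step
"""

if __name__ == "__main__":
    conn = build_demo_warehouse()
    for row in conn.execute(COMPARE_MODELS, ("biomass",)):
        print(row)
```

Storing the output variable's name as a column of the fact table, rather than one column per variable, is one possible way to keep the schema flexible when models produce different outputs; discrete or categorical outputs would need additional measure columns or fact tables, which this sketch omits.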
