1. Introduction
Data warehousing, combined with OLAP (Online Analytical Processing) technologies, provides innovative support for business intelligence (Codd et al., 1994). It has recently become a topic of great interest in the business world as well as in the research community. Typical OLAP applications concern marketing and organization monitoring (Kimball & Ross, 2002), but new fields of application have also been studied, such as medical diagnosis (Bentayeb et al., 2010; Arigon et al., 2007) and the monitoring of web and environmental data (Boussaid et al., 2008; Bimonte et al., 2005). The success of these technologies is due to several factors. First, they provide versioning and centralization of huge amounts of heterogeneous historical data. Second, decision-makers can easily explore, analyze and compare these data with user-friendly tools: the systems provide OLAP operators (which aggregate and select data) that are manipulated through pivot tables and graphic displays.
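To make the role of these operators concrete, the following is a minimal sketch of two classic OLAP operations, roll-up (aggregation along chosen dimensions) and slice (selection of one dimension member), over a hypothetical fact table; the dimension names, measure and data are purely illustrative, not drawn from any system cited above.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, month, sales) tuples.
facts = [
    ("North", "widgets", "Jan", 120),
    ("North", "widgets", "Feb", 150),
    ("North", "gadgets", "Jan", 80),
    ("South", "widgets", "Jan", 200),
    ("South", "gadgets", "Feb", 90),
]

def roll_up(facts, dims):
    """Aggregate (sum) the sales measure over the chosen dimensions."""
    totals = defaultdict(int)
    for region, product, month, sales in facts:
        row = {"region": region, "product": product, "month": month}
        key = tuple(row[d] for d in dims)
        totals[key] += sales
    return dict(totals)

def slice_month(facts, month):
    """Select the subset of facts for one member of the 'month' dimension."""
    return [f for f in facts if f[2] == month]

# Roll-up to the region level, then slice on January.
by_region = roll_up(facts, ["region"])   # {('North',): 350, ('South',): 290}
january = slice_month(facts, "Jan")      # 3 facts remain
```

A pivot table in an OLAP client is essentially a two-dimensional presentation of such roll-up results, with slice/dice filters applied interactively.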
On the other hand, environmental dynamics modeling is extensively used to study complex phenomena and scenarios, such as urbanization, climate change and deforestation. This modeling allows researchers and stakeholders to understand and predict the evolution of the environment in response to changes in a large number of influence indicators (i.e., input data).
These models are complex because they contain several interacting sub-models. They often involve coupled models (biological, meteorological and/or hydrological models) or individual-based models, which explicitly represent the individuals of a given population (plants, animals or humans). They can output a huge volume of data because they involve so many objects and variables, especially when simulations run for long periods of time. Moreover, many of these models include stochastic processes, which require several replications of each simulation to obtain representative results. This further increases the quantity of result data to store and analyze. One way to deal with this situation is to use statistical methods of experimental design to define the most efficient settings for simulation experiments (see Saltelli, 2000 or Kleijnen, 2008 for a review). Another is to exploit large computing architectures, such as grids or distributed computation infrastructures (Chuffart et al., 2008).
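The need for replications, and the resulting growth in output volume, can be illustrated with a toy stochastic model; the model, its parameters and the replication counts below are illustrative assumptions, not taken from the works cited above.

```python
import random

def simulate_population(steps, growth_mean=0.05, growth_sd=0.02, seed=None):
    """Toy stochastic model: a population with a noisy growth rate.

    Each call with a different seed is one replication; the full
    trajectory (steps + 1 values) is the result data to be stored.
    """
    rng = random.Random(seed)
    pop = 100.0
    trajectory = [pop]
    for _ in range(steps):
        pop *= 1.0 + rng.gauss(growth_mean, growth_sd)
        trajectory.append(pop)
    return trajectory

def replicate(n_reps, steps):
    """Run n_reps replications and average the final population."""
    finals = [simulate_population(steps, seed=r)[-1] for r in range(n_reps)]
    return sum(finals) / n_reps

# Storage cost grows multiplicatively: 100 replications of a
# 1000-step run already produce 100 * 1001 values for a single
# scalar output variable.
mean_final = replicate(100, 1000)
```

A single replication is only one draw from the model's output distribution; averaging over many replications is what makes the summary indicator representative, at the cost of multiplying the stored data by the replication count.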
To manage huge amounts of simulation result data, scientists or “model users” need to extract condensed data, such as typical regularities or synthetic indicators. In addition, researchers and stakeholders need to perform comparative analyses of the results produced by different models and input sets. For that purpose, tools are needed to extract and construct regularities and indicators, as well as to analyze and compare models.
Modelers often need to compare results issued from different models or produced by different input sets. These needs are particularly important in the context of environmental data and models, which can be very diverse because they are created by various scientists and based on different modeling paradigms (e.g., partial differential equations, discrete events, cellular automata, and other complex modeling frameworks). The data used or produced by these models are thus highly heterogeneous: some are continuous, whereas others are discrete. A storage system accounting for all these heterogeneities should therefore be highly flexible.