Basically, the schema of a data warehouse rests on two kinds of elements: facts and dimensions. Facts record measures about situations or events. Dimensions are used to analyse these measures, particularly through aggregation operations (counting, summation, average, etc.). To fix ideas, let us consider the analysis of the sales in a shop according to the product type and to the month of the year. Each sale of a product is a fact. It can be characterized by a quantity. An aggregation function can be calculated on the quantities of several facts. For example, one can sum the quantities sold for the product type “mineral water” during January in 2001, 2002 and 2003. Product type is a criterion of the dimension Product. Month and Year are criteria of the dimension Time. A quantity is thus connected both with a product type and with a month of a given year. This type of connection concerns the organization of facts with regard to dimensions. On the other hand, a month is connected to one year. This type of connection concerns the organization of criteria within a dimension. The possibilities of fact analysis depend on these two forms of connection and on the schema of the warehouse. This schema is chosen by the designer in accordance with the users' needs. Determining the schema of a data warehouse cannot be achieved without adequate modelling of dimensions and facts. In this article we present a general model for dimensions and facts and their relationships. This model will greatly facilitate the choice of the schema and its manipulation by the users.
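The sales example above can be made concrete with a minimal sketch. It is only an illustration, not the model presented in this article: the tuple layout, the function names `sum_quantity` and `quantity_by_type_and_month`, and all quantities are assumptions invented for the purpose.

```python
from collections import defaultdict

# Hypothetical sale facts: (product_type, year, month, quantity).
# Product type belongs to the dimension Product; year and month belong
# to the dimension Time. All values are invented for illustration.
SALES = [
    ("mineral water", 2001, 1, 120),
    ("mineral water", 2002, 1, 95),
    ("mineral water", 2003, 1, 130),
    ("mineral water", 2003, 2, 80),
    ("fruit juice", 2001, 1, 40),
]


def sum_quantity(facts, product_type, month, years):
    """Sum the quantities of the facts matching the given criteria."""
    return sum(
        qty
        for (ptype, year, m, qty) in facts
        if ptype == product_type and m == month and year in years
    )


def quantity_by_type_and_month(facts):
    """Aggregate the quantity per (product type, year, month)."""
    totals = defaultdict(int)
    for ptype, year, month, qty in facts:
        totals[(ptype, year, month)] += qty
    return dict(totals)


# Quantity of "mineral water" sold during January in 2001, 2002 and 2003.
january_water = sum_quantity(SALES, "mineral water", 1, {2001, 2002, 2003})
print(january_water)  # 345 with the invented data above
```

Each fact carries one measure (the quantity) and is located by one criterion per dimension, so any aggregation reduces to selecting facts by criteria and applying a function such as a sum to their measures.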
Concerning the modelling of dimensions, the objective is to find an organization which corresponds to the analysis operations and which provides strict control over the aggregation operations. In particular, it is important to avoid double-counting or the summation of non-additive data. Many studies have been devoted to this problem. Most recommend organizing the criteria (also called members) of a given dimension into hierarchies with which the aggregation paths can be explicitly defined. In (Pourabbas, 1999), hierarchies are defined by means of a containment function. In (Lehner, 1998), the organization of a dimension results from the functional dependencies which exist between its members, and a multidimensional normal form is defined. In (Hüsemann, 2000), functional dependencies are also used to design the dimensions and to relate facts to dimensions. In (Abello, 2001), relationships between levels in a hierarchy are captured through Part-Whole semantics. In (Tsois, 2001), dimensions are organized around the notion of a dimension path, which is a set of drilling relationships. The model is centered on a parent-child (one-to-many) relationship type. A drilling relationship describes how the members of a child level can be grouped into sets that correspond to members of the parent level. In (Vassiliadis, 2000), a dimension is viewed as a lattice, and two functions “anc” and “desc” are used to perform the roll-up and drill-down operations. Pedersen (1999) proposes an extended multidimensional data model which is also based on a lattice structure and which supports non-strict hierarchies (i.e. many-to-many relationships between the different levels in a dimension).
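To illustrate why control over aggregation matters, the following sketch rolls measures up from a child level to a parent level and shows how a non-strict (many-to-many) hierarchy leads to double-counting. It does not reproduce any of the cited models: the mapping format, the member names, and the functions `is_strict`, `naive_roll_up` and `roll_up` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical drilling relationships: child member -> set of parent members.
# STRICT is one-to-many (each child has exactly one parent).
STRICT = {"Jan-2001": {"2001"}, "Feb-2001": {"2001"}, "Jan-2002": {"2002"}}
# NON_STRICT is many-to-many ("energy water" has two parents).
NON_STRICT = {
    "cola": {"soft drinks"},
    "energy water": {"soft drinks", "mineral water"},
}


def is_strict(child_to_parents):
    """Return True if every child member has exactly one parent member."""
    return all(len(parents) == 1 for parents in child_to_parents.values())


def naive_roll_up(measures, child_to_parents):
    """Add each child's measure to every parent, with no control at all."""
    totals = defaultdict(int)
    for child, value in measures.items():
        for parent in child_to_parents[child]:
            totals[parent] += value
    return dict(totals)


def roll_up(measures, child_to_parents):
    """Roll up only along a strict hierarchy; refuse otherwise."""
    if not is_strict(child_to_parents):
        raise ValueError("non-strict hierarchy: summation would double-count")
    return naive_roll_up(measures, child_to_parents)


monthly = {"Jan-2001": 120, "Feb-2001": 80, "Jan-2002": 95}
print(roll_up(monthly, STRICT))

by_product = {"cola": 10, "energy water": 5}
inflated = naive_roll_up(by_product, NON_STRICT)
# The parent totals add up to more than the child total of 15,
# because "energy water" was counted once under each of its parents.
print(sum(inflated.values()), sum(by_product.values()))
```

Refusing to aggregate is only one possible policy. A model that supports non-strict hierarchies would need some other rule for such cases, which this sketch does not attempt to capture.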