Data warehouses (DWs) integrate data from different source systems in order to provide historical information that supports the decision-making process. The design of a DW is a complex and costly task, since the inclusion of different data items in a DW depends on both users’ needs and data availability in source systems. There is still no methodological framework that guides developers through the different stages of the DW design process. On the one hand, there are several proposals that informally describe the phases used for developing DWs based on the authors’ experience in building such systems (Inmon, 2002; Kimball, Reeves, Ross, & Thornthwaite, 1998). On the other hand, the scientific community proposes a variety of approaches for developing DWs, discussed in the next section. Nevertheless, these approaches either include features that are meant for the specific conceptual model used by their authors, or they are very complex. This situation has arisen because the need to build DW systems that fulfill user expectations ran ahead of methodological and formal approaches for DW development of the kind that exist for operational databases.
The requirements specification phase is one of the earliest steps in system development and has a major impact on the success of DW projects (Winter & Strauch, 2003). This phase helps to identify the essential elements of a multidimensional schema, i.e., facts with associated measures, dimensions, and hierarchies, required to facilitate future data manipulations and calculations. These elements should be clearly and concisely represented in a conceptual schema at a later stage. This schema will serve as a basis for analysis tasks performed by the users and will be used by the designers during future evolutions of the DW.
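The elements named above (facts with measures, dimensions, and hierarchies) can be pictured as a small data structure. The following is only a minimal sketch, not a model proposed in this chapter: the class names, the retail example, and the helper `rollup_path` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hierarchy:
    """An ordered list of levels, from finest to coarsest."""
    name: str
    levels: List[str]


@dataclass
class Dimension:
    """An analysis perspective, with zero or more hierarchies."""
    name: str
    hierarchies: List[Hierarchy] = field(default_factory=list)


@dataclass
class Fact:
    """A subject of analysis with its measures and related dimensions."""
    name: str
    measures: List[str]
    dimensions: List[Dimension] = field(default_factory=list)


# Hypothetical retail example; none of these names come from the chapter.
time_dim = Dimension("Time", [Hierarchy("Calendar", ["Day", "Month", "Year"])])
store_dim = Dimension("Store", [Hierarchy("Geography", ["Store", "City", "Country"])])
sales = Fact("Sales", measures=["quantity", "amount"], dimensions=[time_dim, store_dim])


def rollup_path(dimension: Dimension, hierarchy_name: str) -> List[str]:
    """Return the levels of one hierarchy of a dimension, finest level first."""
    for hierarchy in dimension.hierarchies:
        if hierarchy.name == hierarchy_name:
            return hierarchy.levels
    raise KeyError(hierarchy_name)
```

Under these assumptions, `rollup_path(time_dim, "Calendar")` returns the levels along which the measures of `sales` could be aggregated over time.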
There are different approaches for requirements specification and conceptual modeling of DWs. The so-called user-driven approach takes into account the fact that users play a fundamental role during requirements analysis and must be actively involved in the elucidation of relevant facts and dimensions (Freitas, Laender, & Campos, 2002; Luján-Mora & Trujillo, 2003). The business-driven approach bases the derivation of DW structures on the analysis of either business requirements or business processes (Giorgini, Rizzi, & Garzetti, 2005; List, Schiefer, & Min Tjoa, 2000). The source-driven approach, in contrast, analyzes the underlying source systems in order to obtain the DW schema (Böehnlein & Ulbrich-vom Ende, 1999; Cabibbo & Torlone, 1998; Golfarelli & Rizzi, 1998; Moody & Kortink, 2000). Finally, the combined approach puts together the business- or user-driven and source-driven approaches, representing both what the business or users demand and what the source systems can provide (Bonifati, Cattaneo, Ceri, Fuggetta, & Paraboschi, 2001; Winter & Strauch, 2003).
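The idea behind the combined approach, matching what users demand against what the source systems can provide, can be illustrated with a toy comparison of two sets of data items. This is a deliberately simplified sketch under strong assumptions (data items are identified by plain names, and matching is exact); the function `reconcile` and the item names are hypothetical and do not come from any of the cited approaches.

```python
from typing import Dict, Set


def reconcile(required: Set[str], available: Set[str]) -> Dict[str, Set[str]]:
    """Compare the items users ask for with the items found in source systems."""
    return {
        # Requested and present in the sources: candidates for the DW schema.
        "supported": required & available,
        # Requested but absent from the sources: requirements that cannot be met yet.
        "unmet": required - available,
        # Present in the sources but not requested: possible additions to suggest to users.
        "unused": available - required,
    }


# Hypothetical data items, for illustration only.
user_requirements = {"sales_amount", "customer_segment", "store_city"}
source_items = {"sales_amount", "store_city", "supplier_name"}

result = reconcile(user_requirements, source_items)
```

In this sketch, the "unmet" and "unused" sets correspond to the points that designers and users would need to negotiate before a schema is settled.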
Key Terms in this Chapter
Analysis/Source-Driven Design: Approach for designing a DW that combines analysis-driven and source-driven approaches.
Source-Driven Design: Approach for designing a DW based on the analysis of data available in source systems.
Business Metadata: The information about the meaning of data as well as the business rules and constraints that should be applied to data.
Source Systems: Systems that contain data to feed a DW. This may include operational databases and other internal or external systems.
Technical Metadata: The information about data structures and storage as well as applications and processes that manipulate the data.
Analysis-Driven Design: Approach for designing a DW based on the analysis of user requirements or business processes.
Conceptual Multidimensional Model: A set of objects and rules for representing, in an abstract way, a multidimensional view of data consisting of facts with measures and dimensions with hierarchies.