Introduction
A corporate data warehouse is a repository that provides decision makers with a large amount of historical data concerning the overall enterprise strategy. A data-warehousing architecture defines a set of data repositories and their relationships to support the decision-making process in a given organization. Several architectural options (Cabibbo & Torlone, 2001; Jarke et al., 1999; Jukic, 2006; Samos et al., 1998; Watson et al., 2001) and methodologies (Bonifati et al., 2001; Giorgini et al., 2008; Luján-Mora & Trujillo, 2006a; Mazón et al., 2007a; Sen & Sinha, 2005) have been proposed to develop these repositories. Specifically, two foundational data-warehousing alternatives have been broadly discussed (Breslin, 2004): the top-down approach originally stated by Inmon (2005) and the bottom-up approach stated by Kimball and Ross (2002). These approaches differ in which data repositories should be developed first: a corporate data warehouse in which an organization's data are stored and integrated in a single repository (top-down), or departmental data marts in which data are aggregated and customized for particular information needs (bottom-up). Although the former is considered the most elegant solution from a theoretical point of view, it is usually hard to implement because the project scope involves the whole organization (Watson et al., 2001). The latter is thus more suitable for agile developments, despite the problems that arise during data-mart integration (Watson et al., 2001; Chaudhuri & Dayal, 1997). Both approaches fail when they attempt to derive the second set of data repositories (i.e., data marts in the top-down case, or the corporate data warehouse in the bottom-up case), because of the inherently high cost of integrating huge amounts of data (top-down) and the integration tasks duplicated across data marts (bottom-up).
To overcome these limitations, Kimball and Ross (2002) also proposed a bus architecture articulated around conformed dimensions. These dimensions account for 90 percent of the integration effort needed to tie data marts together (Kimball & Ross, 2002). They are obtained through the agreement of the entire organization, thus supporting truly cross-departmental decision-making processes. Nevertheless, this solution is designed at the logical level (i.e., by using relational schemata) and therefore does not provide suitable mechanisms to drive complex developments, such as methodologies (Bonifati et al., 2001; Giorgini et al., 2008; Luján-Mora & Trujillo, 2006; Mazón et al., 2006; Mazón & Trujillo, 2008) based on conceptual modeling (Abelló et al., 2006; Golfarelli et al., 1998; Hüsemann et al., 2000; Luján-Mora et al., 2006). Furthermore, existing matching methods do not cover the particular problems of integrating data warehouse and data mart schemas (Evermann, 2008).
However, we believe that the surrounding architectural debate (Breslin, 2004) has been overlooked by current development approaches, which are mainly based on conceptual modeling. These approaches have focused on capturing information requirements by means of multidimensional modeling (Kimball & Ross, 2002; Chaudhuri & Dayal, 1997), which organizes data in terms of facts and dimensions of analysis, but they do not specify how data repositories (i.e., corporate data warehouses and their dependent data marts) are built from those requirements. For instance, departmental data marts may be built by different development teams in isolation. These approaches therefore do not address the conformity issues involved in developing data marts and corporate data warehouses in an integrated manner, which is needed to satisfy cross-departmental information needs such as those answered by drill-across operations during “on-line analytical processing” (OLAP) (Chaudhuri & Dayal, 1997).
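To make the role of conformity concrete, the following is a minimal sketch of a drill-across operation, assuming two departmental data marts whose fact rows share one conformed dimension. The data, the `product` dimension, the measures, and the helper names `aggregate` and `drill_across` are all hypothetical and serve only as illustration; they are not taken from the cited works.

```python
from collections import defaultdict

# Hypothetical fact rows from two departmental data marts.
# Both reference the same conformed "product" dimension members.
sales_facts = [
    {"product": "P1", "revenue": 100.0},
    {"product": "P1", "revenue": 50.0},
    {"product": "P2", "revenue": 70.0},
]
shipping_facts = [
    {"product": "P1", "units_shipped": 12},
    {"product": "P2", "units_shipped": 5},
    {"product": "P2", "units_shipped": 3},
]


def aggregate(facts, dimension, measure):
    """Sum a measure for each member of the given dimension."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def drill_across(left, right):
    """Combine two aggregated results on the dimension members they share."""
    shared_members = left.keys() & right.keys()
    return {member: (left[member], right[member]) for member in shared_members}


# Each fact table is aggregated separately at the same dimension level,
# and the results are then joined on the conformed dimension members.
revenue = aggregate(sales_facts, "product", "revenue")
shipped = aggregate(shipping_facts, "product", "units_shipped")
report = drill_across(revenue, shipped)
```

The join only yields meaningful rows because both data marts identify products in the same way. If each team had defined its own product keys in isolation, the set of shared members would be empty or misleading, which is the integration problem described above.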