Core Methodologies in Data Warehouse Design and Development

James Yao (Department of Information & Operations Management, Montclair State University, Montclair, NJ, USA), John Wang (Department of Information & Operations Management, Montclair State University, Montclair, NJ, USA), Qiyang Chen (Department of Information & Operations Management, Montclair State University, Montclair, NJ, USA) and Ruben Xing (Department of Information & Operations Management, Montclair State University, Montclair, NJ, USA)
Copyright: © 2013 | Pages: 10
DOI: 10.4018/ijrat.2013010104

Abstract

A data warehouse is a system that can integrate heterogeneous data sources to support the decision-making process. Data warehouse design is a lengthy, time-consuming, and costly process, and data warehouse development projects have had a high failure rate. Thus, how to design and develop a data warehouse has become an important issue for information systems designers and developers. This paper reviews and discusses some of the core data warehouse design and development methodologies in information systems development. In particular, it presents the most recent and much-debated hybrid approach, which combines the data-driven and requirement-driven approaches.

Introduction

Information systems were developed in the early 1960s to process orders, billing, inventory control, payroll, and accounts payable. Research on information systems soon followed. Harry Stern started the “Information Systems in Management Science” column in the journal Management Science to provide a forum for discussion beyond research papers (Banker & Kauffman, 2004). Ackoff (1967) led the earliest research on management information systems for decision-making purposes and published it in Management Science. Gorry and Scott Morton (1971) first used the term ‘decision support systems’ (DSS) in a paper and constructed a framework for improving management information systems. Since then, the topics of information systems and DSS research have diversified. One of the major topics has been how to get systems design right.

In the late 1970s, the growing success of database management systems (DBMSs) spread the use of databases in organizations around the world (Takecian et al., 2013). These databases were designed to handle routine business transactions. Meanwhile, as the volume of data collected and stored grew, so did the need to analyze that data for managerial decision making, and the drive to optimize transactional databases for analytical purposes intensified. In the early 1990s, the term “data warehouse” was coined, and the concept was subsequently developed. As an active component of DSS, which are part of today’s business intelligence systems, data warehousing became one of the most important developments in the information systems field during the mid-to-late 1990s. As the business environment has become more global, competitive, complex, and volatile, customer relationship management (CRM) and e-commerce initiatives are creating requirements for large, integrated data repositories and advanced analytical capabilities. A data warehouse is a system that can integrate heterogeneous data sources to support the decision-making process (Vela et al., 2013). By using a data warehouse, companies can make decisions about customer-specific strategies such as customer profiling, customer segmentation, and cross-selling analysis (Cunningham et al., 2006). Thus, how to design and develop a data warehouse has become an important issue for information systems designers and developers.

Data modeling for a data warehouse differs from data modeling for an operational database. An operational system, such as an online transaction processing (OLTP) system, is used to run a business in real time based on current data. An OLTP system usually adopts entity-relationship (ER) modeling and an application-oriented database design (Han & Kamber, 2006). An informational system, such as a data warehouse, is designed to support decision making based on historical point-in-time and prediction data for complex queries or data mining applications (Hoffer et al., 2007). A data warehouse schema is viewed as a dimensional model (Ahmad et al., 2004; Han & Kamber, 2006; Levene & Loizou, 2003). It typically adopts either a star or a snowflake schema and a subject-oriented database design (Han & Kamber, 2006). Schema design is the most critical part of data warehouse design.
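To make the contrast with ER-modeled OLTP design concrete, here is a minimal sketch of a star schema built with Python's standard sqlite3 module. The retail subject area, the table names (fact_sales, dim_date, dim_product, dim_customer), their columns, and the sample rows are hypothetical illustrations and are not taken from the article. The central fact table holds numeric measures and one foreign key per dimension. Descriptive attributes such as product category sit in denormalized dimension tables, so an analytical query is a join from the fact table to its dimensions followed by an aggregation.

```python
import sqlite3

# Hypothetical retail star schema; names and columns are illustrative assumptions.
STAR_SCHEMA = """
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- surrogate key, e.g. 20130115
    year     INTEGER NOT NULL,
    quarter  INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL       -- denormalized: stored directly in the dimension
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    segment      TEXT NOT NULL
);
-- Central fact table: numeric measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    quantity     INTEGER NOT NULL,
    amount       REAL NOT NULL
);
"""


def build_star(conn):
    """Create the star schema and load a few hypothetical sample rows."""
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20130115, 2013, 1), (20130420, 2013, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "Retail"), (2, "Corporate")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(20130115, 1, 1, 2, 1800.0),
                      (20130115, 2, 2, 5, 750.0),
                      (20130420, 1, 2, 1, 900.0)])


def sales_by_quarter_and_category(conn):
    """Typical dimensional query: join facts to dimensions, then aggregate."""
    return conn.execute("""
        SELECT d.year, d.quarter, p.category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_date AS d    ON f.date_key = d.date_key
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY d.year, d.quarter, p.category
        ORDER BY d.year, d.quarter, p.category
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in sales_by_quarter_and_category(conn):
        print(row)
```

Running the script prints total sales per year, quarter, and product category. In a snowflake schema, the category attribute would instead be normalized into its own table (for example, a hypothetical dim_category referenced from dim_product). This reduces redundancy in the dimension at the cost of one more join in queries like the one above.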

Data warehouse design is a lengthy, time-consuming, and costly process, and a single miscalculated step can lead to failure. A 2005 Gartner Group study found that about 50% of data warehouse projects tend to fail because of problems during data warehouse design and construction (Takecian et al., 2013). A lengthy development process is regarded as the most important cause of failure: often, by the time a system becomes available, some of its functional features are already obsolete. Researchers have therefore devoted considerable effort to studying design- and development-related issues and methodologies.
