Aligning the Warehouse and the Web

Hadrian Peter (University of the West Indies, Barbados)
Copyright: © 2009 | Pages: 7
DOI: 10.4018/978-1-60566-010-3.ch004

Abstract

Data warehouses have established themselves as necessary components of an effective IT strategy for large businesses. To augment the streams of data being siphoned from transactional/operational databases, warehouses must also integrate increasing amounts of external data to assist in decision support. Modern warehouses can be expected to handle 100 terabytes or more of data (Berson and Smith, 1997; Devlin, 1998; Inmon, 2002; Imhoff et al, 2003; Schwartz, 2003; Day, 2004; Peter and Greenidge, 2005; Winter and Burns, 2006; Ladley, 2007). The arrival of newer generations of tools and database vendor support has smoothed the way for current warehouses to meet the needs of the challenging global business environment (Kimball and Ross, 2002; Imhoff et al, 2003; Ross, 2006). We cannot ignore the role of the Internet in modern business and its impact on data warehouse strategies. The web represents the richest source of external data known to man (Zhenyu et al, 2002; Chakrabarti, 2002; Laender et al, 2002), but we must be able to couple raw text or poorly structured data on the web with descriptions, annotations and other forms of summary meta-data (Crescenzi et al, 2001). In recent years the Semantic Web initiative has focussed on the production of “smarter data”. The basic idea is that, instead of making programs with near-human intelligence, we carefully add meta-data to existing stores so that the data becomes “marked up” with all the information necessary to allow not-so-intelligent software to perform analysis with minimal human intervention (Kalfoglou et al, 2004). The Semantic Web builds on established building-block technologies such as Unicode, URIs (Uniform Resource Identifiers) and XML (Extensible Markup Language) (Dumbill, 2000; Daconta et al, 2003; Decker et al, 2000). The modern data warehouse must embrace these emerging web initiatives.
In this paper we propose a model which provides mechanisms for sourcing external data resources for analysts in the warehouse.

Background

Data Warehousing

Data warehousing is an evolving IT strategy in which data is periodically siphoned off from multiple heterogeneous operational databases and composed in a specialized database environment for business analysts posing queries. Traditional data warehouses tended to focus on historical/archival data, but modern warehouses are required to be more nimble, utilizing data which becomes available within days of creation in the operational environments (Schwartz, 2003; Imhoff et al, 2003; Strand and Wangler, 2004; Ladley, 2007). Data warehouses must provide different views of the data, giving users the option to “drill down” to highly granular data or to produce highly summarized data for business reporting. This flexibility is supported by the use of robust tools in the warehouse environment (Berson and Smith, 1997; Kimball and Ross, 2002).
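
The difference between a summarized view and a "drill down" view can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the sales rows, the dimension names and the roll_up helper are hypothetical and do not come from any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, store, month, amount).
SALES = [
    ("North", "Store A", "2008-01", 120.0),
    ("North", "Store A", "2008-02", 80.0),
    ("North", "Store B", "2008-01", 200.0),
    ("South", "Store C", "2008-01", 50.0),
    ("South", "Store C", "2008-02", 150.0),
]


def roll_up(rows, key_indexes):
    """Sum the amount column, grouped by the chosen dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[-1]  # the amount is the last field of each row
    return dict(totals)


# Highly summarized view: one total per region.
by_region = roll_up(SALES, [0])

# Drill down: the same facts at region/store granularity.
by_region_store = roll_up(SALES, [0, 1])

print(by_region)
print(by_region_store)
```

Both views are computed from the same underlying rows; only the set of grouping dimensions changes, which is the sense in which a warehouse offers the same data at differing levels of granularity.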

Data Warehousing accomplishes the following:

  • Facilitates ad hoc end-user querying

  • Facilitates the collection and merging of large volumes of data

  • Seeks to reconcile the inconsistencies and fix the errors that may be discovered among data records

  • Utilizes meta-data in an intensive way

  • Relies on an implicit acceptance that external data is readily available
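
As a rough illustration of the merging and reconciliation points in the list above, the following sketch combines customer records from two operational sources and reports disagreements instead of silently overwriting them. The source names (BILLING, ORDERS), the field names and the normalization rules are all assumptions made for the example, not part of the chapter's model.

```python
def normalize(record):
    """Bring the fields we compare on into one canonical form."""
    return {
        "customer_id": record["customer_id"].strip().upper(),
        "country": record["country"].strip().title(),
    }


def merge_sources(*sources):
    """Merge records by customer_id; collect conflicts for later review."""
    merged, conflicts = {}, []
    for source in sources:
        for raw in source:
            rec = normalize(raw)
            cid = rec["customer_id"]
            existing = merged.get(cid)
            if existing is None:
                merged[cid] = rec  # first sighting of this customer
            elif existing != rec:
                # Keep the first value, but record the disagreement.
                conflicts.append((cid, existing["country"], rec["country"]))
    return merged, conflicts


# Hypothetical extracts from two operational systems.
BILLING = [
    {"customer_id": "c001 ", "country": "barbados"},
    {"customer_id": "C002", "country": "Jamaica"},
]
ORDERS = [
    {"customer_id": "C001", "country": "Barbados"},
    {"customer_id": "c002", "country": "Trinidad"},
]

merged, conflicts = merge_sources(BILLING, ORDERS)
print(merged)
print(conflicts)
```

Differences that are purely formatting (case, stray whitespace) disappear after normalization, while genuine disagreements between sources are surfaced for a person or a rule to resolve.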

Some major issues in data warehousing design are:

  • Ability to handle vast quantities of data

  • Ability to view data at differing levels of granularity

  • Query performance versus ease of query construction by business analysts

  • Ensuring purity, consistency and integrity of data entering the warehouse

  • Impact of changes in the business IT environments supplying the warehouse

  • Costs and Return-on-Investment (ROI)
