Data Warehouse Benchmarking with DWEB

Jérôme Darmont

doi:10.4018/978-1-60566-232-9.ch015

Source Title: Progressive Methods in Data Warehousing and Business Intelligence: Concepts and Competitive Analytics

Abstract

Performance evaluation is a key issue for designers and users of Database Management Systems (DBMSs). Performance is generally assessed with software benchmarks that help, for example, to test architectural choices, compare different technologies, or tune a system. In the particular context of data warehousing and On-Line Analytical Processing (OLAP), although the Transaction Processing Performance Council (TPC) aims at issuing standard decision-support benchmarks, few benchmarks actually exist. We present in this chapter the Data Warehouse Engineering Benchmark (DWEB), which can generate various ad-hoc synthetic data warehouses and workloads. DWEB is fully parameterized to fulfill various data warehouse design needs. However, two levels of parameterization keep it relatively easy to tune. We also expand on our previous work on DWEB by presenting its new Extract, Transform, and Load (ETL) feature, as well as its new execution protocol. A Java implementation of DWEB is freely available online and can be interfaced with most existing relational DBMSs. To the best of our knowledge, DWEB is the only easily available, up-to-date benchmark for data warehouses.

Chapter Preview

Introduction

Performance evaluation is a key issue for both designers and users of Database Management Systems (DBMSs). It helps designers select among alternative software architectures and performance optimization strategies, or validate or refute hypotheses regarding the actual behavior of a system. Thus, performance evaluation is an essential component in the development process of efficient and well-designed database systems. Users may also employ performance evaluation, either to compare the efficiency of different technologies before selecting one, or to tune a system. In many fields, including databases, performance is generally assessed with the help of software benchmarks. The main components of a benchmark are its database model and its workload model (the set of operations to execute on the database).

Evaluating data warehousing and decision-support technologies is a particularly intricate task. Though pertinent general advice is available, notably on-line (Pendse, 2003; Greenfield, 2004a), more quantitative elements regarding sheer performance, including benchmarks, are few. In the late nineties, the OLAP (On-Line Analytical Processing) APB-1 benchmark was very popular. Since then, the Transaction Processing Performance Council (TPC), a non-profit organization, has defined standard benchmarks (including decision-support benchmarks) and published objective and verifiable performance evaluations to the industry.

Our own motivation for data warehouse benchmarking was initially to test the efficiency of performance optimization techniques (such as automatic index and materialized view selection techniques) that we have been developing for several years. None of the existing data warehouse benchmarks suited our needs. APB-1’s schema is fixed, while we needed to test our performance optimization techniques on various data warehouse configurations. Furthermore, APB-1 is no longer supported and is somewhat difficult to find. The TPC currently supports the TPC-H decision-support benchmark (TPC, 2006). However, its database schema is inherited from the older and obsolete TPC-D benchmark (TPC, 1998) and is not a dimensional schema such as the typical star schema and its derivatives that are used in data warehouses (Inmon, 2002; Kimball & Ross, 2002). Furthermore, TPC-H’s workload, though decision-oriented, does not include explicit OLAP queries either. This benchmark is implicitly considered obsolete by the TPC, which has issued some draft specifications for its successor, TPC-DS (TPC, 2007). However, TPC-DS, which is very complex, especially at the ETL (Extract, Transform, and Load) and workload levels, has been under development since 2002 and is not completed yet.

Furthermore, although the TPC decision-support benchmarks are scalable according to Gray’s (1993) definition, their schema is also fixed. For instance, TPC-DS’ constellation schema cannot easily be simplified into a simple star schema. It must be used “as is”. Different ad-hoc configurations are not possible. Moreover, there is only one parameter to define the database, the Scale Factor (SF), which sets its size (from 1 to 100,000 GB). Users cannot, for instance, control the size of dimensions and fact tables separately. Finally, users have no control over workload definition. The number of generated queries directly depends on SF.
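
The difference between sizing a warehouse with a single scale factor and sizing its tables independently can be sketched in a few lines of Python. This is an illustration only: the table names, base row counts, and function and class names are hypothetical and come neither from the TPC specifications nor from DWEB, and multiplying every table by the same factor is a deliberate simplification of how a real benchmark scales its tables.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical row counts at scale factor 1 (illustrative values only).
BASE_ROWS: Dict[str, int] = {
    "fact_sales": 1_000_000,
    "dim_customer": 10_000,
    "dim_product": 1_000,
}


def size_with_scale_factor(sf: int) -> Dict[str, int]:
    """Single-parameter sizing: one factor drives the size of every table."""
    if sf < 1:
        raise ValueError("scale factor must be >= 1")
    return {table: rows * sf for table, rows in BASE_ROWS.items()}


@dataclass
class WarehouseSizing:
    """Per-table sizing: fact and dimension cardinalities are set independently."""
    fact_rows: Dict[str, int] = field(default_factory=dict)
    dimension_rows: Dict[str, int] = field(default_factory=dict)

    def row_counts(self) -> Dict[str, int]:
        # Merge both groups into one table -> row count mapping.
        return {**self.fact_rows, **self.dimension_rows}


# With a single factor, the fact/dimension size ratio never changes.
uniform = size_with_scale_factor(10)

# With independent sizes, a large-volume dimension can be paired
# with an unchanged fact table.
custom = WarehouseSizing(
    fact_rows={"fact_sales": 1_000_000},
    dimension_rows={"dim_customer": 5_000_000, "dim_product": 1_000},
)
```

In the first case the ratio between fact and dimension sizes is fixed by the base counts, so a configuration such as a dimension larger than its fact table cannot be expressed. The second style allows it at the cost of more parameters to set.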

Finally, in a context where data warehouse architectures and decision-support workloads depend heavily on the application domain, it is very important that designers who wish to evaluate the impact of architectural choices or optimization techniques on global performance can choose among, and compare, several configurations. The TPC benchmarks, which aim at standardized results and propose only one warehouse schema configuration, are ill-adapted to this purpose. TPC-DS is indeed able to evaluate the performance of optimization techniques, but it cannot test their impact on various choices of data warehouse architecture. Generating particular data warehouse configurations (e.g., large-volume dimensions) or ad-hoc query workloads is not possible either, although this would be a useful feature for a data warehouse benchmark.
