Cost Models for Selecting Materialized Views in Public Clouds

Romain Perriot (Clermont Université, Université Blaise Pascal, Aubière Cedex, France), Jérémy Pfeifer (Clermont Université, Université Blaise Pascal, Aubière Cedex, France), Laurent d'Orazio (Clermont Université, Université Blaise Pascal, Aubière Cedex, France), Bruno Bachelet (Clermont Université, Université Blaise Pascal, Aubière Cedex, France), Sandro Bimonte (IRSTEA, Clermont-Ferrand, France) and Jérôme Darmont (Laboratoire ERIC, Université de Lyon, Lyon, France)
Copyright: © 2014 | Pages: 25
DOI: 10.4018/ijdwm.2014100101

Abstract

Data warehouse performance is usually achieved through physical data structures such as indexes or materialized views. In this context, cost models can help select a relevant set of such performance optimization structures. Selection becomes more complex in the cloud, however, because the criterion to optimize is at least two-dimensional: monetary cost must be balanced against overall query response time. This paper introduces new cost models that fit into the pay-as-you-go paradigm of cloud computing. Based on these cost models, an optimization problem is defined to discover, among candidate views, those to materialize in order to minimize both the overall cost of using and maintaining the database in a public cloud and the total response time of a given query workload. Experiments show that maintaining materialized views is always advantageous, in terms of both performance and cost.

2. Background

In this section, we present background information related to view materialization in the cloud. We first introduce a simple fictitious use case that serves as a running example throughout this paper. Then, we describe different pricing models in the cloud. Finally, we briefly recall the principle of view materialization.

2.1. Running Example

To illustrate our work, we rely on a simulated dataset storing the sales of an international supply chain. Business users need to analyze the total profit per day, month, and year; and per administrative department, region, and country.

Our full dataset stores 10 years (2000-2010) of sales data. Its size is 500 GB. Over this dataset, we run a query workload Q that includes queries such as Q1 = “sales per year and country”, whose processing time is 0.2 hour. The size of Q's result is 10 GB. A typical materialized view we may consider to optimize overall response time is V1 = “sales per month and country”, whose processing time is 0.1 hour. The whole set of selected materialized views is denoted V. V's size is 50 GB. Finally, the times to process Q with and without exploiting V are 40 hours and 50 hours, respectively.
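
The figures of the running example can be put side by side in a short calculation. The following Python sketch is an illustration only, not the cost model introduced in this paper: the class and function names are hypothetical, and the break-even function assumes a flat compute price per hour and a flat storage price per GB over the billing period, ignoring view maintenance and data transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningExample:
    """Figures quoted in the running example (sizes in GB, times in hours)."""
    dataset_gb: float = 500.0
    views_gb: float = 50.0            # size of the view set V
    result_gb: float = 10.0           # size of Q's result
    hours_without_views: float = 50.0
    hours_with_views: float = 40.0


def time_saved_hours(ex: RunningExample) -> float:
    """Processing time saved on workload Q by exploiting V."""
    return ex.hours_without_views - ex.hours_with_views


def relative_time_reduction(ex: RunningExample) -> float:
    """Time saved as a fraction of the time without views."""
    return time_saved_hours(ex) / ex.hours_without_views


def storage_overhead(ex: RunningExample) -> float:
    """Extra storage taken by V as a fraction of the dataset size."""
    return ex.views_gb / ex.dataset_gb


def break_even_storage_price(ex: RunningExample,
                             compute_price_per_hour: float) -> float:
    """Storage price per GB at which storing V costs exactly as much as
    the compute time it saves.

    Assumption (not from the paper): flat prices, no maintenance cost,
    no data transfer cost.
    """
    return time_saved_hours(ex) * compute_price_per_hour / ex.views_gb


if __name__ == "__main__":
    ex = RunningExample()
    print("time saved (h):", time_saved_hours(ex))
    print("relative time reduction:", relative_time_reduction(ex))
    print("storage overhead:", storage_overhead(ex))
    # Hypothetical compute price of 1.0 monetary unit per hour.
    print("break-even storage price:", break_even_storage_price(ex, 1.0))
```

With the figures above, exploiting V saves 10 hours of processing (20% of the workload time without views) while adding 10% to the stored volume. Whether that trade is worthwhile depends on the provider's actual prices, which this sketch leaves as parameters.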
