Benchmark for Approximate Query Answering Systems

Francesco Di Tria (Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro, Bari, Italy), Ezio Lefons (Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro, Bari, Italy) and Filippo Tangorra (Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro, Bari, Italy)
Copyright: © 2015 | Pages: 29
DOI: 10.4018/JDM.2015010101

Abstract

The standard benchmark for Decision Support Systems is TPC-H, which is composed of a database, a workload, and a set of metrics for performance evaluation. However, TPC-H does not include a methodology for benchmarking Approximate Query Answering Systems, i.e., the software tools used to obtain fast answers to analytical queries in the decision-making process. In this paper, the authors present a methodology to evaluate and compare Approximate Query Answering Systems. To this aim, a methodology that extends the standard TPC-H and a set of new metrics that take the specific features of these systems into account are proposed. Experimental results show the application of these metrics to two systems based on data approximation by orthonormal series.
Article Preview

Introduction

A Decision Support System is the part of the information system that allows business decision makers to perform analytical processing and knowledge discovery, starting from data stored in data warehouses (Holsapple & Whinston, 1996). Since this kind of data processing is very time-consuming, Approximate Query Answering Systems have been developed to provide fast answers to analytical queries (Gibbons et al., 1998).

Approximate Query Answering systems rely on query engines able to process data synopses, which are obtained by a reduction process applied to the data stored in the data warehouse. This process reduces the cardinality of datasets in order to obtain a small, concise representation of the statistical profile of the data. Analytical queries are then answered by loading the appropriate data synopses into main memory and using them instead of accessing the real data stored in the data warehouse. As a consequence, the answering time is very low.

On the other hand, the results of analytical processing are approximate answers, which are estimates of the exact answers and can be affected by a small error. However, approximate answers carry essentially the same information as exact answers and, therefore, are widely accepted in decision-making processes. The underlying assumption is that an approximate answer is better than the exact one when the exact query takes a long time to complete and the approximation error is low. Indeed, in many cases total precision is not mandatory. For example, in a sequence of drill-down and/or roll-up operations, the first queries serve only to determine which multidimensional subset of the cube is to be explored in subsequent operations.
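The synopsis-based scheme described above can be illustrated with a minimal sketch. The paper's ADAP and Komodo systems use orthonormal-series approximation; the example below instead uses a simple uniform random sample as the synopsis, purely to show the reduce-then-estimate workflow. All names here (`build_synopsis`, `approx_sum`) are illustrative assumptions, not the systems' actual API.

```python
import random

def build_synopsis(rows, sample_size, seed=0):
    """Data reduction step: keep a small uniform sample of the rows.
    The synopsis is far smaller than the source data and can be kept
    in main memory."""
    random.seed(seed)
    return random.sample(rows, sample_size)

def approx_sum(synopsis, full_size):
    """Answer a SUM query from the synopsis alone, scaling the sample
    total up by the ratio of full size to sample size."""
    return sum(synopsis) * full_size / len(synopsis)

# Stand-in for a measure column of a fact table.
rows = list(range(1, 100_001))
synopsis = build_synopsis(rows, 1_000)      # 1% of the data
estimate = approx_sum(synopsis, len(rows))  # no access to `rows` needed
exact = sum(rows)
rel_error = abs(estimate - exact) / exact
print(exact, round(estimate), rel_error)
```

The query touches only the 1,000-row synopsis rather than the 100,000-row table, trading a small relative error for a much lower answering time, which is exactly the trade-off the benchmark must quantify.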

There are several methods in the literature (Cormode et al., 2011) for computing data synopses and, therefore, it is necessary to identify a set of criteria for evaluating the performance of the systems that implement such methods.

The council that deals with standard database benchmarks is the Transaction Processing Performance Council (TPC), which defined TPC-H, the benchmark for evaluating the performance of Decision Support Systems (Poess & Floyd, 2000; TPC-H, 2011).

This benchmark comprises both a methodology and metrics for measuring transaction processing. In particular, it includes a set of ad hoc queries that simulate a workload for (i) business analytical processing, (ii) examining large volumes of data, and (iii) executing very complex queries. The metrics evaluate performance by considering the time necessary to load data into the data warehouse and the answering time, also in the presence of multiple query streams.
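The time-based measurements above reduce, at their core, to wall-clock timing of a load step and of each query in the workload. The sketch below shows that core idea only; the real TPC-H specification additionally defines power and throughput tests over multiple concurrent query streams, which are not modeled here.

```python
import time

def timed(query_fn, *args):
    """Run one workload step (a load or a query) and return its result
    together with the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = query_fn(*args)
    return result, time.perf_counter() - start

# Hypothetical stand-in for an analytical query over a fact table.
rows = list(range(1_000_000))
result, elapsed = timed(sum, rows)
print(result, f"{elapsed:.4f}s")
```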

Nonetheless, these metrics do not address criteria for evaluating Approximate Query Answering systems. In fact, they do not take into account the specific features of these systems, related to architectural and methodological aspects, such as the cost of data reduction, which must be evaluated according to (a) the time necessary to compute the data synopses and (b) the storage space needed to store them. Other important evaluation criteria are (c) the response time to queries and (d) the accuracy of the approximate answers.
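Criterion (d) needs a quantitative definition to be usable as a benchmark metric. A common choice, shown here as an illustrative sketch rather than the paper's actual metric, is the mean relative error between exact and approximate answers over a query workload.

```python
def mean_relative_error(exact_answers, approx_answers):
    """Average, over a workload, of |approx - exact| / |exact|.
    Assumes no exact answer is zero."""
    errors = [abs(a - e) / abs(e)
              for e, a in zip(exact_answers, approx_answers)]
    return sum(errors) / len(errors)

# Hypothetical answers to a three-query workload.
exact = [1200.0, 540.0, 87.5]
approx = [1188.0, 551.0, 86.1]
mre = mean_relative_error(exact, approx)
print(mre)
```

A lower value means more accurate approximate answers; a benchmark would report it alongside the reduction time (a), synopsis size (b), and response time (c) so that accuracy is never read in isolation from its costs.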

The aim of this paper is to extend TPC-H so as to allow evaluating not only the performance of traditional Decision Support Systems but also that of systems using approximate query engines. To this end, we first introduce the main features of Approximate Query Answering Systems, along with their evaluation criteria. Then, we propose a benchmark for evaluating these systems on the basis of ad hoc metrics. Finally, we present an application of the benchmark to test the goodness of the metrics.

In detail, the paper is organized as follows. The next Section introduces the terminology used in approximate query processing. The third Section presents two approximate query answering systems, namely ADAP (acronym of Analytical DAta Profile) and Komodo, used for the experimental application of the benchmark. The fourth Section describes our extension to TPC-H, which includes the ad hoc workload and the proposed methodology. The fifth Section explains the metrics, along with an experimental application to the two systems. The sixth Section discusses related work on approximate query processing systems. The final Section concludes the paper with some remarks.
