Parallel Real-Time OLAP on Multi-Core Processors

Frank Dehne (School of Computer Science, Carleton University, Ottawa, Canada) and Hamidreza Zaboli (School of Computer Science, Carleton University, Ottawa, Canada)
Copyright: © 2015 |Pages: 22
DOI: 10.4018/ijdwm.2015010102

Abstract

One of the most powerful and prominent technologies for knowledge discovery in decision support systems is online analytical processing (OLAP). Most of the traditional OLAP research, and most of the commercial systems, follow the static data cube approach proposed by Gray et al. and materialize all or a subset of the cuboids of the data cube in order to ensure adequate query performance. Practitioners have called for some time for a real-time OLAP approach where the OLAP system gets updated instantaneously as new data arrives and always provides an up-to-date data warehouse for the decision support process. However, a major problem for real-time OLAP is poor performance on large-scale data warehouses. The aim of our research is to address this problem through the use of efficient parallel computing methods. In this paper, we present a parallel real-time OLAP system for multi-core processors. To our knowledge, this is the first real-time OLAP system that has been parallelized and optimized for contemporary multi-core architectures. Our system allows multiple insert and multiple query transactions to be executed in parallel and in real-time. We evaluated our method for a multitude of scenarios (different ratios of insert and query transactions, query transactions with different amounts of data aggregation, different database sizes, etc.), using the TPC-DS "Decision Support" benchmark data set. As multi-core test platforms, we used an Intel Sandy Bridge processor with 4 cores (8 hardware-supported threads) and an Intel Xeon Westmere processor with 20 cores (40 hardware-supported threads). The tests demonstrate that, with an increasing number of processor cores, our parallel system achieves close to linear speedup in transaction response time and transaction throughput. On the 20-core architecture we achieved, for a 100 GB database, a better than 0.25 second query response time for real-time OLAP queries that aggregate 25% of the database.
Since hardware performance improvements are currently, and in the foreseeable future, achieved not by faster processors but by increasing the number of processor cores, our new parallel real-time OLAP method has the potential to enable OLAP systems that operate in real-time on large databases.

1. Introduction

This paper reports on the results of an IBM-funded research project to investigate the use of multi-core processors for high-performance, real-time, online analytical processing (OLAP). Such OLAP systems are at the heart of many business analytics applications. The ever-growing data warehouses built by corporate and institutional users have led to significant performance bottlenecks, which motivated this research project.

1.1. Background

Decision Support Systems (DSS) are designed to empower the user with the ability to make effective decisions regarding both the current and future state of an organization. To do so, the DSS must not only encapsulate static information, but it must also allow for the extraction of patterns and trends that would not be immediately obvious. Users must be able to visualize the relationships between such things as customers, vendors, products, inventory, geography, and sales. Moreover, they must understand these relationships in a chronological context since it is the time element that ultimately gives meaning to the observations that are formed. One of the most powerful and prominent technologies for knowledge discovery in DSS environments is online analytical processing (OLAP).

OLAP is the foundation for a wide range of essential business applications, including sales and marketing analysis, planning, budgeting, and performance measurement (Han, 2000; The OLAP Report). The processing logic associated with this form of analysis is encapsulated in what is known as the OLAP server. By exploiting multidimensional views of the underlying data warehouse, the OLAP server allows users to “drill down” or “roll up” on hierarchies, “slice and dice” particular attributes, or perform various statistical operations such as ranking and forecasting. Figure 1 illustrates the basic model where the OLAP server represents the interface between the data warehouse proper and the reporting and display applications available to end users.

Figure 1.

Three-tiered OLAP model
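
The "roll up", "drill down", and "slice" operations mentioned above can be illustrated with a minimal Python sketch. The fact rows, the country/city hierarchy, and the helper names `aggregate` and `slice_rows` are hypothetical and chosen only for illustration; they are not taken from the system described in this paper.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, product, sales).
# country -> city forms a two-level hierarchy on the location dimension.
FACTS = [
    ("Canada", "Ottawa", "sedan", 10),
    ("Canada", "Toronto", "sedan", 7),
    ("Canada", "Ottawa", "truck", 4),
    ("USA", "Boston", "sedan", 5),
]


def aggregate(rows, key):
    """Sum the sales measure over all rows that share the same key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


def slice_rows(rows, product):
    """Slice: fix the product dimension to a single value."""
    return [row for row in rows if row[2] == product]


# Drill-down view: sales at the finer (country, city) level of the hierarchy.
by_city = aggregate(FACTS, lambda r: (r[0], r[1]))

# Roll-up view: the same data aggregated to the coarser country level.
by_country = aggregate(FACTS, lambda r: r[0])

# Slice on product = "sedan", then roll up to the country level.
sedans_by_country = aggregate(slice_rows(FACTS, "sedan"), lambda r: r[0])

print(by_city)
print(by_country)
print(sedans_by_country)
```

In this sketch every view is computed by scanning the fact rows at query time; an OLAP server would typically answer such queries from precomputed or indexed structures instead.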

To support this functionality, OLAP relies heavily upon a classical data model known as the data cube (Gray, 1997). Conceptually, the data cube allows users to view organizational data from different perspectives and at a variety of summarization levels. It consists of the base cuboid, the finest granularity view containing the full complement of d dimensions (or attributes), surrounded by a collection of 2^d - 1 sub-cubes/cuboids that represent the aggregation of the base cuboid along one or more dimensions. Figure 2 illustrates a small three-dimensional data cube that might be associated with the automotive industry. In addition to the base cuboid, one can see a number of planes and points that represent aggregation at coarser granularity. Note that each cell in the cube structure corresponds to an aggregate value along one or more measure attributes (e.g. total sales).

Figure 2.

A three dimensional data cube for automobile sales data
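
The cuboid count described above can be checked with a small Python sketch that fully materializes a data cube for d = 3. The dimension names, the fact rows, and the function `build_cube` are assumptions made for illustration only; the sketch shows naive static materialization and does not represent the method developed in this paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions (d = 3) and fact rows: (make, colour, year, sales).
DIMENSIONS = ("make", "colour", "year")
FACTS = [
    ("Ford", "red", 2014, 3),
    ("Ford", "blue", 2014, 2),
    ("Toyota", "red", 2015, 4),
]


def build_cube(rows, d):
    """Materialize every cuboid: one group-by per subset of the d dimensions.

    Keys of the result are tuples of dimension indices; the key (0, ..., d-1)
    is the base cuboid and the empty tuple () is the grand total.
    """
    cube = {}
    for k in range(d, -1, -1):
        for dims in combinations(range(d), k):
            cells = defaultdict(int)
            for row in rows:
                # Project the row onto the chosen dimensions; sum the measure.
                cells[tuple(row[i] for i in dims)] += row[d]
            cube[dims] = dict(cells)
    return cube


cube = build_cube(FACTS, len(DIMENSIONS))

base = cube[(0, 1, 2)]   # base cuboid: finest granularity
by_make = cube[(0,)]     # aggregated over colour and year
grand_total = cube[()]   # aggregated over all dimensions

print(len(cube) - 1)     # number of cuboids other than the base cuboid
print(by_make)
print(grand_total)
```

The number of group-bys doubles with each additional dimension, which is why building and storing the full cube becomes expensive as d grows.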

Most of the traditional OLAP research, and most of the commercial systems, follow the static data cube approach proposed by Gray (1997) and materialize all or a subset of the cuboids of the data cube in order to ensure adequate query performance. Building the data cube can be a massive computational task, and significant research has been published on sequential and parallel data cube construction methods (e.g. Chen, 2008; Dehne, 2002; Gray, 1997; GuoLiang, 2010; Ng, 2001; You, 2008). However, the traditional static data cube approach has several disadvantages. The OLAP system can only be updated periodically and in batches, e.g. once every week. Hence, the latest information cannot be included in the decision support process. The static data cube also requires massive amounts of memory space and leads to a duplicate data repository that is separate from the online transaction processing (OLTP) system of the organization. Several practitioners have therefore called for some time for an integrated OLAP/OLTP approach with a real-time OLAP system that gets updated instantaneously as new data arrives and always provides an up-to-date data warehouse for the decision support process (e.g. Bruckner, 2002). Some recent publications have tried to address this problem by providing “quasi real-time” incremental maintenance schemes and loading procedures for static data cubes (e.g. Bruckner, 2002; Jin, 2008; Santos, 2008; Santos, 2009). However, these approaches are not fully real-time. A major problem is poor performance on large-scale data warehouses. The aim of our research is to address these performance problems through the use of efficient parallel multi-core computing methods.
