HTAP With Reactive Streaming ETL

Carl Camilleri, Joseph G. Vella, Vitezslav Nezval
Copyright: © 2021 |Pages: 19
DOI: 10.4018/JCIT.20211001.oa10

Abstract

In database management systems (DBMSs), query workloads can be classified as online transactional processing (OLTP) or online analytical processing (OLAP). These often run within separate DBMSs. In hybrid transactional and analytical processing (HTAP), both workloads may execute within the same DBMS. This article shows that it is possible to run separate OLTP and OLAP DBMSs, and still support timely business decisions from analytical queries running off fresh transactional data. Several setups to manage OLTP and OLAP workloads are analysed. Then, benchmarks on two industry standard DBMSs empirically show that, under an OLTP workload, a row-store DBMS sustains a 1000 times higher throughput than a columnar DBMS, whilst OLAP queries are more than 4 times faster on a columnar DBMS. Finally, a reactive streaming ETL pipeline is implemented which connects these two DBMSs. Separate benchmarks show that OLTP events can be streamed to an OLAP database within a few seconds.

1. Introduction

In database management systems (DBMSs), query workloads are segmented into two broad modes (Elnaffar et al., 2002; Li et al., 2019). Online transactional processing (OLTP) workloads typically consist of write queries that modify small amounts of data, and queries that read a few records whilst projecting the majority of the attributes available (Bach & Werner, 2016). In OLTP, queries are expected to have short response times, often in the order of microseconds (Harizopoulos et al., 2018), in order to avoid user frustration and business impact (Poggi et al., 2014). At the other end of the spectrum, online analytical processing (OLAP) workloads typically consist of read-only queries which traverse a large number of records, performing aggregations and projecting a narrow set of attributes (Bach & Werner, 2016). A system dedicated to OLAP queries is also known as a Business Intelligence (BI) or Decision Support System (DSS), since such queries often aim to elicit information from a data warehouse to support decision making.
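The contrast between the two workload modes can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for convenience; the `orders` table, its columns and its rows are hypothetical and do not come from the article or its benchmarks.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "north", 20.0),
        (2, "bob", "south", 35.0),
        (3, "carol", "north", 15.0),
        (4, "dave", "south", 30.0),
    ],
)

# OLTP-style write: modifies a small amount of data (one record).
conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 1))

# OLTP-style read: fetches one record, projecting all of its attributes.
oltp_row = conn.execute(
    "SELECT * FROM orders WHERE id = ?", (1,)
).fetchone()

# OLAP-style read: traverses every record, aggregates, and projects
# only a narrow set of attributes (region and the summed amount).
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The point lookup touches one row but returns every column, whereas the aggregation scans the whole table but reads only two columns. This difference in access pattern is one reason the two workloads tend to favour different storage layouts.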

Traditionally, longer response times for OLAP queries have been tolerated, and such queries tend to execute within a dedicated data warehouse which is periodically loaded with data coming from operational (OLTP) systems, typically via extract-transform-load (ETL) processes. Modern business requirements, however, challenge these assumptions. The phenomenon of perishable insights (E. A. Lee, 2018), as illustrated in Figure 1, indicates that, in some application domains such as fraud detection, data might lose value for decision making as time passes. In such use cases, increasing the data freshness in the OLAP database is beneficial.

Figure 1. Perishable Insights (E. A. Lee, 2018)

2. Problem Definition

Running transactional and analytical workloads efficiently on the same dataset is an open problem which attracts research and commercial interest (Yang et al., 2020). This problem is referred to as Hybrid Transactional and Analytical Processing (HTAP), and several approaches have been proposed to tackle the ostensibly conflicting demands of preserving the performance of transactional workloads whilst at the same time running analytical queries efficiently on fresh data to facilitate time-critical business decisions.

Several HTAP systems presented in the literature are bespoke DBMSs. These range from systems adopting the Single System for OLTP and OLAP approach (Yang et al., 2020), which typically rely on support from cutting-edge hardware (Appuswamy et al., 2017) to handle both OLTP and OLAP workloads on the same hardware, to those adopting the Separate OLTP and OLAP Systems approach, which deploy loosely-coupled OLTP and OLAP DBMSs.

Several problems are identified. Firstly, although the Single System for OLTP and OLAP approach largely improves data freshness, OLTP and OLAP workloads running on the same hardware conflict, with some systems reporting a threefold reduction in OLTP throughput when OLAP queries run concurrently (J. Lee et al., 2018).

Secondly, reliance on cutting-edge hardware, such as fast non-volatile memory (NVM), prevents DBMS users from exploiting commodity hardware for their workloads. Such a solution may therefore be infeasible if the hardware is not available, or may require a costlier hardware setup (Neumann & Freitag, 2020).

Lastly, an approach based on bespoke solutions forces the use of specific DBMSs, which might not be compatible with the rest of the software ecosystem, or might require specialised expertise within the database administrator (DBA) team, increasing the complexity of the information system (IS).
