Relational Data Access for Business Data Analytics

Relational Data Access for Business Data Analytics

Veit Köppen (Otto-von-Guericke-University Magdeburg, Germany) and Andreas Lübcke (regiocom LLC, Magdeburg, Germany)
Copyright: © 2014 |Pages: 8
DOI: 10.4018/978-1-4666-5202-6.ch182
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

The growing amount of data enables more complex business analytics. Data for business analytics is stored in databases or data warehouses. Analysts want to execute their queries under requirements in a suitable time horizon. An architectural decision has a significant influence on response times. Therefore, it is necessary not only to identify and weight analysis tasks, but also to decide on the storage architecture. The architectural design influences the query execution time up to factor 100. We present both architectures and their influence on the database workload. Classic row stores perform better on transactional analysis and column stores outperform the other in simple online analytical processing.
Chapter Preview
Top

Row And Column Stores Architecture

Relational database management systems (DBMSs) are developed to manage data of daily business and reduce paper trails of companies (e.g., financial institutions) (Astrahan et al., 1976). This approach dominates the way of data management that we know as online transaction processing (OLTP). Nowadays, fast and accurate forecasts for revenues and expenses are not enough. A new application domain evolves that focuses on analyses of data to support business decisions. Codd, Codd and Salley (1993) define this type of data analysis as online analytical processing (OLAP). In line with others (Abadi, Madden, & Hachem, 2008; Zukowski, Nes, & Boncz, 2008), we state that two disjunctive application domains for relational data management exist with different scopes, impacts, and limitations.

Row store database systems store tuples sequentially. Thus, it is necessary to load or access all data attributes. In the context of OLTP, this is a suitable and often efficient type to store and access data. However, in data warehousing OLAP is in focus, where only some data attributes, often only one, have to be read. Thus, data access with row stores, which read all attributes of data sets, can be computational costly.

Key Terms in this Chapter

Row Stores: This storage type is dominant in relational databases. It is optimized for transactional processing and stores all attributes sequentially for each data object.

Online Transactional Processing (OLTP): Classical operative databases gather and manage data. OLTP focuses on short read and write transactions and a multi-user access. For OLTP workload, we mostly consider all features of entities. In such a scenario, only restricted resources and data are used per operation. In addition, ACID and other business operative requirements have to be considered.

Workload: A database system is often queried from different users and applications with diverging intentions. A workload describes which operations due to queries the system has to perform to answer all queries.

Relational Database System: Relational database systems are built on the relational data model developed by ( Astrahan, et al., 1976 ). A relational database system enables an integrated storage of a huge amount of data and concurrent access (of users and applications). It consists of three layers: external, conceptual, and internal.

Column Store: A column store is the storage of database entries in such a way that values of attributes are stored together. This enables a higher compression rate and data access for aggregation. It is developed for online analytical processing especially for attribute-wise data access, such as aggregations, groupings, or selections on single attributes of very large relations.

ACID: In database systems transactions are of interest. A transaction can be described in terms of requirements with properties atomicity, consistency, isolation, and durability. Atomicity means that a transaction either is completely executed or not at all in the database system. Integrity constraints enable consistency after execution of a transaction. Isolation restricts parallel execution of queries and durability guarantees that data is stored persistently even in case of erroneous system behavior.

Online Analytical Processing (OLAP): Huge amounts of data are processed and analyzed. In contrast to OLTP, we analyze a few features of entities. Durations of read actions are much longer than in OLTP context. Besides data quality issues, integration, aggregation, and consolidation are important aspects.

Complete Chapter List

Search this Book:
Reset