Database Techniques for New Hardware

Xiongpai Qin (Renmin University of China, China) and Yueguo Chen (Renmin University of China, China)
Copyright: © 2018 | Pages: 15
DOI: 10.4018/978-1-5225-2255-3.ch169

Abstract

In the last decade, computer hardware has progressed by leaps and bounds. Advances include the widespread adoption of multi-core CPUs, the use of GPUs for data-intensive tasks, ever-larger main memory capacities, and the maturity and production use of non-volatile memory. Database systems benefit immediately from faster CPUs/GPUs and larger memory, and run faster. However, there are pitfalls. For example, database systems running on multi-core processors may suffer from cache conflicts as the number of concurrently executing DB processes increases. To fully exploit new hardware, database software must be revised to some degree. This chapter introduces some of the database research community's efforts in this area.

Background

Multi-Core CPU

Improving CPU performance by increasing the clock frequency has become more and more difficult, so researchers and engineers turned to multi-core technology. In a typical multi-core CPU, 2, 4, 8, or more cores are integrated on one chip. Putting several cores on one die allows faster communication between the cores, which benefits many computing tasks.

Each core has its own private cache (Level 1 cache, or L1 cache), and the cores share larger but slower caches (the L2 cache in many designs; in others, L2 is also private and a last-level L3 cache is shared). The cores access the shared main memory for parallel data processing. Database systems are inherently concurrent (multi-threaded or multi-process), so they can benefit from multi-core CPUs without any modification. However, fully utilizing the cores to boost database performance requires considerably more work.
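One common way to use the cores more fully is to give each worker its own partition of the data and its own private result, merging the partial results only at the end; this avoids threads contending for a single shared hash table and the cache-line traffic that such contention causes. The following is a minimal Python sketch of that idea for a GROUP BY/COUNT query. The helper names `count_partition` and `parallel_group_count` are hypothetical, and because of CPython's global interpreter lock the sketch shows the structure of the technique rather than a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(keys):
    """Aggregate one partition into a private (per-worker) Counter."""
    local = Counter()
    for k in keys:
        local[k] += 1
    return local


def parallel_group_count(keys, num_workers=4):
    """Hypothetical helper: GROUP BY key, COUNT(*) with one partition per worker.

    Each worker writes only to its own Counter, so no locks are needed;
    the partial results are merged once all workers have finished.
    """
    # Split the input into roughly equal contiguous chunks, one per worker.
    chunk = max(1, (len(keys) + num_workers - 1) // num_workers)
    partitions = [keys[i:i + chunk] for i in range(0, len(keys), chunk)]

    # Build phase: every worker aggregates its own partition independently.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_partition, partitions))

    # Merge phase: Counter.update adds counts, combining the partial aggregates.
    total = Counter()
    for p in partials:
        total.update(p)
    return total


if __name__ == "__main__":
    rows = ["a", "b", "a", "c", "b", "a"]
    print(dict(parallel_group_count(rows, num_workers=2)))  # {'a': 3, 'b': 2, 'c': 1}
```

In a real engine the workers would be pinned to cores and the per-worker tables sized to fit in each core's private cache; the sketch makes no attempt to model either.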

GPU for General Tasks

GPUs have traditionally been used to accelerate the specific task of graphics rendering. GPU vendors integrate many computing units on a single die and optimize memory bandwidth to process large volumes of graphics data. The high degree of parallelism in GPUs is also exploited to speed up data-intensive tasks, and the GPU thus becomes a GPGPU (General-Purpose GPU).

Because GPUs are designed primarily for graphics processing rather than general tasks, their architecture differs considerably from that of CPUs. A GPU has its own thread hierarchy and memory hierarchy, which must be taken into account when it is used for data processing tasks.
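To make the thread and memory hierarchy more concrete, the Python sketch below sequentially simulates the classic GPU sum-reduction pattern: a grid of blocks, each thread identified by `block_idx * block_dim + thread_idx`, and a per-block shared-memory array reduced as a tree. It is an illustration under those assumptions, not GPU code. The function name `simulate_block_reduce` is hypothetical, and a real implementation would be written in a GPU language such as CUDA or OpenCL, with a barrier between reduction steps.

```python
def simulate_block_reduce(data, block_dim=8):
    """Sequentially simulate a GPU-style two-level sum reduction.

    Level 1: each block copies its tile into a per-block "shared memory"
    array and reduces it with a tree (stride-halving) pattern.
    Level 2: the per-block partial sums are combined into the final result.
    """
    if block_dim <= 0 or block_dim & (block_dim - 1):
        raise ValueError("block_dim must be a positive power of two")
    n = len(data)
    grid_dim = (n + block_dim - 1) // block_dim  # number of blocks launched
    block_sums = []
    for block_idx in range(grid_dim):
        # Each "thread" loads one element into the block's shared memory;
        # threads that fall past the end of the input contribute 0.
        shared = [0] * block_dim
        for thread_idx in range(block_dim):
            global_idx = block_idx * block_dim + thread_idx
            if global_idx < n:
                shared[thread_idx] = data[global_idx]
        # Tree reduction: in each step, thread t (t < stride) adds
        # shared[t + stride] into shared[t]. On a real GPU a barrier
        # (e.g. __syncthreads() in CUDA) separates consecutive steps.
        stride = block_dim // 2
        while stride > 0:
            for thread_idx in range(stride):
                shared[thread_idx] += shared[thread_idx + stride]
            stride //= 2
        block_sums.append(shared[0])  # thread 0 holds the block's partial sum
    # Level 2: combine the per-block partial sums.
    return sum(block_sums)


if __name__ == "__main__":
    print(simulate_block_reduce(list(range(1, 11)), block_dim=4))  # 55
```

The two levels mirror the two memory tiers: the fast, small per-block shared memory is used for the tree, while only one value per block travels back through the larger, slower global memory.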

Bigger Memory

As memory prices fall, more memory can be installed in a single server. It is now common for a single server to have hundreds of gigabytes, or even terabytes, of memory. For moderate-size applications, the whole dataset can be loaded into memory for fast access.

Non-Uniform Memory Access

Non-Uniform Memory Access (NUMA) machines are becoming increasingly common. A NUMA architecture consists of a small number of processors, each with its own memory and I/O channels. Each group of CPUs is called a ‘node’. Memory that is local to a node is called local or near memory, while memory outside the node is called foreign or far memory (Golding, 2010). Because accessing foreign memory is much slower than accessing local memory, NUMA architectures require changes to memory management.
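One practical consequence is that a NUMA-aware engine tries to schedule work on the node whose memory holds the data. The Python sketch below uses a deliberately simple model: each worker is a `(node, core)` pair, and each table partition lives entirely in one node's memory. It counts how many partition scans hit local versus foreign memory under a placement-oblivious round-robin assignment and under a node-local assignment. The helper names are hypothetical, and the sketch does not model actual latencies or any operating-system memory-placement API.

```python
from collections import defaultdict


def assign_naive(partition_home, workers):
    """Round-robin: partition i goes to worker i mod len(workers), ignoring placement."""
    return {p: workers[i % len(workers)] for i, p in enumerate(sorted(partition_home))}


def assign_numa_aware(partition_home, workers):
    """Give each partition to a worker on the node that holds its memory.

    Hypothetical policy: within a node, partitions are spread round-robin
    over that node's workers; if a node has no workers, fall back to all
    workers (a foreign-memory access is then unavoidable).
    """
    by_node = defaultdict(list)
    for w in workers:
        by_node[w[0]].append(w)  # w = (node, core)
    next_slot = defaultdict(int)
    plan = {}
    for p in sorted(partition_home):
        node = partition_home[p]
        candidates = by_node.get(node) or workers
        plan[p] = candidates[next_slot[node] % len(candidates)]
        next_slot[node] += 1
    return plan


def count_accesses(plan, partition_home):
    """Return (local, foreign) partition scans for a given assignment."""
    local = sum(1 for p, w in plan.items() if w[0] == partition_home[p])
    return local, len(plan) - local


if __name__ == "__main__":
    workers = [(0, 0), (0, 1), (1, 0), (1, 1)]  # two nodes, two cores each
    home = {p: p % 2 for p in range(8)}         # partitions interleaved across nodes
    print(count_accesses(assign_naive(home, workers), home))       # (4, 4)
    print(count_accesses(assign_numa_aware(home, workers), home))  # (8, 0)
```

In this toy configuration the placement-oblivious plan sends half of the scans to foreign memory, while the node-local plan keeps every scan on its own node.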

Key Terms in this Chapter

NVRAM: Non-volatile memory is a type of computer memory that retains stored information after the power is turned off.

MVCC: Multi-version concurrency control coordinates the execution of concurrent transactions by maintaining multiple versions of the same data, so that these transactions do not interfere with one another.

DSM: The Decomposition Storage Model vertically partitions an n-attribute relation into columns, each of which is fetched only when a query needs it.

GPU/GPGPU: A General-Purpose Graphics Processing Unit is used not only for graphics processing but also for other tasks, such as data processing.

Cache Sensitive (Cache Conscious): Database data and index layouts, and processing techniques, that adapt to the parameters of the memory hierarchy.

Cache Oblivious: Database data and index layouts, and processing techniques, that are independent of the parameters of the memory hierarchy.

Multi-Core CPU: A processor with several cores manufactured on a single die to achieve higher performance through parallelism.

PAX: In Partition Attributes Across, a table is first horizontally partitioned into pages; within each page, all values of each attribute are grouped together, which greatly improves cache performance.

NSM: The N-ary Storage Model stores records contiguously, starting from the beginning of each disk page, and uses an offset (slot) table at the end of the page to locate the beginning of each record (row). NSM has poor cache performance because it loads the cache with attributes that the query does not need.
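The storage models defined above (NSM, DSM, and PAX) differ only in how the values of a relation are arranged. The Python sketch below uses nested lists as a stand-in for disk pages, with hypothetical helper names, to lay out the same small table under each model and count the values touched by a scan of a single attribute. The count is a simplified model that assumes fetching any value of an NSM record also brings its neighboring attributes into the cache; it ignores slot tables, page headers, and real cache-line sizes.

```python
def nsm_pages(rows, rows_per_page):
    """NSM: each page stores whole records one after another (row-major)."""
    return [[list(r) for r in rows[i:i + rows_per_page]]
            for i in range(0, len(rows), rows_per_page)]


def dsm_columns(rows, num_attrs):
    """DSM: the relation is split vertically into one column per attribute."""
    return [[r[a] for r in rows] for a in range(num_attrs)]


def pax_pages(rows, rows_per_page, num_attrs):
    """PAX: rows are grouped into pages as in NSM, but inside each page
    the values of each attribute are stored together, column by column."""
    return [dsm_columns(rows[i:i + rows_per_page], num_attrs)
            for i in range(0, len(rows), rows_per_page)]


def values_touched_scanning(layout, data, attr):
    """Count values brought in by a scan that needs only attribute `attr`.

    NSM: every record is fetched whole, so all attributes are touched
    (`attr` does not reduce the cost). DSM and PAX touch only the wanted
    column, or the wanted group inside each page.
    """
    if layout == "nsm":
        return sum(len(rec) for page in data for rec in page)
    if layout == "dsm":
        return len(data[attr])
    if layout == "pax":
        return sum(len(page[attr]) for page in data)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0), (4, "d", 40.0)]
    print(pax_pages(rows, rows_per_page=2, num_attrs=3)[0])  # [[1, 2], ['a', 'b'], [10.0, 20.0]]
    print(values_touched_scanning("nsm", nsm_pages(rows, 2), attr=2))  # 12
    print(values_touched_scanning("pax", pax_pages(rows, 2, 3), attr=2))  # 4
```

PAX keeps NSM's property that a whole record lives on one page, which makes reconstructing a row cheap, while giving a single-attribute scan the same compact access pattern as DSM.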
