A Benchmark for Performance Evaluation of a Multi-Model Database vs. Polyglot Persistence

Feng Ye, Xinjun Sheng, Nadia Nedjah, Jun Sun, Peng Zhang
Copyright: © 2023 | Pages: 20
DOI: 10.4018/JDM.321756

Abstract

As the need for handling data from various sources becomes crucial for making optimal decisions, managing multi-model data has become a key area of research. Currently, it is challenging to strike a balance between two approaches: polyglot persistence and multi-model databases. Moreover, existing studies suggest that current benchmarks are not completely suitable for comparing these two approaches, whether in terms of test datasets, workloads, or metrics. To address this issue, the authors introduce MDBench, an end-to-end benchmark tool. Based on the multi-model dataset and proposed workloads, the experiments reveal that ArangoDB is superior at inserting graph data, while the polyglot persistence instance is better at deleting document data. For multi-thread workloads and queries associating multiple tables, the polyglot persistence instance outperforms ArangoDB in both execution time and resource usage. However, ArangoDB has the edge over MongoDB and Neo4j regarding reliability and availability.

Introduction

There is an increasing demand for analyzing and processing multi-model data, including structured, semi-structured, and unstructured data. In particular, structured data commonly refer to relational, key-value, and graph data; semi-structured data mainly include JSON and XML documents; and unstructured data are typically text files. For multi-model data management, developers inevitably face difficult trade-offs between multi-model databases and polyglot persistence. However, existing studies suggest that current benchmarks are not completely suitable for evaluating and comparing these two approaches, whether in terms of test datasets, workloads, or metrics. First, obtaining large-scale real multi-model data is difficult and costly, and few data generators can produce multi-model test datasets. Second, the workloads of existing benchmarks are not comprehensive and cannot cover the diverse application scenarios of multi-model data. Finally, most existing benchmarks focus on the execution time of the workloads while ignoring metrics of infrastructure resource usage and nonfunctional attributes. This matters because, in a distributed environment, database system failure is considered a normal event rather than an accident (Ghemawat et al., 2003), so collecting and measuring database resource usage and nonfunctional attributes is very important. However, as far as we know, no benchmark for multi-model databases and polyglot persistence takes resource usage and nonfunctional attributes as metrics. Aiming at these problems, we propose an end-to-end benchmark named MDBench for evaluating and comparing a multi-model database and polyglot persistence. The main contributions of this paper are summarized as follows:

  1. A scalable multi-model data generator is designed for generating multi-model test datasets. The key algorithm of the data generator is memory-efficient, ensuring that no matter how large the generated dataset is, it does not run out of memory (see the first sketch after this list).

  2. Four groups of representative workload experiments are designed and implemented to simulate different multi-model data application scenarios. In particular, a multi-thread workload experiment and reliability and availability experiments are conducted, which are new to the field of evaluating and comparing multi-model databases and polyglot persistence (see the second sketch after this list).

  3. Based on data store selection, we use MDBench to perform a comprehensive performance evaluation of the single multi-model database ArangoDB and of a polyglot persistence instance consisting of MongoDB and Neo4j, and we systematically analyze the experimental results.
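
To make the memory-efficiency claim of the first contribution concrete, the following is a minimal sketch (in Python) of one way a generator can stream document and graph records to disk in fixed-size batches, so that memory use stays bounded regardless of dataset scale. It is illustrative only, not MDBench's actual implementation; the record schema, file names, and batch size are assumptions.

    # Minimal sketch: stream multi-model test data to disk in batches
    # so memory use is bounded no matter how large the dataset is.
    import json
    import random

    BATCH_SIZE = 10_000  # records held in memory at once (illustrative)

    def generate_documents(n):
        """Yield JSON-style documents one at a time."""
        for i in range(n):
            yield {"_id": i, "name": f"user{i}", "age": random.randint(18, 80)}

    def generate_edges(n_nodes, n_edges):
        """Yield graph edges (src, dst) one at a time."""
        for _ in range(n_edges):
            yield (random.randrange(n_nodes), random.randrange(n_nodes))

    def write_jsonl(path, records):
        """Flush records in batches instead of materializing the whole set."""
        with open(path, "w") as f:
            batch = []
            for rec in records:
                batch.append(json.dumps(rec))
                if len(batch) >= BATCH_SIZE:
                    f.write("\n".join(batch) + "\n")
                    batch.clear()
            if batch:
                f.write("\n".join(batch) + "\n")

    if __name__ == "__main__":
        write_jsonl("documents.jsonl", generate_documents(1_000_000))
        with open("edges.csv", "w") as f:
            for src, dst in generate_edges(1_000_000, 5_000_000):
                f.write(f"{src},{dst}\n")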
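
Similarly, for the multi-thread workload experiment of the second contribution, the second sketch shows a minimal driver that submits queries from a thread pool while recording execution time and sampling resource usage, the two kinds of metrics discussed above. Here run_query is a placeholder for a real driver call (e.g., via pymongo or the neo4j driver), and the use of psutil for resource sampling is an assumption, not part of MDBench.

    # Minimal sketch: multi-thread workload driver recording per-query
    # latency plus coarse CPU and memory samples during the run.
    import time
    import statistics
    from concurrent.futures import ThreadPoolExecutor

    import psutil  # assumed dependency: pip install psutil

    def run_query(query_id):
        """Placeholder for one workload query against the target store."""
        time.sleep(0.01)  # stand-in for real query latency

    def timed(query_id):
        start = time.perf_counter()
        run_query(query_id)
        return time.perf_counter() - start

    proc = psutil.Process()
    cpu_samples, rss_samples = [], []

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=16) as pool:
        futures = [pool.submit(timed, i) for i in range(1000)]
        # Sample resource usage until all queries complete.
        while not all(f.done() for f in futures):
            cpu_samples.append(psutil.cpu_percent(interval=0.1))
            rss_samples.append(proc.memory_info().rss)
        latencies = [f.result() for f in futures]
    elapsed = time.perf_counter() - start

    print(f"total {elapsed:.2f}s, "
          f"median latency {statistics.median(latencies) * 1000:.1f} ms, "
          f"mean CPU {statistics.mean(cpu_samples or [0]):.1f}%, "
          f"peak RSS {max(rss_samples or [0]) / 2**20:.0f} MiB")

The same driver can be pointed at ArangoDB or at the MongoDB/Neo4j pair in turn, holding the workload constant so that execution time and resource usage are directly comparable across stores.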

The subsequent contents are organized as follows. First, the research status of database benchmarking is summarized. In the next section, we introduce the data stores involved in the evaluation and the reasons for selecting them. Then, MDBench is described in detail from three aspects: multi-model data generation, workloads, and the metrics mechanism. Next, the experimental results are presented and analyzed. Finally, the paper is summarized and future work is proposed.


Overview of DBMS Benchmarks

A database benchmark enables repeatable, comparable, and quantitative tests of performance indicators. Existing database benchmarks in the industry can be divided into two categories: RDBMS benchmarks and NoSQL benchmarks. Multi-model database benchmarks belong to the NoSQL category, but because of the particularity of their data models, we introduce them separately.
