Extending Graph Databases With Relational Concepts

Kornelije Rabuzin, Mirko Čubrilo, Martina Šestak
Copyright: © 2023 |Pages: 16
DOI: 10.4018/978-1-7998-9220-5.ch027

Abstract

Today we have to deal with large amounts of data, and for large amounts of interconnected data, graph databases represent an excellent choice. However, some concepts that are used in relational databases are still not supported in graph databases, and this could hinder their wider use. This article demonstrates how relational database concepts, including integrity constraint support, inheritance, trigger specification, and new visual query languages, have been successfully transferred into graph databases.

Introduction

Data science is a field that uses different scientific methods, techniques, and algorithms to extract useful information and knowledge from data. It is used for several reasons: to explain what happened in the past, to understand what is happening in the present, and even to predict what is most likely to happen in the future. Data science is an interdisciplinary field that includes data analysis, machine learning, business intelligence, quantitative methods, data visualization, data preparation, statistics, etc. Because these subjects are so heterogeneous, it is not easy to be an expert in all of them, and mastering them takes a large amount of time. For this reason, some people predict that educational institutions will not be able to produce enough data scientists in the near future, a problem that has already been recognized by some countries and by the EU.

The core component of data science is data, and without it, data scientists would not be able to perform their work. Another important topic is the data source, which can be categorized as unstructured, semi-structured, or structured. As a brief explanation, an email is an example of unstructured data (a collection of words without any structure behind it), while structured data can be found in a database, where data is grouped into tables and each row of a table shares the same structure as the rows that come before or after it. XML documents are a good example of a semi-structured source, since XML nodes in the same document can have different structures. A relational database, which is a type of structured data source, can be a good data source, since people are familiar with relational databases.

During the past decade, computer systems had to store and manage large amounts of data coming from different sources. For example, the Internet is used on a daily basis by millions of users, and the volume of generated text, messages, searches, posts, images, videos, etc. is enormous. Furthermore, many Internet of Things (IoT) devices are connected to the Internet, and they also tend to generate large amounts of different types of data. While in the past one would typically talk about gigabytes and terabytes of data, today it is usual to deal with petabytes and exabytes, and this scale is expected to grow to zettabytes and yottabytes in the future. It is clear that in this new field (Big Data) there is a constant need to find scalable solutions for efficient data storage and management. When the frequency and volume of data generation started to increase 15 years ago, the goal was to rethink and find new ways of handling and storing large amounts of data. The solutions that were used in the early 2000s were simply no longer suitable for efficiently handling the huge amounts of generated data. Relational databases, which were efficient solutions for everyday business transactional applications, are not appropriate for extremely large amounts of data with possibly different structures and schemas. These databases were not capable of storing and managing the data in a satisfactory manner, leading to difficulties in real-time data processing as well as in ad-hoc data querying.

In order to store and manage such data, NoSQL database systems were introduced. These are used today by many companies for different purposes, and they can be categorized into four main NoSQL database types: document-oriented, column-oriented, key-value, and graph databases. In this chapter, the focus is on graph databases, as they have turned out to be an excellent choice for storing and querying large amounts of interconnected data, which is often generated by modern information systems. In general, a graph database is a database solution based on a graph data structure: data is stored in the form of nodes connected by relationships, and both nodes and relationships can have properties (attributes) that describe real-world objects. The other NoSQL database types also have their own advantages. For instance, key-value databases can quickly retrieve the value for a specified key; document-oriented systems can store all the important data for an entity in a single document; and column-store systems are similar to relational databases but offer increased schema flexibility. Thus, it can be concluded that each type is suited to different applications and addresses different challenges.
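The graph data structure described above can be illustrated with a minimal in-memory sketch. This is not how any particular graph database is implemented; the names `Node`, `Relationship`, `PropertyGraph`, and the sample data are hypothetical and chosen only to show nodes, relationships, and properties on both.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    node_id: int
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the source node
    end: int            # id of the target node
    rel_type: str
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, node_id: int, label: str, **properties: Any) -> Node:
        node = Node(node_id, label, properties)
        self.nodes[node_id] = node
        return node

    def add_relationship(self, start: int, end: int, rel_type: str,
                         **properties: Any) -> Relationship:
        # A relationship may only connect nodes that already exist.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoint nodes must exist")
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbors(self, node_id: int,
                  rel_type: Optional[str] = None) -> list[Node]:
        # Follow outgoing relationships, optionally filtered by type.
        return [
            self.nodes[r.end]
            for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


graph = PropertyGraph()
graph.add_node(1, "Person", name="Ana")
graph.add_node(2, "Person", name="Ivan")
graph.add_node(3, "Company", name="Acme")
graph.add_relationship(1, 2, "KNOWS", since=2015)
graph.add_relationship(1, 3, "WORKS_AT", role="analyst")

known = [n.properties["name"] for n in graph.neighbors(1, "KNOWS")]
```

Here `known` holds the names of the nodes reachable from node 1 over `KNOWS` relationships. Queries over interconnected data reduce to following relationships like this, which is the access pattern graph databases are built around.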

Key Terms in this Chapter

Query By Example (QBE): An alternative, visual approach to querying databases through a graphical user interface, without having to manually write the query in the syntax of a given query language.

Structured Query Language (SQL): The most important and widespread query language for relational databases.

NoSQL: A generation of database systems that do not use SQL as the primary database query language and that tackle the challenges (e.g., scalability, flexibility) attributed to traditional relational databases.

Integrity Constraints: Rules that are used to maintain the quality of data in a database.

Graph Database Management System: A category of NoSQL Database Management Systems that store data in graph-like structures consisting of nodes and relationships.

Neo4j: The most widely used Graph Database Management System.

Triggers: Database objects that are activated when something happens in the database.
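To make the definitions of integrity constraints and triggers more concrete, the following is a rough sketch of both concepts applied to a toy node store. It is not the chapter's implementation and not the API of Neo4j or any other system; `TinyGraphStore`, `ConstraintViolation`, and every method name are assumptions introduced for illustration.

```python
from typing import Any, Callable


class ConstraintViolation(Exception):
    """Raised when a write would break an integrity constraint."""


class TinyGraphStore:
    """Toy node store (hypothetical) with uniqueness constraints and triggers."""

    def __init__(self) -> None:
        self.nodes: list[dict[str, Any]] = []
        self.unique_keys: set[tuple[str, str]] = set()
        self.triggers: list[Callable[[dict[str, Any]], None]] = []

    def add_unique_constraint(self, label: str, prop: str) -> None:
        # Integrity constraint: `prop` must be unique among nodes with `label`.
        self.unique_keys.add((label, prop))

    def on_node_created(self, callback: Callable[[dict[str, Any]], None]) -> None:
        # Trigger: `callback` is activated after every successful node creation.
        self.triggers.append(callback)

    def create_node(self, label: str, **properties: Any) -> dict[str, Any]:
        # Check every applicable constraint before the write is accepted.
        for c_label, prop in self.unique_keys:
            if c_label != label or prop not in properties:
                continue
            for existing in self.nodes:
                if (existing["label"] == label
                        and existing["properties"].get(prop) == properties[prop]):
                    raise ConstraintViolation(
                        f"{label}.{prop} must be unique: {properties[prop]!r}"
                    )
        node = {"label": label, "properties": properties}
        self.nodes.append(node)
        # Fire the triggers only after the node has been stored.
        for trigger in self.triggers:
            trigger(node)
        return node


store = TinyGraphStore()
store.add_unique_constraint("Person", "email")

audit_log: list[str] = []
store.on_node_created(lambda node: audit_log.append(f"created {node['label']}"))

store.create_node("Person", email="ana@example.com")
try:
    store.create_node("Person", email="ana@example.com")  # duplicate email
    duplicate_rejected = False
except ConstraintViolation:
    duplicate_rejected = True
```

The second `create_node` call is rejected by the uniqueness rule, so the store keeps a single node and the trigger fires only once. In this sketch the constraint is checked before the write and the trigger runs after it; real systems differ in when and how each is evaluated.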
