How NoSQL Databases Work

Copyright: © 2018 | Pages: 52
DOI: 10.4018/978-1-5225-3385-6.ch004

Abstract

The chapter explains how NoSQL databases work. Since different NoSQL databases are classified into four categories (key-value, column-family, document, and graph stores), three main features of NoSQL databases are chosen, and their practical implementation is explained using examples of one or two typical NoSQL databases from each NoSQL database category. The three chosen features are: distributed storage architecture that comprises the distributed, cluster-oriented, and horizontally scalable features; consistency model that refers to the CAP and BASE features; query execution that refers to the schemaless feature. These features are chosen because, through them, it is possible to describe most of the new and innovative approaches that NoSQL databases bring to the database world.

Introduction

The first three chapters explained the NoSQL phenomenon, the reasons for NoSQL database emergence, the main characteristics of these databases, and the types of databases. The purpose of this chapter is to explain how NoSQL databases work. Since it is beyond the scope of this book to describe NoSQL databases in detail, as there are specialized books and manuals written for that purpose, a different approach is taken here. Three main features of NoSQL databases are chosen, and their practical implementation is explained using examples of one or two typical NoSQL databases from each NoSQL database category (see Chapter 2). The three chosen features are:

  • Distributed storage architecture that comprises the distributed, cluster-oriented, and horizontally scalable features described in Chapter 2.

  • Consistency model that refers to the CAP and BASE features described in Chapter 2.

  • Query execution that refers to the schemaless feature described in Chapter 2.

These features are chosen because, through them, it is possible to describe most of the new and innovative approaches that NoSQL databases bring to the database world.

Distributed Storage Architectures

Data distribution and the use of clusters are natural features of NoSQL databases (see Chapter 2.4). Indeed, it is precisely the ability to run on a large cluster that has drawn attention to this type of database. As data volumes kept growing, it became increasingly difficult and costly for organizations to scale up (to purchase ever larger servers to run the database on), and the option to scale out (to run the database on a cluster of servers) became increasingly attractive. As already explained (see Chapter), relational databases were developed primarily to run on a single server, so attempts to implement them in distributed and cluster environments led to many problems and undermined the basic postulates of these databases.

Broadly, there are two styles for distributing data (Sadalage & Fowler, 2013):

  • Sharding distributes data across multiple servers in a way that every server acts as the only source for the assigned data subset (see Chapter 1).

  • Replication copies data across multiple servers so that each piece of data can be found in multiple places. There are two basic forms of replication:

    • Master-slave replication where one node is the authoritative copy (master) that executes writes, while slave nodes are synchronized with the master node and can run reads.

    • Peer-to-peer (masterless) replication allows writes on any node, and nodes are coordinated in order to synchronize their data copies.
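The sharding style listed above can be illustrated with a minimal Python sketch. It assumes hash-based placement, which is only one possible strategy; the class name ShardedStore and the server names are hypothetical and do not come from any particular NoSQL product. Each in-memory dictionary stands in for one server, and every key lives on exactly one of them.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that splits keys across a fixed set of shards.

    Hypothetical illustration: each dict stands in for one server.
    """

    def __init__(self, shard_names):
        self.names = list(shard_names)
        self.shards = {name: {} for name in self.names}

    def shard_for(self, key):
        # Use a stable hash (not Python's per-process randomized hash())
        # so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.names)
        return self.names[index]

    def put(self, key, value):
        # The chosen shard is the only source for this key.
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(["server-a", "server-b", "server-c"])
for i in range(100):
    store.put(f"user:{i}", i)

# Show how many keys each hypothetical server ended up holding.
print({name: len(data) for name, data in store.shards.items()})
```

Simple modulo placement like this forces most keys to move when the number of servers changes, which is one reason real systems often use more elaborate partitioning schemes.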

Master-slave replication reduces the possibility of update conflicts, whereas peer-to-peer replication avoids routing all writes through a single point of failure. Implementations can use either approach on its own or a combination of the two.
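The trade-off between the two replication forms can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not the mechanism of any specific database: the classes Node, MasterSlaveCluster, and PeerToPeerCluster are hypothetical, master-slave propagation is shown as synchronous for brevity, and peer synchronization is left out so that the conflict stays visible.

```python
class Node:
    """One server holding a full copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        # All writes go through the master, so they are applied in one order.
        self.master.data[key] = value
        # Assumption: changes are pushed to slaves immediately.
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key, replica=0):
        # Reads can be served by any slave.
        return self.slaves[replica].data.get(key)


class PeerToPeerCluster:
    def __init__(self, peers):
        self.peers = peers

    def write(self, peer_index, key, value):
        # Any peer accepts a write; no single node is required.
        self.peers[peer_index].data[key] = value

    def conflicts(self, key):
        # Distinct values held for the same key indicate a conflict
        # that the peers would still have to reconcile.
        values = {p.data[key] for p in self.peers if key in p.data}
        return values if len(values) > 1 else set()


ms = MasterSlaveCluster(Node("master"), [Node("slave-1"), Node("slave-2")])
ms.write("cart:42", "book")
print(ms.read("cart:42", replica=1))

p2p = PeerToPeerCluster([Node("peer-1"), Node("peer-2")])
p2p.write(0, "cart:42", "book")
p2p.write(1, "cart:42", "lamp")
print(p2p.conflicts("cart:42"))
```

In the master-slave model the cluster cannot accept writes if the master is lost, while in the peer-to-peer model writes continue on any surviving peer at the cost of having to detect and resolve diverging copies.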

Most NoSQL databases are designed to operate on a high-availability file system, usually the Hadoop Distributed File System (HDFS). However, some NoSQL databases (e.g., Cassandra) have developed their own systems that are compatible with HDFS. The use of a specific file system such as HDFS has both advantages and disadvantages. The advantages of using a distributed file system in NoSQL databases are (McCreary & Kelly, 2014):
