The Graph Traversal Pattern

The Graph Traversal Pattern

Marko A. Rodriguez (AT&T Interactive, USA) and Peter Neubauer (Neo Technology, Sweden)
Copyright: © 2012 |Pages: 18
DOI: 10.4018/978-1-61350-053-8.ch002
OnDemand PDF Download:
$37.50

Abstract

A graph is a structure composed of a set of vertices (i.e. nodes, dots) connected to one another by a set of edges (i.e. links, lines). The concept of a graph has been around since the late 19th century, however, only in recent decades has there been a strong resurgence in both theoretical and applied graph research in mathematics, physics, and computer science. In applied computing, since the late 1960s, the interlinked table structure of the relational database has been the predominant information storage and retrieval model. With the growth of graph/network-based data and the need to efficiently process such data, new data management systems have been developed. In contrast to the index-intensive, set-theoretic operations of relational databases, graph databases make use of index-free, local traversals. This chapter discusses the graph traversal pattern and its use in computing. (Angles & Guiterrez, 2008)
Chapter Preview
Top

The Realization Of Graphs

Relational databases have been around since the late 1960s (Codd, 1970) and are today’s most predominate data management tool. Relational databases maintain a collection of tables. Each table can be defined by a set of rows and a set of columns. Semantically, rows denote objects and columns denote properties/attributes. Thus, the datum at a particular row/column-entry is the value of the column property for that row object. Usually, a problem domain is modeled over multiple tables in order to avoid data duplication. This process is known as data normalization. In order to unify data in disparate tables, a “join” is used. A join combines two tables when columns of one table refer to columns of another table.2 This is the classic relational database design which affords them their flexibility (Mishra & Eich, 1992).

In stark contrast, graph databases do not store data in disparate tables. Instead there is a single data structure – the graph (Angles & Guiterrez, 2008). Moreover, there is no concept of a ``join” operation as every vertex and edge has a direct reference to its adjacent vertex or edge. The data structure is already “joined” by the edges that are defined. There are benefits and drawbacks to this model. First, the primary drawback is that it’s difficult to shard a graph (a difficulty also encountered with relational databases that maintain referential integrity).

Sharding is the process of partitioning data across multiple machines in order to scale a system horizontally.3 In a graph, with unconstrained, direct references between vertices and edges, there usually does not exist a clean data partition. Thus, it becomes difficult to scale graph databases beyond the confines of a single machine and at the same time, maintain the speed of a traversal across sharded borders. However, at the expense of this drawback there is a significant advantage: there is a constant time cost for retrieving an adjacent vertex or edge. That is, regardless of the size of the graph as a whole, the cost of a local read operation at a vertex or edge remains constant. This benefit is so important that it creates the primary means by which users interact with graph databases – traversals. Graphs offer a unique vantage point on data, where the solution to a problem is seen as abstractly defined traversals through its vertices and edges.4

Complete Chapter List

Search this Book:
Reset