Towards Massive RDF Storage in NoSQL Databases: A Survey

Zongmin Ma (Nanjing University of Aeronautics and Astronautics, China) and Li Yan (Nanjing University of Aeronautics and Astronautics, China)
DOI: 10.4018/978-1-5225-8446-9.ch013

Abstract

The Resource Description Framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de facto standard recommended by the W3C (World Wide Web Consortium) for representing and exchanging information on the web, a huge amount of RDF data is being produced and made available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing solutions for storing large-scale RDF data efficiently. To manage massive RDF data, NoSQL (not only SQL) databases have been used as scalable RDF data stores. This chapter focuses on using various NoSQL databases to store massive RDF data. It provides an up-to-date overview of the state of the art in RDF data storage in NoSQL databases and offers suggestions for future research.

Introduction

The Resource Description Framework (RDF) is a framework for representing information resources on the Web, proposed by the W3C (World Wide Web Consortium) as a recommendation (Manola and Miller, 2004). RDF can represent structured and unstructured data (Duan, Kementsietsidis, Srinivas and Udrea, 2011) and, more importantly, metadata about Web resources represented in RDF can be shared and exchanged among applications without loss of semantics. Here, metadata means data that specifies semantic information about other data. RDF has been widely accepted and has rapidly gained popularity, and many organizations, companies and enterprises have started using RDF to represent and process their data. Application examples include the United States, the United Kingdom, the New York Times, the BBC, and Best Buy. RDF is finding increasing use in a wide range of Web data-management scenarios.

With the widespread use of RDF in diverse application domains, a huge amount of RDF data is being produced and made available. As a result, efficient and scalable management of large-scale RDF data is of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work is currently being done on RDF data management. Several RDF data-management systems have emerged, such as Sesame (Broekstra, Kampman and van Harmelen, 2002), Jena-TDB (Wilkinson, Sayers, Kuno and Reynolds, 2003), Virtuoso (Erling and Mikhailov, 2007 & 2009), 4Store (Harris, Lamb and Shadbolt, 2009), BigOWLIM (Bishop et al., 2011) and Oracle Spatial and Graph with Oracle Database 12c. BigOWLIM was later renamed OWLIM-SE and subsequently GraphDB. Some research prototypes have also been developed, e.g., RDF-3X (Neumann and Weikum, 2008 & 2010), SW-Store (Abadi, Marcus, Madden and Hollenbach, 2007 & 2009) and RDFox.

Key Terms in this Chapter

SPARQL: SPARQL (SPARQL Protocol and RDF Query Language) is an RDF query language and a W3C recommendation. SPARQL provides capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions.
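
The difference between required and optional graph patterns can be illustrated with a minimal sketch in plain Python. This is not a SPARQL engine and does not cover the full SPARQL semantics; the data, the `ex:` names, and the helper functions `match` and `evaluate` are all hypothetical and introduced only for illustration. A solution must satisfy every required pattern, while an optional pattern adds bindings when it matches and leaves the solution unchanged when it does not.

```python
# Roughly corresponds to the SPARQL query (illustrative only):
#   SELECT ?p ?n ?e WHERE { ?p ex:name ?n . OPTIONAL { ?p ex:email ?e } }

# Hypothetical example data: (subject, predicate, object) triples.
data = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match the triple.
            return None
    return new


def extensions(pattern, binding, triples):
    """All ways to extend `binding` by matching `pattern` against `triples`."""
    results = []
    for triple in triples:
        extended = match(pattern, triple, binding)
        if extended is not None:
            results.append(extended)
    return results


def evaluate(triples, required, optional=()):
    """Evaluate required patterns as a conjunction, then apply optional ones."""
    solutions = [{}]
    for pattern in required:
        # Required: solutions that cannot be extended are dropped.
        solutions = [e for b in solutions for e in extensions(pattern, b, triples)]
    for pattern in optional:
        # Optional: solutions that cannot be extended are kept unchanged.
        kept = []
        for b in solutions:
            kept.extend(extensions(pattern, b, triples) or [b])
        solutions = kept
    return solutions


results = evaluate(
    data,
    required=[("?p", "ex:name", "?n")],
    optional=[("?p", "ex:email", "?e")],
)
for row in results:
    print(row)
```

In this sketch both people appear in the result because both have a name, but only the solution for `ex:alice` carries a binding for `?e`, since no email triple exists for `ex:bob`.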

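JSON: JSON (JavaScript Object Notation) is a lightweight, text-based data-interchange format that represents data as objects (maps), arrays (lists), strings, numbers, Booleans and null. Binary, typed variants such as BSON add further types, including dates and numbers of different precision.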

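CAP: The CAP theorem concerns three properties of distributed data stores: consistency, availability, and partition tolerance. It states that a distributed system can guarantee at most two of the three at the same time.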

RDF: The Resource Description Framework (RDF) is a W3C (World Wide Web Consortium) recommendation that provides a generic mechanism for representing information about resources on the web.
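
As a minimal sketch of how RDF represents information about a resource, an RDF statement can be modeled as a (subject, predicate, object) triple. The `Triple` type, the `http://example.org/` identifiers, and the `objects_of` helper below are hypothetical and chosen only for illustration; they are not part of any RDF library.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and data, for illustration only.
EX = "http://example.org/"
triples = [
    Triple(EX + "chapter13", EX + "title",
           "Towards Massive RDF Storage in NoSQL Databases: A Survey"),
    Triple(EX + "chapter13", EX + "author", EX + "ZongminMa"),
    Triple(EX + "chapter13", EX + "author", EX + "LiYan"),
]


def objects_of(data: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in data
            if t.subject == subject and t.predicate == predicate]


authors = objects_of(triples, EX + "chapter13", EX + "author")
print(authors)
```

Because every statement has the same three-part shape, a resource can carry any number of properties without a fixed schema, which is one reason the same predicate can appear twice for one subject here.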

Big Data: Big data is a broad term for data sets so large or complex that traditional data-processing applications are inadequate. There is no common definition of big data; it is generally characterized by properties such as volume, velocity, and variety.

NoSQL Databases: NoSQL means “not only SQL” or “no SQL at all.” As a new type of non-relational database, NoSQL databases are developed for efficient and scalable management of big data.

ACID: ACID stands for four properties: (A)tomicity, (C)onsistency, (I)solation and (D)urability. ACID describes the transaction guarantees provided by relational database management systems (RDBMSs).
