Nested Optional Join for Efficient Evaluation of SPARQL Nested Optional Graph Patterns

Nested Optional Join for Efficient Evaluation of SPARQL Nested Optional Graph Patterns

Artem Chebotko (University of Texas - Pan American, USA) and Shiyong Lu (Wayne State University, USA)
DOI: 10.4018/978-1-60566-992-2.ch013
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Relational technology has shown to be very useful for scalable Semantic Web data management. Numerous researchers have proposed to use RDBMSs to store and query voluminous RDF data using SQL and RDF query languages. This chapter studies how RDF queries with the so called well-designed graph patterns and nested optional patterns can be efficiently evaluated in an RDBMS. The authors propose to extend relational algebra with a novel relational operator, nested optional join (NOJ), that is more efficient than left outer join in processing nested optional patterns of well-designed graph patterns. They design three efficient algorithms to implement the new operator in relational databases: (1) nested-loops NOJ algorithm, NL-NOJ, (2) sort-merge NOJ algorithm, SM-NOJ, and (3) simple hash NOJ algorithm, SH-NOJ. Using a real life RDF dataset, the authors demonstrate the efficiency of their algorithms by comparing them with the corresponding left outer join implementations and explore the effect of join selectivity on the performance of these algorithms.
Chapter Preview
Top

Introduction

The Semantic Web (Berners-Lee, Hendler, & Lassila, 2001; Shadbolt, Berners-Lee, & Hall, 2006) has recently gained tremendous momentum due to its great potential for providing a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Semantic data is represented in Resource Description Framework (RDF) (W3C, 2004a, 2004b), the standard language for annotating resources on the Web, and queried using the SPARQL (W3C, 2008) query language for RDF that has been recently proposed by the World Wide Web Consortium. RDF data is a collection of statements, called triples, of the form <s,p,o>, where s is a subject, p is a predicate and o is an object, and each triple states the relation between the subject and the object. Such collection of triples can be represented as a directed graph, in which nodes represent subjects and objects, and edges represent predicates connecting from subject nodes to object nodes. SPARQL allows the specification of triple and graph patterns to be matched over RDF graphs.

Increasing amount of RDF data on the Web drives the need for its efficient and effective management. In this light, numerous researchers (Abadi, Marcus, Madden, & Hollenbach, 2007; Agrawal, Somani, & Xu, 2001; Alexaki, Christophides, Karvounarakis, & Plexousakis, 2001; Beckett & Grant, 2003; Broekstra, Kampman, & Harmelen, 2002; Erling, 2001; Harris & Gibbins, 2003; Ma, Su, Pan, Zhang, & Liu, 2004; Narayanan, Kurc, & Saltz, 2006; Pan & Heflin, 2003; Sintek & Kiesel, 2006; Stoffel, Taylor, & Hendler, 1997; Theoharis, Christophides, & Karvounarakis, 2005; Volz, Oberle, Motik, & Staab, 2003; Wilkinson, 2006; Wilkinson, Sayers, Kuno, & Reynolds, 2003) have proposed to use RDBMSs to store and query RDF data using the SQL and SPARQL query languages. One of the most challenging problems in such an approach is the translation of SPARQL queries into queries formulated in relational algebra and SQL.

An important class of graph patterns that are mostly common in RDF queries in practice is the so called well-designed graph patterns (Perez, Arenas, & Gutierrez, 2006a). A well-designed graph pattern gp can contain arbitrary many optional graph patterns that can be nested in each other as in the following equation:gp1OPT(gp2OPT(gp3OPT(⋯(gpn-1OPT(gpn))⋯)) (1) where each gp1, gp2, gp3, ⋯, gpn-1, gpn can be a basic graph pattern (set of triple patterns) or another graph pattern with optional sub-patterns such as (1), OPT indicates an optional graph pattern that follows it, and parenthesis define the order of evaluation. In (1), gp2, gp3, ⋯, gp1, gpn are optional graph patterns, and each gpi, i≥3, is a nested optional graph pattern with respect to gpi-1. By the definition of a well-designed graph pattern, the following property for gp holds: for any sub-pattern (gpx OPT(gpy)) in gp, if a variable ?v occurs both outside this sub-pattern and inside gpy, then ?v also occurs in gpx. The formal semantics of the well-designed graph patterns with nested optional patterns is defined by Perez et al. (2006a) and W3C (2008). Informally, it can be summarized as follows:

Complete Chapter List

Search this Book:
Reset