Hybrid Query Execution on Linked Data With Complete Results

Samita Bai, Shakeel A. Khoja
Copyright: © 2021 | Pages: 25
DOI: 10.4018/IJSWIS.2021010102

Abstract

Link traversal strategies for querying Linked Data over the WWW can retrieve up-to-date results using a recursive URI lookup process in real time. The downside of this approach appears with query patterns whose subject is unbound (e.g., ?S rdf:type :Class). Such queries fail to start the traversal process because RDF pages are subject-centric in nature. Thus, zero-knowledge link traversal leads to empty results for these queries. In this paper, the authors analyze a large corpus of real-world SPARQL query logs and identify the Most Frequent Predicates (MFPs) occurring in these queries. Knowledge of these MFPs helps in finding and indexing a limited number of triples from the original data set. Additionally, the authors propose a Hybrid Query Execution (HQE) approach that executes queries over this index for initial data source selection, followed by a link traversal process to fetch complete results. The evaluation of HQE on the latest real data benchmarks reveals that it retrieves at least five times more results than the existing approaches.

1. Introduction

The traditional Web provides an enormous amount of information, most of which is unstructured. This motivated the Semantic Web community to introduce more structured and meaningful data on the Web, along with tools and techniques for publishing and retrieving this data (Garzon, 2020). These efforts resulted in the concept of Linked Data. Linked Data differs from the typical unstructured Web of documents; hence it is also known as the ‘Web of Data’ (WoD). The WoD is a collection of structured data distributed across the Web, defined using standard Semantic Web technologies, i.e., the Resource Description Framework (RDF) and the SPARQL Protocol and RDF Query Language (SPARQL), and published under the Linked Data principles (Umbrich, 2015).

To utilize the full potential of the WoD, an initiative known as the “Linked Open Data” (LOD) cloud provides unrestricted access to this data. The link traversal strategy allows users to query the LOD cloud live. Unlike on the traditional Web, centralized approaches that search content through optimized indices are not a viable solution for data in the LOD cloud: because this data is dynamic, copied data dumps can become out of date and yield stale query results. Hence, more emphasis is placed on querying this data live to cope with its dynamicity. The link traversal approach relies on the Linked Data principles and can fetch fresh results on the fly for SPARQL queries, though often with slower response times. It employs a recursive URI lookup mechanism to traverse Linked Data sources using the follow-your-nose approach (i.e., dereferencing HTTP links) (Hartig, 2011; Hartig et al., 2009). Nevertheless, this technique falls short for query patterns in which the subject is unbound (e.g., ?S rdf:type :Class) (Scheglmann & Scherp, 2014) and the object position holds a foreign URI and/or a literal. For such query forms, link traversal returns empty results if no prior information about the data sources is available, a setting also known as ‘zero-knowledge link traversal’. Linked Data sources are primarily subject-centric, so if the object is a foreign URI, the original data sources containing the query results are not reachable. Furthermore, a literal at the object position does not allow the URI lookup process to start at all, because literals are not dereferenceable. Hence, some preliminary information about the data sources is needed to answer such queries, since zero-knowledge link traversal is not effective for them.
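The failure mode described above can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `WEB` dictionary, the `ex:` and `foaf:` identifiers, and the function names are hypothetical stand-ins for real HTTP dereferencing, and only URIs in the subject and object positions are used as seeds, which is a simplification.

```python
from collections import deque

# Hypothetical toy "Web": dereferencing a URI returns a subject-centric
# document, i.e. only the triples whose subject is that URI.
WEB = {
    "ex:alice": [("ex:alice", "rdf:type", "foaf:Person"),
                 ("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob": [("ex:bob", "rdf:type", "foaf:Person")],
    "foaf:Person": [("foaf:Person", "rdf:type", "rdfs:Class")],
}

def is_var(term):
    return term.startswith("?")

def is_uri(term):
    # Literals are written with a leading double quote in this sketch.
    return not is_var(term) and not term.startswith('"')

def matches(pattern, triple):
    return all(is_var(p) or p == t for p, t in zip(pattern, triple))

def link_traversal(pattern, seeds=None, max_lookups=100):
    """Evaluate one triple pattern by recursive URI lookup.

    With seeds=None this is zero-knowledge traversal: the only starting
    points are URIs in the pattern's subject and object positions.
    """
    if seeds is None:
        seeds = [t for t in (pattern[0], pattern[2]) if is_uri(t)]
    queue, seen, results = deque(seeds), set(), []
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in WEB.get(uri, []):  # simulated dereferencing
            if matches(pattern, triple) and triple not in results:
                results.append(triple)
            # Follow-your-nose: enqueue URIs discovered in the document.
            for term in (triple[0], triple[2]):
                if is_uri(term) and term not in seen:
                    queue.append(term)
    return results

pattern = ("?s", "rdf:type", "foaf:Person")
print(link_traversal(pattern))                      # zero-knowledge: []
print(link_traversal(pattern, seeds=["ex:alice"]))  # two matching triples
```

In this toy setting, the document behind `foaf:Person` says nothing about its instances, so the zero-knowledge run finds no path to `ex:alice` or `ex:bob`. Supplying a single relevant seed is enough for traversal to reach both matches.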

In most cases, the link traversal approach produces empty results for SPARQL queries with an unbound subject because the data source identified by the object URI holds no knowledge of its incoming properties. These incoming properties are also known as ‘in-links’ or ‘backlinks’. They can be identified in an RDF data source through triples containing foreign URIs. The process of finding and maintaining such in-links/backlinks is called ‘backlinking’ (Stefanidakis & Papadakis, 2011). Knowledge of backlinks is generally not provided in Linked Data sources, although backlinks could drastically improve the performance of Linked Data crawlers and query engines. However, identifying and maintaining in-links or backlinks is a time-consuming and cumbersome task, as shown experimentally (Bai et al., 2018).
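As a rough illustration of what backlinking involves, the sketch below scans a set of already-retrieved documents and records, for each (predicate, foreign object URI) pair, the documents that link to it. The `DOCS` data, the identifiers, and `build_backlink_index` are hypothetical and chosen only for illustration; the sketch assumes every document has already been fetched, which is exactly the costly step in practice.

```python
from collections import defaultdict

# Hypothetical subject-centric documents, keyed by the URI they describe.
DOCS = {
    "ex:alice": [("ex:alice", "rdf:type", "foaf:Person"),
                 ("ex:alice", "foaf:knows", "ex:bob"),
                 ("ex:alice", "foaf:name", '"Alice"')],
    "ex:bob": [("ex:bob", "rdf:type", "foaf:Person")],
}

def build_backlink_index(docs):
    """Map (predicate, object URI) to the set of documents linking to it."""
    index = defaultdict(set)
    for source, triples in docs.items():
        for _subject, predicate, obj in triples:
            is_literal = obj.startswith('"')
            # A foreign URI is any object URI other than the document's own.
            if not is_literal and obj != source:
                index[(predicate, obj)].add(source)
    return index

index = build_backlink_index(DOCS)
# Documents that can answer the pattern ?S rdf:type foaf:Person:
print(sorted(index[("rdf:type", "foaf:Person")]))
```

A lookup in such an index yields candidate documents for a subject-unbound pattern, which could then serve as starting points for traversal. Keeping the index current as sources change is the maintenance burden noted above.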
