1. Introduction
The traditional Web provides an enormous amount of information, most of which is unstructured. This has motivated the Semantic Web community to introduce more structured and meaningful data on the Web, along with tools and techniques for publishing and retrieving this data (Garzon, 2020). These efforts resulted in the concept of Linked Data. Because Linked Data differs from the typical unstructured Web of documents, it is also known as the ‘Web of Data’ (WoD). The WoD is a novel collection of structured data distributed across the Web, defined using standard Semantic Web technologies, i.e., the Resource Description Framework (RDF) and the SPARQL Protocol and RDF Query Language (SPARQL), and published under the Linked Data principles (Umbrich, 2015).
To utilize the full potential of the WoD, an initiative was taken to provide unrestricted access to this data, known as the “Linked Open Data” (LOD) cloud. The link traversal strategy allows users to query the LOD cloud live. Unlike on the traditional Web, centralized approaches that search content through optimized indices are not a viable solution for data in the LOD cloud: because this data is dynamic, copied data dumps can become out of date and yield stale query results. Hence, more emphasis is placed on querying this data live to cater for its dynamicity. The link traversal approach relies on the Linked Data principles and can fetch fresh results on the fly for SPARQL queries, though often with slower response times. It employs a recursive URI lookup mechanism to traverse Linked Data sources using the follow-your-nose approach (i.e., dereferencing HTTP links) (Hartig, 2011; Hartig et al., 2009). Nevertheless, this technique has a shortcoming when answering certain query patterns: those where the subject is unbound (e.g., ?S rdf:type :Class) (Scheglmann & Scherp, 2014) and where the object position holds a foreign URI and/or a literal. For these query forms, link traversal returns empty results if no prior information about the data sources is available, a setting also known as ‘zero-knowledge link traversal’. Linked Data sources are primarily subject-centric, so if the object is a foreign URI, the original data sources containing the query results are not reachable. Furthermore, a literal at the object position prevents the URI lookup process from being initiated at all, as literals are non-dereferenceable. There is therefore a need to obtain some preliminary information about the data sources so that such queries can be answered, since zero-knowledge link traversal is not effective for them.
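The limitation described above can be illustrated with a minimal sketch. It is not the implementation of any cited system: the `WEB` dictionary, the example URIs, and the helper names (`dereference`, `link_traversal`) are assumptions made for illustration, and an in-memory lookup stands in for real HTTP dereferencing. Each source publishes only triples about its own subject, which is what makes the unbound-subject patterns fail.

```python
from collections import deque

# Hypothetical toy "Web of Data": dereferencing a URI returns only the triples
# published by that URI's own (subject-centric) source. No real network access.
WEB = {
    "http://a.example/alice": [
        ("http://a.example/alice", "foaf:knows", "http://b.example/bob"),
        ("http://a.example/alice", "rdf:type", "http://c.example/Person"),
    ],
    "http://b.example/bob": [
        ("http://b.example/bob", "rdf:type", "http://c.example/Person"),
    ],
    "http://c.example/Person": [
        ("http://c.example/Person", "rdfs:label", "Person"),  # literal object
    ],
}


def is_uri(term):
    """Only http:// terms are dereferenceable; literals and variables are not."""
    return isinstance(term, str) and term.startswith("http://")


def dereference(uri):
    """Stand-in for an HTTP lookup against the toy data above."""
    return WEB.get(uri, [])


def matches(triple, pattern):
    """A pattern position of None is an unbound variable."""
    return all(want is None or want == got for got, want in zip(triple, pattern))


def link_traversal(pattern, max_lookups=50):
    """Evaluate one triple pattern with zero prior knowledge of data sources."""
    queue = deque(term for term in pattern if is_uri(term))  # seeds from the query
    seen, results = set(), []
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in dereference(uri):
            if matches(triple, pattern) and triple not in results:
                results.append(triple)
            # Follow-your-nose: queue every URI discovered in the fetched data.
            queue.extend(t for t in triple if is_uri(t) and t not in seen)
    return results
```

In this toy setting, a bound subject such as `("http://a.example/alice", "foaf:knows", None)` is answered, whereas `(None, "rdf:type", "http://c.example/Person")` returns nothing: the only seed is the foreign object URI, whose source does not list the resources typed with it. A pattern with a literal object, such as `(None, "rdfs:label", "Person")`, has no seed URI at all.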
In most cases, the link traversal approach produces empty results for SPARQL queries with an unbound subject because the data source identified by the object URI does not contain knowledge of its incoming properties. These incoming properties are also known as ‘in-links’ or ‘backlinks’. They can be identified in an RDF data source from triples containing foreign URIs. The process of finding and maintaining such in-links/backlinks is called ‘backlinking’ (Stefanidakis & Papadakis, 2011). Knowledge of backlinks is generally not provided in Linked Data sources. Backlinks could drastically improve the performance of Linked Data crawlers and query engines. However, identifying and maintaining in-links or backlinks is a time-consuming and cumbersome task, as experiments have shown (Bai et al., 2018).
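As a rough illustration of what backlinking involves, the sketch below builds a backlink index from triples whose object is a foreign URI. The `SOURCES` data, the host-based definition of "foreign", and the function names are assumptions made for this example and are not taken from the cited works. It also presumes the sources have already been crawled, which is the costly step noted above.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical, already-crawled sources: source URI -> triples it publishes.
SOURCES = {
    "http://a.example/alice": [
        ("http://a.example/alice", "foaf:knows", "http://b.example/bob"),
        ("http://a.example/alice", "rdf:type", "http://c.example/Person"),
    ],
    "http://b.example/bob": [
        ("http://b.example/bob", "rdf:type", "http://c.example/Person"),
    ],
    "http://c.example/Person": [
        ("http://c.example/Person", "rdfs:label", "Person"),
    ],
}


def build_backlink_index(sources):
    """Map each foreign object URI to the (subject, property) pairs pointing at it."""
    index = defaultdict(list)
    for source_uri, triples in sources.items():
        source_host = urlparse(source_uri).netloc
        for subject, prop, obj in triples:
            # Assumption: an object URI is "foreign" if its host differs
            # from the host of the source that publishes the triple.
            if obj.startswith("http://") and urlparse(obj).netloc != source_host:
                index[obj].append((subject, prop))
    return dict(index)


def subjects_linking_to(index, prop, obj):
    """Answer an unbound-subject pattern (?s prop obj) from the backlink index."""
    return sorted(s for s, p in index.get(obj, []) if p == prop)


backlinks = build_backlink_index(SOURCES)
people = subjects_linking_to(backlinks, "rdf:type", "http://c.example/Person")
```

Here `people` contains the URIs of both resources typed as `http://c.example/Person`, which gives a traversal engine the seed URIs it lacks in the zero-knowledge setting. The index goes stale whenever a source changes, so it has to be maintained as well as built.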