RDF Keyword Search by Query Computation

Zongmin Ma (Nanjing University of Aeronautics and Astronautics, Nanjing, China), Xiaoqing Lin (Eastern Liaoning University, Dandong, China), Li Yan (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China) and Zhen Zhao (Bohai University, Jinzhou, China)
Copyright: © 2018 | Pages: 27
DOI: 10.4018/JDM.2018100101

Abstract

Keyword search based on keywords-to-SPARQL translation is attracting increasing attention because of the growing number of efficient SPARQL query engines. Current approaches to keyword search based on keywords-to-SPARQL translation can return incomplete or wrong answers because they lack underlying schema information. To overcome these difficulties, in this article we propose a new keyword search paradigm that translates keyword queries into SPARQL queries for exploring RDF data. An inter-entity relationship summary with complete schema information is distilled from the RDF data graph and used to compose SPARQL queries. To avoid potentially wasteful expansion of the summary graph, we develop a new search prioritization scheme that combines the degree of a vertex with its distance from the original keyword element. Starting from an ordered priority list built in advance, we apply the forward path index to find, more quickly, the top-k subgraphs relevant to the conjunction of the input keywords. The experimental results show that our approach is efficient and scalable.
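
The summary-graph and prioritization ideas in the abstract can be illustrated with a minimal Python sketch. It assumes a common summary construction (each entity-to-entity triple is lifted to an edge between the subject's class and the object's class, and literal-valued triples are dropped) and a hypothetical priority score, hop distance + alpha × degree, where lower scores are expanded first so that nearby, low-degree vertices come before hubs. The function names, the weight `alpha`, the direction of the degree penalty, and the toy triples are illustrative assumptions, not the authors' definitions; the forward path index and the top-k search are not sketched.

```python
from collections import defaultdict, deque

# Hypothetical toy triples; "type" stands for rdf:type.
TOY_TRIPLES = [
    ("p1", "type", "Article_in_Proceedings"),
    ("p1", "author", "Coles:Drue"),      # literal-valued: not an inter-entity edge
    ("p1", "isIncludedIn", "c1"),
    ("c1", "type", "Proceedings"),
    ("p1", "cites", "a1"),
    ("a1", "type", "Article"),
    ("a1", "publishedIn", "j1"),
    ("j1", "type", "Journal"),
]


def summarize(triples, type_pred="type"):
    """Lift entity-to-entity triples to class-level edges (subject class, predicate, object class)."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == type_pred:
            types[s].add(o)
    edges = set()
    for s, p, o in triples:
        if p == type_pred or o not in types:
            continue  # skip type triples and triples whose object is an untyped literal
        for cs in types.get(s, ()):
            for co in types[o]:
                edges.add((cs, p, co))
    return edges


def priority_list(edges, keyword_class, alpha=0.5):
    """Order reachable summary vertices by (hop distance + alpha * degree), lowest first."""
    neighbours, degree = defaultdict(set), defaultdict(int)
    for d, _, r in edges:
        neighbours[d].add(r)
        neighbours[r].add(d)
        degree[d] += 1
        degree[r] += 1
    dist, queue = {keyword_class: 0}, deque([keyword_class])
    while queue:  # breadth-first search over the undirected summary graph
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # ties are broken by name so the order is deterministic
    return sorted(dist, key=lambda v: (dist[v] + alpha * degree[v], v))
```

With the toy data, `priority_list(summarize(TOY_TRIPLES), "Article_in_Proceedings")` gives `["Article_in_Proceedings", "Proceedings", "Article", "Journal"]`: `Proceedings` and `Article` are both one hop away, and the lower-degree `Proceedings` is ranked first.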

Introduction

RDF (Resource Description Framework) is a standard model for data interchange on the Web. With this model, a growing amount of structured and semi-structured data is being mixed, exposed, and shared across different applications (Klyne, Carroll, and McBride, 2004). As a result, the volume of available RDF data is increasing rapidly. RDF data is a collection of triples of the form (subject, predicate, object). Such a collection can be represented as a directed graph in which vertices represent subjects or objects, and each edge, directed from a subject to an object, is labeled with the predicate that connects them. Several important issues in RDF data management, including storage techniques, indexing strategies, and query execution mechanisms, have recently been reviewed in (Ma, Capretz, Miriam, and Yan, 2016; Wylot, Hauswirth, Philippe, and Sakr, 2018).
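
The graph view of RDF data described above can be sketched in a few lines of Python. This minimal sketch assumes an adjacency-list representation keyed by subject, with each outgoing edge stored as a (predicate, object) pair; the toy triples and the function name `to_graph` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object).
triples = [
    ("paper1", "author", "Coles:Drue"),
    ("paper1", "isIncludedIn", "proc1"),
    ("paper1", "type", "Article_in_Proceedings"),
    ("proc1", "type", "Proceedings"),
]


def to_graph(triples):
    """Build a directed, edge-labeled graph: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))  # edge s --p--> o
        graph.setdefault(o, [])  # objects (including literals) are vertices too
    return dict(graph)
```

Every subject and object becomes a vertex, so `to_graph(triples)` has five vertices here, and `Coles:Drue` has no outgoing edges.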

Since the World Wide Web Consortium (W3C) recommended SPARQL (Prud’hommeaux and Seaborne, 2013) as the standard query language for RDF data, the number of users who want to access RDF data has grown rapidly (Yan, Ma, Li, and Cheng, 2017). SPARQL allows users to specify triple and graph patterns to be matched over RDF data graphs (Ma, Jia, Cheng, and Angryk, 2016), so that RDF data can be accessed correctly and efficiently. However, it remains impractical for non-expert users to master both the RDF schema and the SPARQL query language. Consider Example 1, a SPARQL query over the DBLP dataset whose data is shown in Figure 2.

  • Example 1: Find the articles in proceedings written by author “Coles:Drue” in the DBLP dataset, together with the proceedings that include them.

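PREFIX : <http://example.org/dblp#>  # hypothetical namespace; the schema names are used unprefixed in the text
SELECT ?x ?y
WHERE {
  ?x :isIncludedIn ?y .
  ?x a :Article_in_Proceedings .
  ?y a :Proceedings .
  ?x :author "Coles:Drue" .
}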
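
To make the idea of matching a graph pattern concrete, the following minimal Python sketch evaluates the WHERE clause of Example 1 by naive backtracking over an in-memory list of triples. The toy triples stand in for the Figure 2 data, which is not reproduced here, and the helper names are hypothetical; practical SPARQL engines rely on indexes and join ordering rather than this nested scan.

```python
# Hypothetical toy data standing in for the DBLP fragment of Figure 2.
TRIPLES = [
    ("ex:p1", "type", "Article_in_Proceedings"),
    ("ex:p1", "author", "Coles:Drue"),
    ("ex:p1", "isIncludedIn", "ex:c1"),
    ("ex:c1", "type", "Proceedings"),
    ("ex:p2", "type", "Article_in_Proceedings"),
    ("ex:p2", "author", "Other:Author"),
    ("ex:p2", "isIncludedIn", "ex:c1"),
]

# The WHERE clause of Example 1 as triple patterns; terms starting with "?" are variables.
EXAMPLE_1 = [
    ("?x", "isIncludedIn", "?y"),
    ("?x", "type", "Article_in_Proceedings"),
    ("?y", "type", "Proceedings"),
    ("?x", "author", "Coles:Drue"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None
            new[p_term] = t_term
        elif p_term != t_term:
            return None
    return new


def evaluate_bgp(patterns, triples, binding=None):
    """Naive backtracking evaluation of a basic graph pattern; yields solution mappings."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from evaluate_bgp(rest, triples, extended)


def select(variables, patterns, triples):
    """Project each solution onto the SELECT variables (duplicates kept)."""
    return [tuple(sol[v] for v in variables) for sol in evaluate_bgp(patterns, triples)]
```

On the toy data, `select(["?x", "?y"], EXAMPLE_1, TRIPLES)` returns `[("ex:p1", "ex:c1")]`; `ex:p2` is included in the same proceedings but fails the author pattern.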

As Example 1 shows, to construct such a SPARQL query, non-expert users must not only master SPARQL but also know schema information of the RDF data, such as “isIncludedIn”, “Article_in_Proceedings”, and “Proceedings”. Composing SPARQL queries is therefore not easy for them. Keyword search has consequently become a popular tool for non-expert users to explore RDF data (Izquierdo et al., 2018; García, Izquierdo, Menendez, Dartayre, and Marco, 2017): users only need to enter keywords, and the top-k answers are returned to them directly.

At the same time, the number of efficient SPARQL query engines is growing rapidly (Broekstra, Kampman, and Harmelen, 2002; Garrison, Stevens, and Jocuns, 2004; Neumann and Weikum, 2008; Neumann and Weikum, 2010). To make SPARQL querying accessible to non-expert users, in this paper we concentrate on translating keyword queries into SPARQL queries. As noted in (Gkirtzou, Papastefanatos, and Dalamagas, 2015; Ladwig and Tran, 2010; Tran, Wang, Rudolph, and Cimiano, 2009; Lin, Ma, and Yan, 2018; Wen, Jin, and Yuan, 2018), translation-based keyword search has two advantages: it provides users with a friendly interface for querying RDF data, and it achieves better query performance by reusing existing SPARQL query engines while guaranteeing correctness.
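
The translation step can be sketched as follows, assuming that each keyword has already been mapped, through a hypothetical keyword index, either to a class or to a literal value of some class property, and that matched classes are linked directly by an edge of a summary graph. The index, the edges, the `?vN` variable naming, and the `http://example.org/dblp#` namespace are illustrative assumptions; the approach in this paper instead searches the summary graph for the top-k connecting subgraphs, which this sketch omits.

```python
PREFIX = "PREFIX : <http://example.org/dblp#>"  # hypothetical namespace

# Hypothetical keyword index: keyword -> ("class", C) or ("value", C, property).
KEYWORD_INDEX = {
    "Coles:Drue": ("value", "Article_in_Proceedings", "author"),
    "Proceedings": ("class", "Proceedings"),
}

# Hypothetical summary-graph edges: (domain class, predicate, range class).
SUMMARY_EDGES = [("Article_in_Proceedings", "isIncludedIn", "Proceedings")]


def keywords_to_sparql(keywords, index, edges):
    """Compose one conjunctive SPARQL query; only direct (one-hop) class links are used."""
    classes, values = [], []
    for kw in keywords:
        if kw not in index:
            raise ValueError(f"no schema element matches keyword {kw!r}")
        entry = index[kw]
        cls = entry[1]
        if cls not in classes:
            classes.append(cls)  # one query variable per matched class
        if entry[0] == "value":
            # escape backslashes and quotes before embedding the keyword as a literal
            literal = kw.replace("\\", "\\\\").replace('"', '\\"')
            values.append((cls, entry[2], literal))
    var = {cls: f"?v{i}" for i, cls in enumerate(classes)}
    # 1) relationship patterns, 2) class (rdf:type) patterns, 3) literal-value patterns
    patterns = [f"{var[d]} :{p} {var[r]} ." for d, p, r in edges if d in var and r in var]
    patterns += [f"{var[c]} a :{c} ." for c in classes]
    patterns += [f'{var[c]} :{p} "{lit}" .' for c, p, lit in values]
    body = "\n".join(f"  {t}" for t in patterns)
    return f"{PREFIX}\nSELECT {' '.join(var.values())}\nWHERE {{\n{body}\n}}"
```

For the keywords `Coles:Drue` and `Proceedings`, `keywords_to_sparql(["Coles:Drue", "Proceedings"], KEYWORD_INDEX, SUMMARY_EDGES)` yields a query with the same shape as Example 1, using `?v0` and `?v1` in place of `?x` and `?y`.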
