Path Expressions in SQL: A User Study on Query Formulation

Path Expressions in SQL: A User Study on Query Formulation

Marko Junkkari (School of Information Sciences, University of Tampere, Tampere, Finland), Johanna Vainio (School of Information Sciences, University of Tampere, Tampere, Finland), Kati Iltanen (School of Information Sciences, University of Tampere, Tampere, Finland), Paavo Arvola (School of Information Sciences, University of Tampere, Tampere, Finland), Heidi Kari (School of Information Sciences, University of Tampere, Tampere, Finland) and Jaana Kekäläinen (School of Information Sciences, University of Tampere, Tampere, Finland)
Copyright: © 2016 |Pages: 22
DOI: 10.4018/JDM.2016070101
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

This article focuses on testing a path-oriented querying approach to hierarchical data in relational databases. The authors execute a user study to compare the path-oriented approach and traditional SQL from two perspectives: correctness of queries and time spent in querying. They also analyze what kinds of errors are typical in path-oriented SQL. Path-oriented query languages are popular in the context of object-orientation and XML. However, relational databases are the most common paradigm for storing data and SQL is most common for manipulating data. When querying hierarchical data in SQL, the user must specify join conditions explicitly between hierarchy levels. Path-oriented SQL is a new alternative for expressing hierarchical queries in relational databases. In the authors' study, the users spent significantly less time in writing path-oriented SQL queries and made fewer errors in query formulation.
Article Preview

Introduction

Query languages are focused on accessing data and relationships in databases. A database is organized based on the underlying data model. Relational databases are the most common database paradigm and are based on the relational data model (Codd, 1983). In the relational data model, the data is organized in relations (tables) and the data records in different relations are associated with each other by foreign key constraints. SQL (Chamberlin et al., 1974) is the most common query language for relational databases. In it, the user specifies the target of the query, source relations, conditions among relations, and possible join operations among relations. Relational databases are general purpose databases and all relationships are represented similarly.

In many domains, the data has a hierarchical nature, meaning that the data can be viewed through several hierarchical levels (Bernauer, 1996; Burrough and McDonnell, 1998; Niemi and Järvelin, 1995; Motschnig-Pitrik and Kaasböll, 1999; Pazzi, 1999; Urtado and Oussalah, 1998). For example, a physical assembly contains components that in turn contain smaller components, etc. (Junkkari, 2005). In relational databases, the components and the composites they form are represented in separate relations and part-of relationships are represented by foreign key constraints (David, 2003). In SQL, the part-of relationships are manipulated like any association among data. This means that the user must explicitly specify join operations or foreign key constraints between a composition and its components. In large hierarchies, the user must specify a great number of join operations or equal conditions for foreign key constraints. In the research literature, join operations are shown to cause difficulties for users in query writing (Chan, Lu and Wei, 1993; Chan, 2007; Rho and March, 1996; Smelcer, 1995).

Vainio and Junkkari (2014) have recently proposed an alternative way to formulate hierarchical queries, i.e. they embed path expressions into SQL. We call their language PathSQL. Path expressions are widely used in the context of hierarchical data in early data models (Elmasri and Navathe, 1989) and later on in the context of semi-structured data models XML (Bray et al., 1998; Clark and DeRose, 1999; Deutsch et al., 1999) and JSON (Ong et al. 2014). Path-oriented query languages are also used in object-oriented databases (Cattell and Barry, 2000; Cluet, 1998; Alashqur et al., 1989). However, in spite of the popularity of path-oriented query languages, they have not been tested from the perspective of how easy they are to learn and use compared with SQL, the most popular query language. Now, the work by Vainio and Junkkari enables the testing of the path-oriented query approach.

Vainio and Junkkari (2014) demonstrated that queries with path expressions are shorter and more compact. Nevertheless, they do not provide experimental evidence about the usefulness of their approach. We agree with Vainio and Junkkari that path expressions shorten some queries, but this does not ensure that the path-oriented query language would be easier or faster to use. In the present paper, we test traditional SQL and PathSQL languages with users having experience in SQL and minor experience in path-oriented query languages. We compare the time spent in formulating queries in both languages and the number of correct and erroneous queries per user and language. We also analyze what types of errors are common in formulating path-oriented queries.

Complete Article List

Search this Journal:
Reset
Open Access Articles
Volume 28: 4 Issues (2017): 3 Released, 1 Forthcoming
Volume 27: 4 Issues (2016)
Volume 26: 4 Issues (2015)
Volume 25: 4 Issues (2014)
Volume 24: 4 Issues (2013)
Volume 23: 4 Issues (2012)
Volume 22: 4 Issues (2011)
Volume 21: 4 Issues (2010)
Volume 20: 4 Issues (2009)
Volume 19: 4 Issues (2008)
Volume 18: 4 Issues (2007)
Volume 17: 4 Issues (2006)
Volume 16: 4 Issues (2005)
Volume 15: 4 Issues (2004)
Volume 14: 4 Issues (2003)
Volume 13: 4 Issues (2002)
Volume 12: 4 Issues (2001)
Volume 11: 4 Issues (2000)
Volume 10: 4 Issues (1999)
Volume 9: 4 Issues (1998)
Volume 8: 4 Issues (1997)
Volume 7: 4 Issues (1996)
Volume 6: 4 Issues (1995)
Volume 5: 4 Issues (1994)
Volume 4: 4 Issues (1993)
Volume 3: 4 Issues (1992)
Volume 2: 4 Issues (1991)
Volume 1: 2 Issues (1990)
View Complete Journal Contents Listing