Article Preview
TopIntroduction
Query languages are focused on accessing data and relationships in databases. A database is organized based on the underlying data model. Relational databases are the most common database paradigm and are based on the relational data model (Codd, 1983). In the relational data model, the data is organized in relations (tables) and the data records in different relations are associated with each other by foreign key constraints. SQL (Chamberlin et al., 1974) is the most common query language for relational databases. In it, the user specifies the target of the query, source relations, conditions among relations, and possible join operations among relations. Relational databases are general purpose databases and all relationships are represented similarly.
In many domains, the data has a hierarchical nature, meaning that the data can be viewed through several hierarchical levels (Bernauer, 1996; Burrough and McDonnell, 1998; Niemi and Järvelin, 1995; Motschnig-Pitrik and Kaasböll, 1999; Pazzi, 1999; Urtado and Oussalah, 1998). For example, a physical assembly contains components that in turn contain smaller components, etc. (Junkkari, 2005). In relational databases, the components and the composites they form are represented in separate relations and part-of relationships are represented by foreign key constraints (David, 2003). In SQL, the part-of relationships are manipulated like any association among data. This means that the user must explicitly specify join operations or foreign key constraints between a composition and its components. In large hierarchies, the user must specify a great number of join operations or equal conditions for foreign key constraints. In the research literature, join operations are shown to cause difficulties for users in query writing (Chan, Lu and Wei, 1993; Chan, 2007; Rho and March, 1996; Smelcer, 1995).
Vainio and Junkkari (2014) have recently proposed an alternative way to formulate hierarchical queries, i.e. they embed path expressions into SQL. We call their language PathSQL. Path expressions are widely used in the context of hierarchical data in early data models (Elmasri and Navathe, 1989) and later on in the context of semi-structured data models XML (Bray et al., 1998; Clark and DeRose, 1999; Deutsch et al., 1999) and JSON (Ong et al. 2014). Path-oriented query languages are also used in object-oriented databases (Cattell and Barry, 2000; Cluet, 1998; Alashqur et al., 1989). However, in spite of the popularity of path-oriented query languages, they have not been tested from the perspective of how easy they are to learn and use compared with SQL, the most popular query language. Now, the work by Vainio and Junkkari enables the testing of the path-oriented query approach.
Vainio and Junkkari (2014) demonstrated that queries with path expressions are shorter and more compact. Nevertheless, they do not provide experimental evidence about the usefulness of their approach. We agree with Vainio and Junkkari that path expressions shorten some queries, but this does not ensure that the path-oriented query language would be easier or faster to use. In the present paper, we test traditional SQL and PathSQL languages with users having experience in SQL and minor experience in path-oriented query languages. We compare the time spent in formulating queries in both languages and the number of correct and erroneous queries per user and language. We also analyze what types of errors are common in formulating path-oriented queries.