Designing Document SQL (DSQL): An Accessible yet Comprehensive Ad-Hoc Querying Frontend for Query

Designing Document SQL (DSQL): An Accessible yet Comprehensive Ad-Hoc Querying Frontend for Query

Arijit Sengupta (Wright State University, USA) and V. Ramesh (Indiana University, USA)
Copyright: © 2009 |Pages: 28
DOI: 10.4018/jdm.2009062502

Abstract

This article presents DSQL, a conservative extension of SQL, as an ad-hoc query language for XML. The development of DSQL follows the theoretical foundations of first order logic, and uses common query semantics already accepted for SQL. DSQL represents a core subset of XQuery that lends well to optimization techniques, while at the same time allows easy integration into current databases and applications that useSQL. The intent of DSQL is not to replace XQuery, the current W3C recommended XML query language, but to serve as an ad-hoc querying frontend to XQuery. Further, the authors present proofs for important query language properties such as complexity and closure. An empirical study comparing DSQL and XQuery for the purpose of ad-hoc querying demonstrates that users perform better with DSQL for both flat and tree structures, in terms of both accuracy and efficiency.
Article Preview

Introduction

XQuery, the query language for XML, originally proposed as early as 2001, currently a 2007 W3C recommendation (Don Chamberlin, Clark, et al., 2001), was ratified as a candidate W3C recommendation late 2005, and became an official W3C recommendation in Feb 2007 (Boag, et al., 2007). With the increase in the popularity of XML as the next generation of documentation representation language for the hyped Web 2.0 (O'Reilly, 2005), the need for a standard way of retrieving information from XML documents was considered a critical issue, which resulted in the design and eventual recommendation of XQuery. XQuery came from a marriage of two directions of querying: (i) pattern-based languages based on the tree structure of XML documents such as XPath (Clark & DeRose, 1999) and XQL (Robie, Lapp, & Schach, 1998), and (ii) a more logic-oriented approach with conditions and output specifications such as XML-QL (Deutsch, Fernandez, Florescu, Levy, & Suciu, 1998). Although there are some attempts towards including XML querying support in SQL, including an effort by the International Standards Organization - SQL-03 (ANSI/ISO, 2003) from the International Standards Organization (ISO), a common decision among XML query language designers was to create a completely new language for the purpose of querying XML data. XQuery is defined as a “full declarative functional programming language, with support for arbitrary levels of recursion and arbitrarily large memory usage”. This is a direction away from previous query language research which tended to ensure the complexity of SQL stayed within reasonable complexity bounds. The problem is that having a full programming language may not be suitable for ad-hoc querying, which will explain why after 7 years of the development of XQuery, it is still not close to the level of popularity as an ad-hoc query language for XML.

Typically, a declarative (as opposed to a procedural) language is one that can specify an expression by declaring the structure and conditions of the intended result, instead of explicitly providing the steps necessary to obtain those results. For example, an SQL query need only specify the output attributes, the input relations, and properties of the output. The advantage of a declarative language is that the query engine can decide what steps to take to generate the output, by considering all optimization possibilities. Some characteristics of XML schema make it possible to write queries using a declarative language. Although XML documents have a complex hierarchical structure, the strong presence of meta-data in XML documents makes it fairly intuitive to write declarative queries based purely on logical combinations of the properties of the intended results. Declarative query languages where the primary focus is on the properties of the result, rather than the process of extracting the result itself, are very suitable for structured data, because they allow the possibility of letting the system optimize the queries instead of relying on the users’ capabilities for writing an efficient query. We present a declarative query language, Document SQL (DSQL) that has the same look and feel as SQL and was designed by updating the semantics of SQL operations in the structured document domain. At the same time, DSQL was designed such that all queries written in it have equivalent counterparts in XQuery. Thus, by using such a language, users can take advantage of their existing SQL knowledge when writing ad-hoc queries without losing the expressive power of XQuery. In addition to describing the syntax and semantics of the language we also present the results of an experiment that investigates whether a language like DSQL make it possible for users to write more accurate and efficient queries than XQuery.

The rest of the article is organized as follows. Section 2 reviews current research in this area. Section 3 illustrates a data model for representing XML documents. Section 4 introduces the DSQL query language, and Section 5 provides a comparison between DSQL and XQuery. We describe a study comparing DSQL with XQuery in Section 6, discuss the findings of this study in Section 7 and provide some concluding remarks in Section 8.

Complete Article List

Search this Journal:
Reset
Open Access Articles
Volume 30: 4 Issues (2019): Forthcoming, Available for Pre-Order
Volume 29: 4 Issues (2018): 3 Released, 1 Forthcoming
Volume 28: 4 Issues (2017)
Volume 27: 4 Issues (2016)
Volume 26: 4 Issues (2015)
Volume 25: 4 Issues (2014)
Volume 24: 4 Issues (2013)
Volume 23: 4 Issues (2012)
Volume 22: 4 Issues (2011)
Volume 21: 4 Issues (2010)
Volume 20: 4 Issues (2009)
Volume 19: 4 Issues (2008)
Volume 18: 4 Issues (2007)
Volume 17: 4 Issues (2006)
Volume 16: 4 Issues (2005)
Volume 15: 4 Issues (2004)
Volume 14: 4 Issues (2003)
Volume 13: 4 Issues (2002)
Volume 12: 4 Issues (2001)
Volume 11: 4 Issues (2000)
Volume 10: 4 Issues (1999)
Volume 9: 4 Issues (1998)
Volume 8: 4 Issues (1997)
Volume 7: 4 Issues (1996)
Volume 6: 4 Issues (1995)
Volume 5: 4 Issues (1994)
Volume 4: 4 Issues (1993)
Volume 3: 4 Issues (1992)
Volume 2: 4 Issues (1991)
Volume 1: 2 Issues (1990)
View Complete Journal Contents Listing