XML Stream Query Processing: Current Technologies and Open Challenges

Mingzhu Wei (Worcester Polytechnic Institute, USA), Ming Li (Worcester Polytechnic Institute, USA), Elke A. Rundensteiner (Worcester Polytechnic Institute, USA), Murali Mani (Worcester Polytechnic Institute, USA) and Hong Su (Oracle Corporation, USA)
DOI: 10.4018/978-1-60566-308-1.ch005

Abstract

Stream applications bring the challenge of efficiently processing queries on sequentially accessible XML data streams. In this chapter, the authors study the current techniques and open challenges of XML stream processing. First, they examine the input data semantics of XML streams and introduce the state of the art in XML stream processing. Second, they compare and contrast the automaton-based and algebra-based techniques used in XML stream query execution. Third, they study different optimization strategies that have been investigated for XML stream processing; in particular, they discuss cost-based as well as schema-based optimization strategies. Last but not least, the authors list several key open challenges in XML stream processing.

Introduction

In our increasingly fast-paced digital world, more and more activities of humans and their surrounding environments are being tagged and are thus digitally accessible in real time. This opens up novel opportunities to develop a variety of applications that monitor and make use of such data streams, for example in stock trading, traffic monitoring, and network activity monitoring (Babcock et al., 2002). Many projects in both industry and academia have recently sprung up to tackle the newly emerging challenges of stream processing. On the academic side, projects include Aurora (Abadi et al., 2003), Borealis (Abadi et al., 2005), STREAM (Babu & Widom, 2001), Niagara (Chen et al., 2002), TelegraphCQ (Chandrasekaran et al., 2003), and CAPE (Rundensteiner et al., 2004). On the industrial side, major players in the database industry such as Oracle (Witkowski et al., 2007) and IBM (Amini et al., 2006) have embarked on stream projects, and new startup companies have also emerged (Streambase, 2008; Coral8, 2008).

While most of these efforts initially focused on simple relational data, XML is an established format and has been widely accepted as the standard data representation for exchanging information on the Internet. Due to the proliferation of XML data in web services (Carey et al., 2002), there is also a surge in XML stream applications (Koch et al., 2004; Florescu et al., 2003; Diao & Franklin, 2003; Bose et al., 2003; Russell et al., 2003; Ludascher et al., 2002; Peng & Chawathe, 2003). For instance, a message broker routes XML messages to interested parties (Gupta & Suciu, 2003). Message brokers can also perform message restructuring or backups. For example, in an online order handling system (Carey et al., 2002), suppliers can register their available products with the broker. The broker then matches each incoming purchase order against these subscriptions and forwards it to the corresponding suppliers, possibly in a restructured format requested by the suppliers. Other typical applications include XML packet routing (Snoeren & Conley, 2001), selective dissemination of information (Altinel & Franklin, 2000), and notification systems (Nguyen et al., 2001).
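To make the broker scenario concrete, the following Python sketch routes a purchase order to the suppliers whose subscriptions match it and builds a restructured copy that contains only the matching items. It is an illustrative assumption, not the design of any system cited above: the supplier names, the `route` and `restructure` functions, and the subscription filters are hypothetical, and each filter is written in the limited XPath subset that Python's `xml.etree.ElementTree` supports. It also parses each message in full, which is only reasonable for small, self-contained messages, not for unbounded streams.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical subscriptions: supplier name -> ElementPath filter over an order.
SUBSCRIPTIONS: Dict[str, str] = {
    "BookSupplier": ".//item[@category='books']",
    "StationerySupplier": ".//item[@category='stationery']",
}


def route(order_xml: str, subscriptions: Dict[str, str]) -> List[str]:
    """Return the suppliers whose filter matches at least one element of the order."""
    root = ET.fromstring(order_xml)
    return [supplier for supplier, path in subscriptions.items()
            if root.find(path) is not None]


def restructure(order_xml: str, path: str) -> str:
    """Build a new (hypothetical) message that keeps only the elements matched by `path`."""
    root = ET.fromstring(order_xml)
    forwarded = ET.Element("forwarded-order", {"id": root.get("id", "")})
    forwarded.extend(root.findall(path))
    return ET.tostring(forwarded, encoding="unicode")


order = ("<order id='42'><item category='books'>XML Primer</item>"
         "<item category='toys'>Kite</item></order>")
for supplier in route(order, SUBSCRIPTIONS):
    print(supplier, restructure(order, SUBSCRIPTIONS[supplier]))
```

Here only the book supplier is notified, and its copy omits the toy item. A real broker would index many subscriptions together instead of testing each filter in turn.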

XML streams are often handled as a sequence of primitive tokens, such as a start tag, an end tag, or a PCDATA item. To evaluate queries over such on-the-fly XML token streams, most systems (Diao et al., 2003; Gupta & Suciu, 2003; Ludascher et al., 2002; Peng & Chawathe, 2003) use automata to retrieve patterns from the token stream. However, although automata are well suited to matching path expressions, how to improve and extend automaton functionality so as to answer queries efficiently over XML streams has been actively debated in the XML community. Further, one distinguishing feature of pattern retrieval on XML streams is that it relies solely on token-by-token sequential traversal: it is not possible to jump to a certain portion of the stream (analogous to sequential access on magnetic tapes). Thus, traditional index-based techniques cannot be applied for query optimization. In static XML processing, cost-based and schema-based optimization techniques are widely used. How to apply these and other optimization techniques in the streaming XML context is a major challenge, and is thus one of the topics of this chapter.
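The token-based, automaton-driven evaluation described above can be sketched in Python. This is a minimal sketch under stated assumptions, in the general spirit of the automaton-based systems cited, not a reproduction of any of them. The hypothetical `tokenize` function turns arbitrarily split chunks of XML text into start, end, and text tokens using the standard library's incremental SAX parser. The hypothetical `match_stream` function runs a nondeterministic automaton over those tokens for linear paths built from `/` (child), `//` (descendant), and `*` steps, keeping one set of active states per open element on a stack. Predicates, attributes, and namespaces are deliberately not supported.

```python
import xml.sax
from typing import FrozenSet, Iterable, Iterator, List, Optional, Tuple

Token = Tuple[str, str]  # ("start", tag), ("end", tag) or ("text", characters)
Step = Tuple[str, str]   # (axis, name) with axis "child" or "desc"; name may be "*"


class _TokenCollector(xml.sax.ContentHandler):
    """Turns SAX callbacks into a flat list of primitive tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: List[Token] = []

    def startElement(self, name, attrs):
        self.tokens.append(("start", name))  # attributes are ignored in this sketch

    def endElement(self, name):
        self.tokens.append(("end", name))

    def characters(self, content):
        self.tokens.append(("text", content))  # one text item may arrive in pieces


def tokenize(chunks: Iterable[str]) -> Iterator[Token]:
    """Feed XML text chunk by chunk and yield tokens as soon as they are parsed."""
    handler = _TokenCollector()
    parser = xml.sax.make_parser()  # expat-based reader that supports feed()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk.encode("utf-8"))
        yield from handler.tokens
        handler.tokens.clear()
    parser.close()  # signals end of input; raises if the document is incomplete
    yield from handler.tokens


def parse_path(path: str) -> List[Step]:
    """Parse a linear path such as '/order//item/price' into automaton steps."""
    steps: List[Step] = []
    i = 0
    while i < len(path):
        if path.startswith("//", i):
            axis, i = "desc", i + 2
        elif path.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError(f"expected '/' at position {i} in {path!r}")
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        if j == i:
            raise ValueError(f"empty step at position {i} in {path!r}")
        steps.append((axis, path[i:j]))
        i = j
    return steps


def match_stream(path: str, tokens: Iterable[Token]) -> Iterator[str]:
    """Yield the text content of each element matching `path` when its end tag arrives."""
    steps = parse_path(path)
    accept = len(steps)  # state k means "the first k steps have been matched"
    # One entry per open element: (active states, text buffer if this element matched).
    stack: List[Tuple[FrozenSet[int], Optional[List[str]]]] = [(frozenset({0}), None)]
    for kind, value in tokens:
        if kind == "start":
            active = set()
            for state in stack[-1][0]:
                if state == accept:
                    continue
                axis, name = steps[state]
                if axis == "desc":
                    active.add(state)  # self-loop: a '//' step may skip any element
                if name == "*" or name == value:
                    active.add(state + 1)
            stack.append((frozenset(active), [] if accept in active else None))
        elif kind == "text":
            for _, buffer in stack:
                if buffer is not None:
                    buffer.append(value)  # text belongs to every open matched element
        else:  # "end": close the element and emit it if it was a match
            _, buffer = stack.pop()
            if buffer is not None:
                yield "".join(buffer).strip()
```

For example, `list(match_stream("/order//price", tokenize(chunks)))` returns the text of every `price` element below the root `order`, even when `chunks` splits tags and text at arbitrary points. Each start tag is processed using only the state set on top of the stack, so the matcher reads the stream strictly in order and needs memory proportional to the nesting depth plus the text buffered for open matches. It never revisits earlier tokens, which is exactly why index-based shortcuts are unavailable in this setting.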
