
Scalable XML Filtering for Content Subscriptions

Ryan Choi, Raymond Wong

Source Title: Theoretical and Practical Advances in Information Systems Development: Emerging Trends and Approaches

DOI: 10.4018/978-1-60960-521-6.ch007


Abstract

Over the past few years, a growing number of Web applications have come to exchange various types of data over the Internet. In this article, we propose a technique for building efficient and scalable XML publish/subscribe applications. In particular, we look at the problem of efficiently processing streaming XML data against a large number of branch XPath queries. To improve the performance of XML data processing, branch queries with similar characteristics are grouped, and the common paths between queries in the same group are identified. These groups of queries are then processed against an XML schema to validate the query structures. After structural matching, the queries are organized so that multiple queries can be evaluated simultaneously in the post-processing phase. In that phase, join operations are executed in a pipelined fashion, and intermediate join results are shared among the queries in the same group. The benefit of this approach is that the total number of join operations performed in the post-processing phase is significantly reduced. We also show how to efficiently return all matching elements for each matching branch query. Experiments show that our proposal is efficient and scalable compared to previous work.

Introduction

With the development of the World Wide Web (WWW), a huge number of Web-based applications have been developed over the past few years. Web 2.0 (O’Reilly, 2005) goes one step further than traditional web applications: it provides personalized services and lets users create, publish, and share content with other users who have similar interests. Recent works include online finance (Nah, Siau, & Tian, 2005), education (Siau, Sheng, & Nah, 2006), government (Siau & Long, 2006), healthcare (Siau & Shen, 2006), and firewall (Benedikt, Jeffrey, & Ley-Wild, 2008) applications. Another important difference between Web 2.0 and traditional web applications is that users of Web 2.0 applications subscribe to the content they are interested in, and that content is delivered directly to them. Furthermore, users with similar interests can share their subscriptions to quickly discover other related content. This is quite different from traditional web applications, where users obtain content of interest by visiting web sites, following links, etc.

As a motivating example of a Web 2.0 application, let us consider an online news feed application that delivers the latest news articles to users. A distinguishing characteristic of this application is that it receives various types of streaming data from multiple data publishers, selects the data of interest, and forwards the selected data to the groups of users who are interested in receiving them. One problem with this application is that, since each data publisher is designed and implemented differently, the data format of one publisher is usually incompatible with the formats of its peers. Having a distinct format for each data publisher causes problems when the data are collected and processed by a single application. XML (Bray, Paoli, Sperberg-McQueen, Maler, & Yergeau, 2008) solves this problem by providing a way to represent data from different publishers in a universal format, so that the data can be collected and processed by a single application. Moreover, when data are converted to XML, irregular data, which may have been represented by multiple data and metadata tables in relational database systems, can be represented intuitively and logically. Let us now assume that all data publishers use XML to represent their data.

While information searching and retrieval are well-studied areas in the research community, the problems in this context differ from traditional search problems in many ways. Here, new data continuously arrive at our news feed application, and the application must select, or filter, the right data according to user subscriptions. In our application, user subscriptions are represented as XPath (Clark & DeRose, 1999) queries. The “query results” in this context are then the set of matching XPath queries for each streaming XML document. Using XML and XPath to implement a filtering mechanism has a number of advantages over similar but non-XML-based approaches. First, more expressive user subscriptions can be supported. Unlike keyword-based subscriptions, which simply report the matching sets of keywords for each document, XPath subscriptions let users exploit the structural information implicit in XML documents to specify precisely the content that they wish to receive. For example, while it is natural to write an XPath subscription that finds news articles in a financial section that discuss the impact of the US mortgage crisis, such a subscription is not trivial to express with keywords. Second, there are more opportunities to optimize a filtering processor. For example, since subscriptions are written as XPath queries, it is possible to group similar queries and process groups of queries simultaneously. Third, XML is well suited to modeling the increasing amount of semi-structured data on the Web.
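As a rough illustration of XPath-based filtering, the sketch below reports which subscriptions match an incoming XML document, and then groups subscriptions that share a first location step. It is not the technique proposed in this chapter: it evaluates every query independently using Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The document layout, the subscriber names, and the function names `matching_subscriptions` and `group_by_first_step` are all hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical subscriptions: subscriber id -> branch query written in the
# XPath subset that ElementTree understands (attribute and child-text predicates).
SUBSCRIPTIONS = {
    "alice": "./section[@name='finance']/article[topic='mortgage']",
    "bob": "./section[@name='finance']/article[topic='stocks']",
    "carol": "./section[@name='sports']/article",
}

# Hypothetical incoming document from a news publisher.
DOC = """
<news>
  <section name="finance">
    <article>
      <topic>mortgage</topic>
      <title>Impact of the US mortgage crisis</title>
    </article>
  </section>
</news>
"""


def matching_subscriptions(xml_text, subscriptions):
    """Return the sorted ids of subscriptions whose query matches the document."""
    root = ET.fromstring(xml_text)
    # Naive approach: each query is evaluated on its own, with no shared work.
    return sorted(
        sid for sid, query in subscriptions.items()
        if root.find(query) is not None
    )


def group_by_first_step(subscriptions):
    """Group subscription ids by the first location step of their query.

    Simplified stand-in for grouping similar queries. Splitting on "/" is
    only safe here because no predicate in these queries contains a slash.
    """
    groups = defaultdict(list)
    for sid, query in subscriptions.items():
        first_step = query.split("/")[1]  # element 0 is the leading "."
        groups[first_step].append(sid)
    return dict(groups)


if __name__ == "__main__":
    print(matching_subscriptions(DOC, SUBSCRIPTIONS))
    print(group_by_first_step(SUBSCRIPTIONS))
```

In this sketch the result of filtering is a set of subscription ids rather than a set of document fragments, which mirrors the idea that the “query results” are the matching queries. A filtering processor that shares work within each group would evaluate the common step once per group instead of once per query.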
