BDD-Based Combinatorial Keyword Query Processing under a Taxonomy Model

BDD-Based Combinatorial Keyword Query Processing under a Taxonomy Model

Shin-ichi Minato (Hokkaido University, Japan) and Nicolas Spyratos (Université de Paris-Sud, France)
DOI: 10.4018/978-1-61520-851-7.ch002
OnDemand PDF Download:
List Price: $37.50


Digital libraries are one of the key systems for an IT society, and supporting easy access to them is an important technical issue between a human and an intelligent system. Here we consider a publish/subscribe system for digital libraries which continuously evaluates queries over a large repository containing document descriptions. The subscriptions, the query expressions and the document descriptions, all rely on a taxonomy that is a hierarchically organized set of keywords, or terms. The digital library supports insertion, update and removal of a document. Each of these operations is seen as an event that must be notified only to those users whose subscriptions match the document‘s description. In this chapter, the authors present a novel method of processing such keyword queries. Our method is based on Binary Decision Diagram (BDD), an efficient data structure for manipulating large-scale Boolean functions. The authors compile the given keyword queries into a BDD under a taxonomy model. The number of possible keyword sets can be exponentially large, but the compiled BDD gives a compact representation, and enabling a highly efficient matching process. In addition, our method can deal with any Boolean combination of keywords from the taxonomy, while the previous result considered only a conjunctive keyword set. In this chapter, they describe the basic idea of their new method, and then the authors show their preliminary experimental result applying to a document set with large-scale keyword domain under a real-life taxonomy structure.
Chapter Preview


Here we describe the basic framework of subscription-notification service which we consider in this paper.

Subscription-Notification Service in Digital Library

Figure 1 shows the basic processing flow of our system. A document is represented in the digital library repository by a description of its content together with an identifier (say, the document's URI) allowing to access the document's content. A number of documents are incoming to the digital library every day, and the identifier list of recent new documents is notified to each user periodically (daily, weekly, or monthly). Each user can specify a subscription to show the composed keywords that are of interest to that user. Every document has a description with a set of keywords, and if one of the keywords matches to some users' subscriptions, the identifier of the document is added to the notification list. Here, we assume that a user's subscription is not so frequently updated as the notification periods.

Figure 1.

Basic processing flow of subscription-notification service

If we consider a nation-wide digital library system, the number of documents and the number of users become quite huge, and it is an important problem to reduce the computation cost for matching (or filtering) of the documents to each user.

Complete Chapter List

Search this Book: