Efficient and Effective Aggregate Keyword Search on Relational Databases

Luping Li (Baidu, Inc., Beijing, China), Stephen Petschulat (SAP Business Objects, Coquitlam, BC, Canada), Guanting Tang (School of Computing Science, Simon Fraser University, Burnaby, BC, Canada), Jian Pei (School of Computing Science, Simon Fraser University, Burnaby, BC, Canada) and Wo-Shun Luk (School of Computing Science, Simon Fraser University, Burnaby, BC, Canada)
Copyright: © 2012 | Pages: 41
DOI: 10.4018/jdwm.2012100103

Abstract

Keyword search on relational databases is useful and popular among users without a technical background. Aggregate keyword search on relational databases was recently proposed and has attracted interest. However, two important problems remain. First, aggregate keyword search can be very costly on large relational databases, partly because efficient indexes are lacking. Second, finding the top-k answers to an aggregate keyword query has not been addressed systematically, in terms of either the ranking model or efficient evaluation methods. In this paper, the authors tackle these two problems to improve the efficiency and effectiveness of aggregate keyword search on large relational databases. They design indexes that are efficient in both size and construction time, propose a general ranking model and an efficient ranking algorithm, and report a systematic performance evaluation using real data sets.

Introduction

More and more relational databases contain textual data, and thus keyword search on relational databases has become popular. Aggregate keyword search (Zhou & Pei, 2009) was recently proposed for relational databases: given a set of keywords, find a set of aggregates such that each aggregate is a group-by covering all query keywords.

Aggregate keyword search on relational databases has attracted a lot of attention (Chen, Wang, & Liu, 2011; Ding, Yu, Zhao, Lin, Han, & Zhai, 2010; Ding, Zhao, Lin, Han, & Zhai, 2010; Draper & Smith, 1981; Koren, Zhang, & Liu, 2008; Li, Xu, Lu, & Qian, 2010; Zhou & Pei, 2009). A few critical challenges have been identified, such as how to develop efficient approaches for finding all minimal group-bys (Zhou & Pei, 2009) or the top-k relevant cells (Ding et al., 2010) for a user-given keyword query. To motivate the problem, we revisit the example in Zhou and Pei (2009).

  • Example 1 (Motivation) (Zhou & Pei, 2009): Table 1 shows a database holding a tourism event calendar. Such event calendars are popular on many tourism web sites and in travel agents’ databases (or data warehouses). To keep our discussion simple, the description field holds a set of extracted keywords. In general, this field can store a text description of each event.

Table 1. A table of tourism events
    Month    State    City    Event    Description
    December    Texas    Houston    Space Shuttle Experience    Rocket, Supersonic, Jet
    December    Texas    Dallas    Cowboy’s Dream Run    Motorcycle, Culture, Beer
    December    Texas    Austin    SPAM Museum Party    Classical American Hormel Foods
    November    Arizona    Phoenix    Cowboy Culture Show    Rock Music
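
The definition in the Introduction (a group-by that covers all query keywords) can be illustrated on the four rows of Table 1 with a brute-force sketch. This is not the authors' indexed method. It assumes that Month, State, and City are the grouping attributes, that Event and Description are the text attributes, and that a keyword matches by case-insensitive substring; the query keywords and the function name covering_group_bys are hypothetical choices made for illustration.

```python
from itertools import combinations

# Rows of Table 1. Treating Month/State/City as grouping attributes and
# Event/Description as text attributes is an assumption of this sketch.
ROWS = [
    {"Month": "December", "State": "Texas", "City": "Houston",
     "Event": "Space Shuttle Experience", "Description": "Rocket, Supersonic, Jet"},
    {"Month": "December", "State": "Texas", "City": "Dallas",
     "Event": "Cowboy's Dream Run", "Description": "Motorcycle, Culture, Beer"},
    {"Month": "December", "State": "Texas", "City": "Austin",
     "Event": "SPAM Museum Party", "Description": "Classical American Hormel Foods"},
    {"Month": "November", "State": "Arizona", "City": "Phoenix",
     "Event": "Cowboy Culture Show", "Description": "Rock Music"},
]
DIMENSIONS = ("Month", "State", "City")


def row_text(row):
    """Lower-cased text of a row, used for keyword matching."""
    return (row["Event"] + " " + row["Description"]).lower()


def covering_group_bys(rows, keywords, dimensions=DIMENSIONS):
    """Return every group-by whose rows jointly contain all keywords.

    Each answer is a dict mapping the grouped attributes to their values;
    attributes that are absent are aggregated away. Naive substring
    matching is used, so "rock" would also match "rocket".
    """
    keywords = [k.lower() for k in keywords]
    answers = []
    for size in range(len(dimensions) + 1):
        for attrs in combinations(dimensions, size):
            # Partition the rows by their values on the chosen attributes.
            groups = {}
            for row in rows:
                key = tuple(row[a] for a in attrs)
                groups.setdefault(key, []).append(row_text(row))
            # Keep a group only if its combined text covers every keyword.
            for key, texts in groups.items():
                combined = " ".join(texts)
                if all(k in combined for k in keywords):
                    answers.append(dict(zip(attrs, key)))
    return answers


if __name__ == "__main__":
    for group in covering_group_bys(ROWS, ["rocket", "motorcycle"]):
        print(group)
```

For the hypothetical query {rocket, motorcycle}, no single row matches both keywords, but the group-by (December, Texas) does, because it combines the Houston and Dallas events. The sketch also returns the coarser group-bys that contain it; keeping only the most specific ones would correspond to the minimal group-bys mentioned above.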
