
Automatic Categorization of Web Database Query Results

Xiangfu Meng, Li Yan, Z. M. Ma

Source Title: Advanced Database Query Systems: Techniques, Applications and Technologies

DOI: 10.4018/978-1-60960-475-2.ch001

Abstract

Web database queries are often exploratory. Users frequently find that their queries return too many answers, many of which may be irrelevant. Based on different kinds of user preferences, this chapter proposes a novel categorization approach that consists of two steps. The first step analyzes the query history of all users in the system offline and generates a set of clusters over the tuples, where each cluster represents one type of user preference. When a user issues a query, the second step presents to the user a category tree over the clusters generated in the first step, so that the user can easily select the subset of query results matching their needs. Constructing a category tree is a cost optimization problem, and heuristic algorithms are developed to compute the min-cost categorization. The efficiency and effectiveness of the approach are demonstrated by experimental results.

Introduction

As the Internet becomes ubiquitous, many people search Web databases for cars, houses, stocks, and so on. However, Web database queries are often exploratory. Users often find that their queries return too many answers, a problem commonly referred to as “information overload”. For example, when a user submits a query to the MSN House&Home Web site to search for a house located in Seattle with a price between $200,000 and $300,000, 1,256 tuples are returned. Information overload makes it hard for the user to separate the interesting items from the uninteresting ones, and thereby wastes a great deal of the user’s time and effort. In such a situation, the user typically poses a broad query at the beginning to avoid excluding potentially interesting results, and then iteratively refines the query until only a few answers matching their preferences are returned. However, this iterative procedure is time-consuming, and many users give up before they reach the final stage.

To resolve the problem of “information overload”, two types of solutions have been proposed. The first type categorizes the query results into a category tree (Chakrabarti, Chaudhuri & Hwang, 2004; Chen & Li, 2007), and the second type ranks the results (Agrawal, Chaudhuri, Das & Gionis, 2003; Agrawal, Rantzau & Terzi, 2006; Bruno, Gravano & Marian, 2002; Chaudhuri, Das, Hristidis & Weikum, 2004; Das, Hristidis, Kapoor & Sudarshan, 2006). The success of both approaches depends on how well they make use of user preferences. However, these approaches assume that all users share the same preferences, whereas in real life different users often have different preferences. Let us look at the following example.

Example 1. Consider a real estate search Web site. Figure 1 and Figure 2 respectively show a fraction of the category trees generated by the Greedy method (Chakrabarti, Chaudhuri & Hwang, 2004) and the C4.5-Categorization method (Chen & Li, 2007) over 214 houses returned by a query with the condition “Price between 250000 and 350000 ∧ City = Seattle”. Each tree node specifies a range or equality condition on an attribute, and the number in parentheses is the number of tuples satisfying all conditions from the root to the current node. Users can use this tree to select the houses they are interested in.
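The node structure described in Example 1 can be illustrated with a minimal Python sketch. It is not the Greedy or C4.5-Categorization method: it only shows how a node holding a range or equality condition yields the count of tuples satisfying every condition on the path from the root. The `Node` class, the helper functions, the `View` attribute, and the five sample houses are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]  # one query-result tuple, keyed by attribute name


@dataclass
class Node:
    """A category-tree node: a label plus the condition it tests (hypothetical)."""
    label: str
    predicate: Callable[[Row], bool]
    children: List["Node"] = field(default_factory=list)


def equals(attr: str, value: object) -> Node:
    """Node with an equality condition on one attribute."""
    return Node(f"{attr} = {value}", lambda t: t[attr] == value)


def in_range(attr: str, low: float, high: float) -> Node:
    """Node with a half-open range condition low <= attr < high."""
    return Node(f"{attr} in [{low}, {high})", lambda t: low <= t[attr] < high)


def render(node: Node, tuples: List[Row], depth: int = 0) -> List[str]:
    """Return one line per node, with the tuple count in parentheses.

    Only tuples that satisfied every ancestor condition are passed down,
    so each count reflects all conditions from the root to that node.
    """
    matching = [t for t in tuples if node.predicate(t)]
    lines = [f"{'  ' * depth}{node.label} ({len(matching)})"]
    for child in node.children:
        lines.extend(render(child, matching, depth + 1))
    return lines


# Invented sample data; not taken from the chapter's data set.
houses: List[Row] = [
    {"Livingarea": "Burien", "View": "Water", "Price": 260000},
    {"Livingarea": "Burien", "View": "Street", "Price": 310000},
    {"Livingarea": "Bellevue", "View": "Water", "Price": 340000},
    {"Livingarea": "Bellevue", "View": "Street", "Price": 275000},
    {"Livingarea": "Burien", "View": "Water", "Price": 330000},
]

burien = equals("Livingarea", "Burien")
burien.children = [
    in_range("Price", 250000, 300000),
    in_range("Price", 300000, 350000),
]
tree = Node("All", lambda t: True, [burien, equals("Livingarea", "Bellevue")])

lines = render(tree, houses)
print("\n".join(lines))
```

In this sketch a user interested in water views would still have to descend through the "Livingarea" level first, which is the kind of navigation overhead a single fixed attribute order can impose.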

Figure 1. Tree generated by the Greedy method

Figure 2. Tree generated by the C4.5-Categorization method

Consider three users U₁, U₂, and U₃. Assume that U₁ prefers houses with large square footage, U₂ prefers houses with water views, and U₃ prefers both water views and the Burien living area. The Greedy method assumes that all users have the same preferences. As a result, the attributes “Livingarea” and “Schooldistrict” are placed at the first two levels of the tree because more users are concerned with “Livingarea” and “Schooldistrict” than with other attributes. However, there may be some users (such as U₂ and U₃) who want to visit the large and water-view houses first. They then have to visit many nodes if they follow the tree shown in Figure 1. Considering the diversity of user preferences and the cost of visiting both intermediate nodes and leaf nodes, the C4.5-Categorization method uses the C4.5 algorithm to create the navigational tree. But the resulting category tree (Figure 2) has two drawbacks: (i) users can only access the tuples under the leaf nodes and cannot examine the tuples under the intermediate nodes; (ii) the cost of visiting the tuples of an intermediate node is not considered if the user chooses to explore them.
