Strategies for Improving the Efficacy of Fusion Question Answering Systems
José Antonio Robles-Flores, Gregory Schymik, Julie Smith-David, Robert St. Louis
Copyright: © 2011 |Pages: 18
DOI: 10.4018/jbir.2011010104

Abstract

Web search engines typically retrieve a large number of web pages and overload business analysts with irrelevant information. One approach that has been proposed for overcoming some of these problems is automated Question Answering (QA). This paper describes a case study that was designed to determine the efficacy of QA systems for generating answers to original, fusion, list questions (questions that have not previously been asked and answered, questions for which the answer cannot be found on a single web site, and questions for which the answer is a list of items). Results indicate that QA algorithms are not very good at producing complete answer lists and that searchers are not very good at constructing answer lists from snippets. These findings indicate a need for QA research to focus on crowdsourcing answer lists and on improving output format.
Article Preview

Introduction

To succeed in today’s business environment, every enterprise must be able to efficiently find information on the web. Although the web is a rich source of information, there are many challenges associated with finding the right information in a timely manner. Web search engines typically retrieve a large number of web pages and overload business analysts with irrelevant information (Chung, Chen, & Nunamaker Jr., 2005).

Ultraseek reported that the average employee spends 3.5 hours a week on unsuccessful searches (Ultraseek, 2006). KMWorld reported that middle managers spend approximately 25% of their time searching for information that is required for the successful completion of their jobs, that the information they find often is wrong, and that 86% of enterprise searchers are dissatisfied with their firms’ search capabilities (KMWorld, 2008). More fine-grained technologies capable of understanding Business Intelligence tasks and representing their results in comprehensible formats are required.

One approach that has been proposed for overcoming some of these challenges is automated Question Answering (QA). The objective of a QA system is to locate, extract, and present the answer to a specific user question that has been expressed in natural language (Roussinov, Fan, & Robles-Flores, 2008). QA systems enable the searcher to pose queries as questions using natural language, and enable the computer to retrieve answers to questions that require the fusion of information from multiple sources. The ability to fuse information from multiple sources allows QA systems to take as input a question like “What are the countries in Central America?” and produce as output a list such as “Guatemala, Belize, El Salvador, Honduras, Nicaragua, Costa Rica, and Panama are countries in Central America.” This is an example of a list question, so called because the answer is a list of items of information.

When dealing with list questions, it is important to differentiate between questions where constructing the answer list requires the fusion of information from multiple web sites (fusion questions), and questions where the answer list can be found on a single web page (non-fusion questions). An example of a non-fusion question is “What are the names of all the teams in the National Football League?” The complete answer to this question is available in many locations, and simply entering “names of all NFL teams” in the Google search bar will provide links to several sites that contain the desired list.

An example of a fusion question is “Which companies manufacture home appliances in the U.S.?” Entering “names of home appliance manufacturers located in the U.S.” in the Google search bar cannot provide a link to a single site that contains the desired list, because no such site exists. Answering a fusion question requires a search engine or service that can query the web for information, parse the returned web pages for the relevant information, and fuse the relevant information into an aggregated answer list. Fusion questions are very common in the business intelligence arena.
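The fusion step described above — extracting candidate items from each returned page and merging them into one aggregated answer list — can be sketched as follows. This is a minimal illustration, not the system studied in the paper; the page extractions and the merging rule (a simple threshold on how many distinct sources mention each candidate) are hypothetical.

```python
from collections import Counter

def fuse_answer_lists(candidate_lists, min_sources=2):
    """Merge candidate items extracted from several web pages into one
    aggregated answer list, keeping items mentioned on at least
    `min_sources` distinct pages (a crude evidence threshold)."""
    counts = Counter()
    for items in candidate_lists:
        # Count each item at most once per source page, case-insensitively.
        for item in {i.strip().lower() for i in items}:
            counts[item] += 1
    return sorted(item for item, n in counts.items() if n >= min_sources)

# Hypothetical extractions from three pages about U.S. appliance makers:
pages = [
    ["Whirlpool", "GE Appliances", "Sub-Zero"],
    ["Whirlpool", "Sub-Zero", "Viking"],
    ["GE Appliances", "Whirlpool"],
]
print(fuse_answer_lists(pages))
# → ['ge appliances', 'sub-zero', 'whirlpool']
```

The threshold trades recall for precision: items seen on only one page (here, “Viking”) are dropped, which is one reason automated fusion tends to produce incomplete answer lists.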

Search engines such as Google, Yahoo, and MSN use many tools to identify relevant snippets for keyword searches, including page rank, term frequency, term proximity, and inverse document frequency. However, these tools are not designed to handle fusion list questions; they treat questions as a “bag of words”. Entering “Who is the largest producer of software?” in the Google search bar, for example, yields nearly the same results as entering “largest producer software”. Both queries produce unexpected snippets that identify the largest producers of carbon steel, pork, ethanol, and sugar, but neither identifies Microsoft, which is the answer the user would expect (see Figure 1). Moreover, even if the correct answer is among the search results, the user still needs to review the snippets in order to locate it.
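The bag-of-words treatment can be illustrated with a toy TF-IDF ranker: because common question words such as “who” and “is” receive little or no weight, “Who is the largest producer of software?” and “largest producer software” rank snippets almost identically. This is a minimal sketch of TF-IDF scoring, not any particular engine's implementation; the snippets are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(query, snippets):
    """Score each snippet against the query: term frequency in the
    snippet times inverse document frequency over the collection."""
    docs = [s.lower().split() for s in snippets]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: one count per snippet
    q_terms = query.lower().split()
    return [
        sum(Counter(d)[t] * math.log(n / df[t]) for t in q_terms if df[t])
        for d in docs
    ]

snippets = [
    "the largest producer of carbon steel",
    "microsoft is the largest producer of software worldwide",
]
full_q = tfidf_scores("who is the largest producer of software", snippets)
bag_q  = tfidf_scores("largest producer software", snippets)
# Both queries produce the same ranking: the question words add
# almost nothing, since frequent terms get near-zero IDF weight.
```

Note that the model sees only word overlap; nothing represents the question's intent that the answer should be a single company name, which is why keyword ranking alone cannot handle fusion list questions.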

Figure 1.

Results for question: “Who is the largest producer of software?”

