Advanced Question-Answering and Discourse Semantics

Patrick Saint-Dizier (IRIT-CNRS, France)
DOI: 10.4018/978-1-4666-2169-5.ch006

Abstract

In this chapter, the authors develop the paradigm of advanced question-answering, which covers how-to, why, evaluative, comparative, and opinion questions. They show the different parameters at stake in answer production, involving several aspects of cooperativity. These types of questions require substantial discourse semantics analysis and domain knowledge. The second part of this chapter is devoted to a short presentation of the aspects of text semantics that are relevant to answering questions. The last part of this chapter introduces a platform the authors have developed for discourse semantics analysis, which they use for answering complex questions, in particular how-to and opinion questions.

Introduction

Question-answering is not a new area of research. It has gone through three major phases, each motivated by major technological progress in NLP. The first phase, starting as early as 1961, is characterized by small prototypes running on very restricted domains and interacting with databases. This first generation of question answering systems includes, for example, Baseball, LUNAR, QUALM, and STUDENT. A second generation emerged with the ARDA AQUAINT program and TREC-QA, with systems capable of managing very large volumes of data, working on either open or closed domains, and answering factoid or definition questions. A number of commercial products were developed, among which START at MIT, Askjeeves, AnswerBus, QuASM, and IONAUT. More recently, road maps (Burger, et al., 2001) have focused on the need for deeper modes of language understanding and elaborate reasoning schemas to properly answer questions. Over the last decade, many international question answering contests have been held, such as TREC (Voorhees, 2001), CLEF, and NTCIR. Question answering is investigated over several languages and within a multilingual perspective. Thus far, eleven languages have been tested on monolingual or cross-lingual question answering tasks. A new trend is emerging around multimedia question answering. It is not developed further here, since it is somewhat outside the scope of this chapter. Dialogue for QA and user profiling have also become major challenges that make QA more realistic (see the dedicated chapter in this volume).

Question-Answering (QA) involves a large diversity of techniques and resources that depend on the type of system to be built (e.g. domain dependent or not) and on a number of requirements. QA can involve shallow techniques to retrieve passages in documents as well as deep, linguistic-based natural language processing techniques. Statistical QA is the major trend (i.e. answer retrieval is based on statistical algorithms), but knowledge-based approaches are now emerging to resolve complex situations. The main issues are: question analysis (type of the expected answer, focus, and constraints on the expected answer), answer extraction from various kinds of documents (including analysis of the best match and answer reliability evaluation), and answer formulation (which is often very basic). Besides these three main aspects, further issues include taking the question context into account, interactive and multimedia QA, and multilingual QA.

Questions are usually categorized according to the type of answer they induce. Various categorizations have been elaborated by Lehnert (1978), Rilo et al. (1994), Hermjakob (2001), and Li et al. (2002). For example, in Lehnert (1978), question categories are, among others: goal, cause, enablement, verification, instrumental, expectation, judgmental, or quantificational. These types are highly conceptual and difficult to identify in questions. The QALC system (Ferret, et al., 2001) introduced 17 types of questions, among which person, organization, quantity, and place. These are closer to semantic types: each characterizes the semantic type of the expected information. A proliferation of typologies followed: while Lasso (Moldovan, et al., 1999) and Webclopedia (Hermjakob, et al., 2002) postulated respectively 25 and 70 types, which are somewhat heterogeneous, some systems use the whole WordNet concept hierarchy, leading to more than 8000 types.
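
As a purely illustrative sketch of what typing a question by its expected answer can look like, the following Python fragment maps a question to one of the four QALC-style labels named above (person, organization, quantity, place). The cue patterns, the function name expected_answer_type, and the "unknown" fallback are assumptions made for this example; they are not taken from QALC or from any of the cited systems, which rely on much richer analysis.

```python
import re

# Hypothetical surface cues. Only the four labels come from the text above;
# the patterns themselves are assumptions made for illustration.
ANSWER_TYPE_RULES = [
    ("person", re.compile(r"^\s*(who|whom|whose)\b", re.IGNORECASE)),
    ("place", re.compile(r"^\s*where\b", re.IGNORECASE)),
    ("quantity", re.compile(r"^\s*how\s+(many|much)\b", re.IGNORECASE)),
    (
        "organization",
        re.compile(
            r"^\s*(which|what)\s+(company|organization|agency|institution)\b",
            re.IGNORECASE,
        ),
    ),
]


def expected_answer_type(question: str) -> str:
    """Return the first label whose cue matches, or "unknown" if none does."""
    for label, pattern in ANSWER_TYPE_RULES:
        if pattern.search(question):
            return label
    return "unknown"


if __name__ == "__main__":
    for q in ["Who wrote the report?", "How many languages were tested?"]:
        print(q, "->", expected_answer_type(q))
```

A handful of surface cues like these cannot scale to typologies of 25, 70, or several thousand types, which is one reason such typologies are costly to apply in practice.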

However, a major distinction can be made between questions that basically induce factoid responses, i.e. a short piece of information that can be directly extracted from a text (e.g. dates, costs, names), and questions where the response is a well-formed text portion (or a set of portions), e.g. a procedure to follow to achieve a goal or the causes of an event. Answering questions of the second kind requires more complex language and reasoning treatments, and possibly radically different approaches and technologies. These types of questions fall into the paradigm of advanced question-answering, not to be confused with complex question answering, which includes, among others, questions composed of several layers such as hypotheses, given data, pre-requisites, or sub-questions.
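
The distinction drawn above can be sketched, under strong simplifying assumptions, as a routing step that decides whether a question calls for a short extracted item or for a text portion. In the Python fragment below, the cue patterns, the function name question_paradigm, and the choice to treat every unmatched question as factoid are all assumptions made for illustration; they are not the method described in this chapter.

```python
import re
from typing import Optional, Tuple

# Hypothetical cues for questions whose answer is a text portion
# (a procedure, a cause, a comparison). Patterns are illustrative only.
ADVANCED_CUES = [
    ("how-to", re.compile(r"^\s*how\s+(do|does|can|could|should|to)\b", re.IGNORECASE)),
    ("why", re.compile(r"^\s*why\b", re.IGNORECASE)),
    (
        "comparative",
        re.compile(r"\b(compared (?:to|with)|versus|difference between)\b", re.IGNORECASE),
    ),
]


def question_paradigm(question: str) -> Tuple[str, Optional[str]]:
    """Return ("advanced", subtype) when a cue matches, else ("factoid", None).

    Defaulting to "factoid" is a simplification made for this sketch.
    """
    for subtype, pattern in ADVANCED_CUES:
        if pattern.search(question):
            return ("advanced", subtype)
    return ("factoid", None)


if __name__ == "__main__":
    print(question_paradigm("How do I install a printer?"))
    print(question_paradigm("When was LUNAR built?"))
```

In a real system, such a decision would determine which answer-extraction strategy is applied, since factoid extraction and the retrieval of procedures or causes rely on different processing.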
