Finding Answers to Questions, in Text Collections or Web, in Open Domain or Specialty Domains

Brigitte Grau (LIMSI-CNRS and ENSIIE, France)
DOI: 10.4018/978-1-4666-0330-1.ch015

Abstract

This chapter is dedicated to factual question answering, i.e., extracting precise, exact answers from texts to questions posed in natural language. A question in natural language carries more information than a bag-of-words query (i.e., a query made of a list of words) and provides clues for finding precise answers. The author first presents the underlying problems. These stem mainly from the linguistic variation between questions and the text passages that answer them, and they affect both the selection of relevant passages and the extraction of reliable answers. The author then presents how to answer factual questions in open domain. Answering questions in specialty domains is also addressed, since it requires dealing with semi-structured knowledge and specialized terminologies, and it can lead to different applications, such as information management in corporations. Searching for answers on the Web constitutes another application setting and introduces specific issues linked to Web redundancy and collaborative usage. The Web is also multilingual, and a challenging problem consists in searching for answers in documents written in a target language different from the source language of the question. For each of these topics, this chapter presents the main approaches and the remaining problems.

Introduction

The large number of documents currently available on the Web, and also in intranets, makes it necessary to provide users with intelligent assistant tools that help them find the specific information they are searching for. Relevant information delivered at the right time can help solve a particular task. The purpose is thus to access the content of texts, not merely to give access to documents. The document is the means to reach the knowledge it contains, not the goal of the search. Question-answering systems address this need: their purpose is to provide users with the information they are looking for, instead of documents they would have to read to find the required answer.

This topic dates back to early work in Artificial Intelligence on systems for querying knowledge bases in natural language, such as BASEBALL in 1963 (Green et al., 1986), LUNAR in 1973 and LADDER in 1977 (see Barr et al., 1981, for a brief description of these systems). Later, Lehnert, with her system QUALM (Lehnert, 1977), raised the problem of semantically modeling questions in order to associate different answer-finding strategies with them.

However, these works relied largely on manual modeling of knowledge and remained restricted to limited domains. As a result, they did not lead to realistic applications, and research on precise answers turned towards the development of natural language interfaces for querying databases.

The problem re-emerged only recently, at TREC in 1999, with the first evaluation of open-domain question-answering systems dedicated to finding answers to factual questions in texts.

As when querying a database, factual questions call for short answers that give precise information. Factual questions ask for a short, concise answer about a precise fact, for example a person name, as in “What is the name of the managing director of Apricot Computer?”, or a date, as in “When is Bastille Day?”. This time, however, topics are not limited and knowledge is not structured beforehand, since the texts themselves are its repositories.

Finding answers requires analyzing texts, which has become possible thanks to mature natural language processing tools. The wide availability of texts in digital format has made it possible to model and evaluate linguistic processes, and has led to the distribution of widely applicable tools, such as word syntactic category taggers (also called part-of-speech (POS) taggers) and robust syntactic parsers. Part-of-speech tagging is the process of identifying each word used in a text and its grammatical category, such as noun, verb or adjective. Syntactic parsers perform a grammatical analysis of sentences, identifying the different phrases (noun phrases, verb phrases, etc.) and the relations between them, such as subject or direct object. The dissemination of knowledge sources, such as lexicons, thesauri and ontologies, also enables more advanced text processing.
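The two example questions above show that a factual question itself signals what kind of answer is expected: a person name in one case, a date in the other. Many factual QA systems therefore begin by mapping the question to an expected answer type. The following is a minimal, hypothetical rule-based sketch of that step; the type labels (PERSON, DATE, LOCATION, QUANTITY, OTHER) and the small list of person-denoting nouns are assumptions chosen for illustration, not the author's classification.

```python
import re

# Hypothetical nouns signalling that a "what/which" question asks for a person.
PERSON_NOUNS = {"director", "president", "author", "inventor", "founder", "minister"}


def expected_answer_type(question: str) -> str:
    """Guess a coarse expected answer type from surface cues in the question."""
    words = re.findall(r"[a-z]+", question.lower())
    if not words:
        return "OTHER"
    first = words[0]
    if first == "when":
        return "DATE"
    if first == "who":
        return "PERSON"
    if first == "where":
        return "LOCATION"
    if first == "how" and len(words) > 1 and words[1] in {"many", "much"}:
        return "QUANTITY"
    # For "what/which" questions the type is carried by a noun, e.g.
    # "What is the name of the managing director ...?" -> PERSON.
    # Crude rule: any person-denoting noun anywhere in the question.
    if first in {"what", "which"} and any(w in PERSON_NOUNS for w in words):
        return "PERSON"
    return "OTHER"


for q in ["What is the name of the managing director of Apricot Computer?",
          "When is Bastille Day?"]:
    print(q, "->", expected_answer_type(q))
# What is the name of the managing director of Apricot Computer? -> PERSON
# When is Bastille Day? -> DATE
```

Surface cues like these are brittle: “What” questions in particular need the head noun of the question (here “director”), which is one place where the POS taggers and parsers mentioned above become useful.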

The problem of finding answers to questions is thus now posed differently: it consists in extracting a piece of information from a text. The texts themselves are the sources of knowledge and can be structured and enriched by automatic processes. Just as the first systems found applications as natural language interfaces that let non-expert users query databases, QA systems are relevant whenever a large number of documents must be searched to meet precise information needs, including in professional sectors such as business analysis, technology watch, journalistic documentation and biography.
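Extracting a piece of information from a text can be sketched as two steps: keep the passages that contain a candidate of the expected answer type, then choose the one that best matches the question. The toy sketch below rests on strong assumptions: sentences stand in for passages, relevance is plain word overlap (a bag-of-words measure that ignores the linguistic variation discussed in this chapter), and only “When” questions with dates written like “July 14” or as a four-digit year are handled. The stopword list, sentences and function names are hypothetical.

```python
import re
from typing import List, Optional, Set

# Hypothetical stopword list; question words are dropped so they do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "was", "did", "does",
             "when", "what", "who", "where"}

# Dates such as "July 14", "July 14, 1789", or a bare four-digit year.
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE_RE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}(?:, \d{4})?\b|\b\d{4}\b")


def content_words(text: str) -> Set[str]:
    """Lower-cased word tokens minus stopwords (a bag of words)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def answer_when_question(question: str, sentences: List[str]) -> Optional[str]:
    """Return the date from the date-bearing sentence that best overlaps the question."""
    q_words = content_words(question)
    candidates = []
    for sentence in sentences:
        match = DATE_RE.search(sentence)
        if match:  # keep only sentences containing an answer of the expected type
            overlap = len(q_words & content_words(sentence))
            candidates.append((overlap, match.group(0)))
    if not candidates:
        return None
    overlap, date = max(candidates, key=lambda c: c[0])
    return date if overlap > 0 else None


SENTENCES = [
    "Bastille Day is celebrated in France every year.",
    "The French national holiday, Bastille Day, falls on July 14.",
    "The storming of the Bastille took place in 1789.",
]
print(answer_when_question("When is Bastille Day?", SENTENCES))  # July 14
```

Filtering by answer type before ranking is what lets the sketch skip the first sentence, which shares as many words with the question as the second but contains no date. A real system would replace raw word overlap with matching that tolerates paraphrase and syntactic variation.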

Since their beginnings at TREC, question-answering systems have attracted great interest from both the Information Retrieval and the Natural Language Processing communities. Following TREC, the task was introduced in other IR evaluation campaigns: CLEF in 2003, for European languages and multilingual approaches, and NTCIR, for Asian languages, also in 2003.

This research has led to systems that differ from document retrieval systems (cf. Chapter ??).
