Overview of Translation Techniques in Cross-Language Question Answering during the Last Decade

María-Dolores Olvera-Lobo (Unidad Asociada Grupo SCImago, Madrid & University of Granada, Spain) and Juncal Gutiérrez-Artacho (University of Granada, Spain)
DOI: 10.4018/978-1-4666-5888-2.ch466

Introduction

The development of the Semantic Web requires considerable economic and human effort, so mechanisms and tools that facilitate its expansion are very useful. From the standpoint of Information Retrieval (IR), natural language can ease access to the contents of the Semantic Web, because it is much simpler and faster for users to express themselves in their habitual way.

Information Retrieval

Since the 1940s, the problem of information storage and retrieval has attracted increasing attention (Rijsbergen, 1979). Since then, researchers from different disciplines have helped to develop more efficient and sophisticated methods to process, manage and retrieve the information that satisfies users' needs. IR is the discipline concerned with selecting items of information from a storage system in response to those needs (Baeza-Yates & Ribeiro-Neto, 1999; Korfhage, 1997; Salton, 1989; Salton & McGill, 1983; Rijsbergen, 1979). Traditionally, IR is understood as a fully automatic process that responds to a user query by examining a collection of documents and returning a sorted list of documents that should be relevant to the user requirements as expressed in the query. Simply stated, retrieval means finding certain requested information in a storage system or database (Meadow, 1992).
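The traditional process described above (a query goes in, a sorted document list comes out) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name rank_documents, the sample collection and the simple term-overlap score are hypothetical choices made here, not a method taken from the cited literature.

```python
from collections import Counter


def rank_documents(query, documents):
    """Return (doc_id, score) pairs sorted from most to least relevant.

    Assumption: relevance is approximated by counting how often the
    query terms occur in each document (a deliberately naive score).
    """
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        term_counts = Counter(text.lower().split())
        # Sum the occurrences of every distinct query term in this document.
        score = sum(term_counts[term] for term in query_terms)
        scored.append((doc_id, score))
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical three-document collection used only for illustration.
collection = {
    "d1": "semantic web retrieval",
    "d2": "question answering systems",
    "d3": "information retrieval systems retrieval",
}
ranking = rank_documents("information retrieval", collection)
```

Real IR systems replace the naive score with weighting schemes and indexes, but the shape of the interaction is the same: the system returns documents in ranked order, not answers.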

An IR system is a system used to store items of information that need to be processed, searched, retrieved, and disseminated to users (Salton & McGill, 1983). IR systems are information systems that allow the efficient and effective identification of the documents in a collection that best meet the information needs of the user, as expressed in the search. In this sense, an IR system can be viewed as a black box that accepts inputs and produces outputs (Harter & Hert, 1997). An optimal IR system retrieves all the relevant documents (implying an exhaustive search, i.e. high recall) and only the relevant documents (implying perfect accuracy, i.e. high precision) (Baeza-Yates & Ribeiro-Neto, 1999). The value of an IR system therefore depends on its capacity to identify useful information quickly and correctly, on its ability to reject irrelevant or extraneous items, and on the versatility of the methods it employs (Salton & McGill, 1983).
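The two measures mentioned above have standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The following Python sketch computes both; the function name precision_recall and the document identifiers are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against division by zero when either set is empty.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

In this hypothetical case precision is 2/4 and recall is 2/3. An optimal system in the sense described above would score 1.0 on both.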

Although in recent years IR systems have evolved toward a greater affinity with users, that is, they try to adapt the results to the users' information needs, the traditional models implied several restrictions: a) the assumption that users want full-text documents rather than answers, and that these documents will satisfy the query; b) the assumption that the process is direct and unidirectional rather than interactive; and c) the assumption that the query and the documents share the same language. Question answering (QA) systems arise in this context, and they are discussed in this chapter.

Key Terms in this Chapter

Cross-Lingual Question Answering (CLQA) Systems: A CLQA system is a set of coordinated monolingual systems, each of which extracts answers from a collection of separate monolingual documents.

Translation: The process of rendering words or text from one language into another.

Linguistics: The scientific study of language, which broadly covers three aspects: language form, language meaning, and language in context.

Question Answering Systems (QA Systems): An alternative to traditional IR systems that gives correct and understandable answers to factual questions, rather than just offering a list of documents related to the search.

Information Retrieval (IR): A fully automatic process that responds to a user query by examining a collection of documents and returning a sorted list of documents that should be relevant to the user requirements as expressed in the query.

Cross-Lingual Information Retrieval (CLIR): An IR process that involves at least two languages, for example a query in one language and documents in another.

Automatic Language Translation (AT Systems): Automatic language translation is the use of a computer program to translate input text from one national language to another while maintaining the original document format.
