Anaphora Resolution: Analysing the Impact on Mean Average Precision and Detecting Limitations of Automated Approaches

Daniel Gros (Heinrich Heine University Düsseldorf, Düsseldorf, Germany), Tim Habermann (Heinrich Heine University Düsseldorf, Düsseldorf, Germany), Giulia Kirstein (Heinrich Heine University Düsseldorf, Düsseldorf, Germany), Christine Meschede (Heinrich Heine University Düsseldorf, Düsseldorf, Germany), S. Denise Ruhrberg (Heinrich Heine University Düsseldorf, Düsseldorf, Germany), Adrian Schmidt (Heinrich Heine University Düsseldorf, Düsseldorf, Germany) and Tobias Siebenlist (Heinrich Heine University Düsseldorf, Düsseldorf, Germany)
Copyright: © 2018 | Pages: 13
DOI: 10.4018/IJIRR.2018070103

Abstract

This article analyses the effect of anaphora resolution on the performance of information retrieval systems with relevance ranking. It investigates whether the Mean Average Precision of a retrieval system improves after all anaphors in a corpus of various texts have been replaced intellectually (that is, manually). The texts consist mostly of news stories and fairy tales, covering two genres with different amounts of anaphors. A model retrieval system is developed using Lucene to measure the effects of anaphora resolution. Different queries are used, and the rankings are analysed to show the changes induced by anaphora resolution. In addition, approaches to automated anaphora resolution are considered. The Mean Average Precision improves noticeably, by 36%, after anaphora resolution. It is therefore highly recommended to improve existing approaches to automated anaphora resolution, as current attempts do not yet yield satisfactory results.

Introduction

This article investigates the impact of anaphora resolution on retrieval systems. To investigate potential benefits, a small sample retrieval system has been built. The system is based on Lucene and provides a search over two nearly identical corpora, one of which has undergone anaphora resolution, returning two result lists for comparison. As a novelty, the Mean Average Precision (MAP) is calculated for various search queries on both corpora. A comparison of the results for automatically and intellectually resolved corpora is provided. Additionally, a simple Python tool based on Mitkov’s anaphora resolution system (MARS) has been developed to test different approaches and to highlight the main problems of automated anaphora resolution.
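The MAP comparison between the two corpora can be illustrated with a minimal Python sketch. It uses the standard definition of average precision (the mean of the precision values at each rank where a relevant document appears, divided by the number of relevant documents) and is not the authors' implementation. The function names, document identifiers, rankings, and relevance judgements below are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list for one query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant documents.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the average precision values over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical data: one query, run against an original and a resolved corpus.
relevant_docs = {"d1", "d2"}
ranking_original = ["d3", "d1", "d7", "d2"]   # relevant documents at ranks 2 and 4
ranking_resolved = ["d1", "d2", "d3", "d7"]   # relevant documents at ranks 1 and 2

map_original = mean_average_precision([(ranking_original, relevant_docs)])
map_resolved = mean_average_precision([(ranking_resolved, relevant_docs)])
print(f"MAP original: {map_original:.2f}, MAP resolved: {map_resolved:.2f}")
```

In this toy case the relevant documents move up in the ranking of the resolved corpus, so its MAP is higher. With real data, each query would contribute one ranking per corpus and one set of relevance judgements.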

For the human mind, it is obvious who is meant by “she” in sentences like “Red riding hood went into the forest. She was scared”. For a computer or an algorithm, however, it is not as easy to determine which noun phrase this pronoun refers to. This phenomenon is called anaphora, often described as “an occurrence of an expression [which] has its referent supplied by an occurrence of some other expression in the same or another sentence” (King & Lewis, 2016, para. 1). The Oxford Reference Dictionary defines it as “a pronoun or similar element that must be understood in relation to an antecedent” (Matthews, 2007, para. 1).

There are various types of anaphors. The anaphor in the sentence given earlier is called a pronominal anaphor. Another example would be: “Maria likes Peter. Peter did not believe that.” The antecedent here is “Maria likes Peter”; the anaphoric expression in the second sentence is “that”. The antecedent does not necessarily have to precede the anaphoric expression (King & Lewis, 2016). If the antecedent follows the anaphor, the anaphor is called a cataphor. There are multiple other types (like temporal or modal anaphors) that will not be described in detail here. An exception that needs to be considered is the usage of “it” as a syntactic expletive, also described as “pleonastic-it” (Lappin & Leass, 1994, p. 538). Examples are expressions like “it rains” or “it is cold outside”. Such uses usually occur “as the subject of members of a specific set of verbs (seem, appear, etc.), or as the subject of adjectives with clausal complements” (Kennedy & Boguraev, 1996, p. 114).
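A resolver has to filter out such pleonastic uses of “it” before searching for an antecedent. The following Python sketch shows one simple way to do so with regular expressions. The word lists and patterns are hypothetical and deliberately small; they are not the rules used by Lappin and Leass, Kennedy and Boguraev, or MARS.

```python
import re

# Hypothetical, deliberately small word lists for illustration only.
WEATHER_VERBS = r"(?:rains|rained|snows|snowed|is raining|is snowing)"
RAISING_VERBS = r"(?:seems|seemed|appears|appeared)"
ADJECTIVES = r"(?:cold|warm|hot|likely|possible|important|necessary|clear)"

PLEONASTIC_PATTERNS = [
    # "it rains", "it is snowing"
    re.compile(rf"\bit\s+{WEATHER_VERBS}\b", re.IGNORECASE),
    # "it seems that ...", "it appears to ..."
    re.compile(rf"\bit\s+{RAISING_VERBS}\s+(?:that|to)\b", re.IGNORECASE),
    # "it is cold outside", "it is likely that ..."
    re.compile(rf"\bit\s+(?:is|was)\s+{ADJECTIVES}\b", re.IGNORECASE),
]


def is_pleonastic_it(sentence: str) -> bool:
    """Return True if the sentence matches one of the expletive-it patterns."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)


for s in ["It rains.",
          "It is cold outside.",
          "She picked up the basket and carried it home."]:
    print(s, "->", is_pleonastic_it(s))
```

Surface patterns like these are only a heuristic: in “The soup is ready, but it is cold”, the pronoun refers to the soup, yet the third pattern would still flag it as pleonastic.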

Anaphora resolution is an ongoing linguistic problem that has not yet been solved to full satisfaction. In a recent approach, Cunnings, Patterson, and Felser (2014) discussed “whether ambiguous pronouns are preferentially resolved via either the variable binding or coreference route”. They conducted experiments in which they monitored the eye movements of readers in order to “examine the time-course of pronoun resolution” (Cunnings et al., 2014, p. 43). Their experiments’ “findings were interpreted as being incompatible with theories of pronoun resolution which predict that one particular route to pronoun interpretation should always be favored initially, and in particular are incompatible with the hypothesis that variable binding relations should always be computed before coreference assignment” (Cunnings et al., 2014, p. 53). In conclusion, their results showed that “the computation of variable binding relations must be facilitated by additional factors such as antecedent recency” (Cunnings et al., 2014, p. 53).

The considerable effort of anaphora resolution only makes sense if it influences the performance of information retrieval or language processing systems. There are studies on the impact of anaphora resolution on the performance of retrieval systems and of neighboring approaches such as question answering systems or text summarization. Researchers from Syracuse University conducted experiments on anaphora resolution in abstracts of scientific articles (Bonzi, 1991; DuRoss Liddy, 1990; Liddy, Bonzi, Katzer, & Oddy, 1987). Orasan (2007) analyzed anaphora resolution for optimizing text summarization, Vicedo and Ferrández (2000) showed the importance of pronominal anaphora resolution in question answering systems, and Pirkola (1996) demonstrated the relevance of anaphora resolution for searches with proximity operators.
