Introduction
This article investigates the impact of anaphora resolution on retrieval systems. To explore the potential benefits, a small sample retrieval system has been built. The system is based on Lucene and provides a search over two nearly identical corpora, which differ only in that one of them has undergone anaphora resolution, and returns two result lists for comparison. As a novelty, the Mean Average Precision (MAP) is calculated for various search queries on both corpora. In addition, results for a corpus resolved automatically are compared with results for a corpus resolved intellectually (i.e., manually). Finally, a simple Python tool based on Mitkov’s anaphora resolution system (MARS) has been developed to test different approaches, thus highlighting the main problems of automated anaphora resolution.
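The MAP comparison described above can be sketched in a few lines of Python. The function names and document IDs below are hypothetical, and the rankings are invented for illustration; they are not results of this study. Average precision is computed in the common way, as the sum of the precision values at the ranks of the relevant documents divided by the number of relevant documents (so relevant documents that are never retrieved contribute zero), and MAP is the mean over all queries.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def average_precision(ranking: Sequence[str], relevant: set[str]) -> float:
    """Average precision (AP) of one ranked result list for one query.

    Precision is recorded at each rank where a relevant document appears;
    the sum is divided by the total number of relevant documents, so
    relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[tuple[Sequence[str], set[str]]]) -> float:
    """MAP: arithmetic mean of AP over all (ranking, relevant set) pairs."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# One invented query evaluated on both corpora (document IDs are hypothetical).
relevant = {"d1", "d2"}
map_original = mean_average_precision([(["d3", "d1", "d7"], relevant)])
map_resolved = mean_average_precision([(["d1", "d2", "d3"], relevant)])
print(map_original, map_resolved)  # 0.25 1.0
```

Since both corpora are queried with the same queries and judged against the same relevance sets in this kind of setup, the difference between the two MAP values can be attributed to the resolution step.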
For the human mind, it is obvious who is meant by “she” in sentences like “Red Riding Hood went into the forest. She was scared.” For a computer or an algorithm, however, it is not as easy to determine which noun phrase the pronoun refers to. This phenomenon is called anaphora, often described as “an occurrence of an expression [which] has its referent supplied by an occurrence of some other expression in the same or another sentence” (King & Lewis, 2016, para. 1). The Oxford Reference Dictionary defines it as “a pronoun or similar element that must be understood in relation to an antecedent” (Matthews, 2007, para. 1).
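The task of choosing the noun phrase a pronoun refers to can be illustrated with a deliberately naive resolver for the example sentence above. This is not MARS and not the article’s tool: the `Candidate` class, the pronoun feature table and the scoring (+1 for grammatical subjects, -1 per sentence of distance) are hypothetical, and the noun phrases and their features are annotated by hand instead of being produced by a parser.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate antecedent noun phrase (features are hand-annotated here)."""
    text: str
    sentence: int     # index of the sentence containing the noun phrase
    gender: str       # "fem", "masc" or "neut"
    plural: bool
    is_subject: bool


# Hypothetical feature table: pronoun -> (required gender or None, plural?)
PRONOUN_FEATURES = {
    "she": ("fem", False),
    "he": ("masc", False),
    "it": ("neut", False),
    "they": (None, True),
}


def resolve(pronoun: str, pronoun_sentence: int,
            candidates: list[Candidate]) -> Candidate | None:
    """Pick an antecedent: agreement filter first, then a simple score.

    Score = +1 for grammatical subjects, -1 per sentence of distance
    (a crude recency preference). Ties keep the earlier-listed candidate.
    """
    gender, plural = PRONOUN_FEATURES[pronoun.lower()]
    best, best_score = None, float("-inf")
    for cand in candidates:
        if cand.sentence > pronoun_sentence:
            continue  # consider preceding material only (ignores cataphora)
        if cand.plural != plural or (gender is not None and cand.gender != gender):
            continue  # number/gender disagreement rules the candidate out
        score = (1 if cand.is_subject else 0) - (pronoun_sentence - cand.sentence)
        if score > best_score:
            best, best_score = cand, score
    return best


# "Red Riding Hood went into the forest. She was scared."
story = [
    Candidate("Red Riding Hood", 0, "fem", False, True),
    Candidate("the forest", 0, "neut", False, False),
]
print(resolve("She", 1, story).text)  # Red Riding Hood
```

Even this toy version shows where the difficulty lies: the hard filters depend on correct gender and number information, and the soft preferences only decide among the candidates that survive them.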
There are various types of anaphors. The anaphor in the example sentence above is a pronominal anaphor. Another example is “Maria likes Peter. Peter did not believe that.” Here, the antecedent is the entire clause “Maria likes Peter”, and the anaphoric expression in the second sentence is “that”. Moreover, the antecedent does not necessarily have to precede the anaphoric expression (King & Lewis, 2016): if the antecedent follows the anaphor, the expression is called a cataphor. There are several other types (such as temporal or modal anaphors) that will not be described in detail here. Another special case that needs to be considered is the use of “it” as a syntactic expletive, also described as “pleonastic-it” (Lappin & Leass, 1994, p. 538). Examples are expressions like “it rains” or “it is cold outside”. Such pronouns typically occur “as the subject of members of a specific set of verbs (seem, appear, etc.), or as the subject of adjectives with clausal complements” (Kennedy & Boguraev, 1996, p. 114).
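A simple pattern-based filter shows how pleonastic “it” could be excluded before resolution is attempted. This is a rough sketch in the spirit of the verb and adjective classes quoted above; the word lists and regular expressions are assumptions, far from complete, and are not the published rule sets of Lappin and Leass (1994) or Kennedy and Boguraev (1996).

```python
import re

# Illustrative, deliberately incomplete word lists (assumptions, not the
# published lists of Lappin & Leass or Kennedy & Boguraev).
RAISING_VERBS = r"(?:seems?|seemed|appears?|appeared|happens?|happened)"
WEATHER_VERBS = r"(?:rains?|rained|raining|snows?|snowed|snowing)"
MODAL_ADJECTIVES = r"(?:necessary|possible|likely|important|easy|difficult|clear|obvious)"
WEATHER_ADJECTIVES = r"(?:cold|hot|warm|sunny|dark)"

PLEONASTIC_PATTERNS = [
    # "it seems that ...", "it appears to ..."
    re.compile(rf"\bit\s+{RAISING_VERBS}\s+(?:that|to|as\s+if)\b", re.IGNORECASE),
    # "it rains", "it is raining"
    re.compile(rf"\bit\s+(?:is\s+|was\s+)?{WEATHER_VERBS}\b", re.IGNORECASE),
    # "it is important that ...", "it was easy to ..." (adjective + clausal complement)
    re.compile(rf"\bit\s+(?:is|was|will\s+be)\s+(?:not\s+)?{MODAL_ADJECTIVES}\s+(?:that|to|whether)\b",
               re.IGNORECASE),
    # "it is cold outside"
    re.compile(rf"\bit\s+(?:is|was)\s+{WEATHER_ADJECTIVES}\b", re.IGNORECASE),
]


def is_pleonastic(sentence: str) -> bool:
    """Heuristically flag a sentence whose 'it' is likely an expletive."""
    return any(pattern.search(sentence) for pattern in PLEONASTIC_PATTERNS)


print(is_pleonastic("It rains."))                     # True
print(is_pleonastic("It is cold outside."))           # True
print(is_pleonastic("It seems that the wolf lied."))  # True
print(is_pleonastic("She put it down."))              # False
```

A resolver would skip every “it” flagged this way instead of searching for an antecedent that does not exist; the price of such surface patterns is that unlisted verbs and adjectives slip through.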
Anaphora resolution is an ongoing linguistic problem that has not yet been solved satisfactorily. In a recent study, Cunnings, Patterson, and Felser (2014) discussed “whether ambiguous pronouns are preferentially resolved via either the variable binding or coreference route”. They conducted experiments in which they monitored readers’ eye movements in order to “examine the time-course of pronoun resolution” (Cunnings et al., 2014, p. 43). Their “findings were interpreted as being incompatible with theories of pronoun resolution which predict that one particular route to pronoun interpretation should always be favored initially, and in particular are incompatible with the hypothesis that variable binding relations should always be computed before coreference assignment” (Cunnings et al., 2014, p. 53). They concluded that “the computation of variable binding relations must be facilitated by additional factors such as antecedent recency” (Cunnings et al., 2014, p. 53).
The considerable effort of anaphora resolution is only worthwhile if it positively influences the performance of information retrieval or language processing systems. Several studies have examined the impact of anaphora resolution on the performance of retrieval systems or of related applications such as question answering and text summarization. Researchers from Syracuse University conducted experiments on anaphora resolution in abstracts of scientific articles (Bonzi, 1991; DuRoss Liddy, 1990; Liddy, Bonzi, Katzer, & Oddy, 1987), Orasan (2007) analyzed anaphora resolution for optimizing text summarization, Vicedo and Ferrández (2000) showed the importance of pronominal anaphora resolution in question answering systems, and, finally, Pirkola (1996) demonstrated the relevance of anaphora resolution for searches with proximity operators.
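Why anaphora resolution can matter for proximity operators can be sketched as follows: a proximity condition only matches if both terms occur close together, and a pronoun hides the second mention of its antecedent. The Python sketch below assumes that resolution has already been performed and simply substitutes the antecedent for the pronoun. The helper names and the NEAR semantics used here (at most k tokens apart, in either order) are illustrative assumptions; they are not Lucene’s API and not Pirkola’s experimental setup.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())


def within_distance(tokens: list[str], term_a: str, term_b: str, k: int) -> bool:
    """True if term_a and term_b occur at most k token positions apart (either order)."""
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= k for i in positions_a for j in positions_b)


def substitute_antecedent(text: str, pronoun: str, antecedent: str) -> str:
    """Replace every whole-word occurrence of `pronoun` with `antecedent`.

    Simplification: assumes all occurrences of the pronoun were resolved to
    the same antecedent; a real system would rewrite each occurrence separately.
    """
    return re.sub(rf"\b{re.escape(pronoun)}\b", antecedent, text, flags=re.IGNORECASE)


text = "Red Riding Hood went into the forest. She was scared."
resolved = substitute_antecedent(text, "she", "Red Riding Hood")

# Query: "hood" within 2 tokens of "scared"
print(within_distance(tokenize(text), "hood", "scared", 2))      # False
print(within_distance(tokenize(resolved), "hood", "scared", 2))  # True
```

In the unresolved text, “hood” and “scared” are seven tokens apart; after substitution, a second “hood” appears two tokens before “scared”, so the same proximity query now matches.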