Disambiguation and Filtering Methods in Using Web Knowledge for Coreference Resolution

Olga Uryupina (University of Trento, Italy), Massimo Poesio (University of Trento, Italy & University of Essex, UK), Claudio Giuliano (University of Trento, Italy & Fondazione Bruno Kessler, Italy) and Kateryna Tymoshenko (University of Trento, Italy & Fondazione Bruno Kessler, Italy)
DOI: 10.4018/978-1-61350-447-5.ch013

The authors investigate two publicly available Web knowledge bases, Wikipedia and Yago, as sources of semantic information for improving a state-of-the-art coreference resolution engine. They extract semantic compatibility and aliasing information from both resources and incorporate it into a coreference resolution system. Used without disambiguation or filtering, this knowledge brings no improvement over the baseline, mirroring previous findings (Ponzetto & Poesio, 2009). The authors therefore propose several ways to reduce the noise coming from Web resources: using disambiguation tools for Wikipedia, pruning Yago to eliminate the most generic categories, and imposing additional constraints on affected mentions. Evaluation experiments on the ACE-02 corpus show that the knowledge extracted from Wikipedia and Yago improves the system’s performance by 2-3 percentage points.
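The idea of pruning overly generic categories before testing semantic compatibility can be illustrated with a minimal sketch. This is not the authors' implementation: the toy category data, the function names `prune_generic` and `compatible`, and the `max_share` frequency threshold are all assumptions made purely for illustration.

```python
from collections import Counter

def prune_generic(categories_by_entity, max_share=0.5):
    """Drop categories attached to more than max_share of all entities.

    Hypothetical pruning criterion: a category shared by most entities
    (e.g. "entity") says little about whether two mentions corefer.
    """
    counts = Counter(
        cat for cats in categories_by_entity.values() for cat in set(cats)
    )
    limit = max_share * len(categories_by_entity)
    return {
        entity: {cat for cat in cats if counts[cat] <= limit}
        for entity, cats in categories_by_entity.items()
    }

def compatible(a, b, categories):
    """Treat two entities as semantically compatible if they share a category."""
    return bool(categories.get(a, set()) & categories.get(b, set()))

# Toy data, invented for this example (not taken from Yago).
toy = {
    "Barack Obama": ["entity", "person", "politician"],
    "Angela Merkel": ["entity", "person", "politician"],
    "Trento": ["entity", "city"],
    "Essex": ["entity", "county"],
}
unpruned = {entity: set(cats) for entity, cats in toy.items()}
pruned = prune_generic(toy)

# Before pruning, the generic "entity" category links unrelated entities.
print(compatible("Barack Obama", "Trento", unpruned))
# After pruning, only informative categories remain.
print(compatible("Barack Obama", "Trento", pruned))
print(compatible("Barack Obama", "Angela Merkel", pruned))
```

In this toy setting the shared "entity" category makes every pair look compatible until it is pruned, which is the kind of noise the filtering step is meant to reduce.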
Chapter Preview

Coreference is a complex phenomenon, and a robust and reliable approach to the problem should therefore address numerous linguistic and common-sense aspects of the task. Previous studies have investigated possibilities for extracting such knowledge from WordNet (Harabagiu, Bunescu, & Maiorano, 2001; Huang, G., W., & A., 2009), Wikipedia (Ponzetto & Strube, 2006) or large text corpora (Haghighi & Klein, 2009; Bean & Riloff, 2004; Garera & Yarowsky, 2006; Yang & Su, 2007).

In the present study, we investigate ways of integrating information extracted from Wikipedia and Yago into a coreference resolution system. It has been shown that, even though web knowledge bases might be a valuable source of information at earlier stages (Ponzetto & Strube, 2006), their expansion inevitably increases the amount of noise, making them hardly usable for our application (Ponzetto & Poesio, 2009).

The line of research on Wikipedia most closely related to our work is the automatic annotation of terms in plain text with links to Wikipedia pages. This is in fact a word sense disambiguation (WSD) task, because its goal is to link a term in a sentence to the Wikipedia concept that best expresses its sense. Well-known approaches to this task include those of Csomai and Mihalcea (2008) and Milne and Witten (2008). They perform the so-called wikification of a document: they first identify the main concepts in the text and then annotate them with links to Wikipedia pages.
