Using Event Semantics for Toponym Disambiguation

Kirk Roberts, Cosmin Adrian Bejan, Sanda Harabagiu
DOI: 10.4018/978-1-60960-741-8.ch030

Abstract

This chapter discusses a method for improving the disambiguation of location names using limited event semantics. Location names are often ambiguous, as the same name may refer to locations in different states, countries, or continents. Ambiguous location names, also known as toponyms, need to be disambiguated (or grounded) when resolving many spatial relations expressed in textual documents. Previous methods for disambiguating toponyms have utilized simple heuristics, statistical ranking, and ontological methods in order to resolve a location reference. However, since toponyms are used in documents that refer to events, semantic knowledge characterizing events can be used to ground location names. We propose an ontology-based method with a technique that considers the participants in events such as people, organizations, and other locations. Event semantics are integrated into an ontology that is used to distinguish geographical names through a probabilistic approach based on logistic regression. Our experimental results on the SpatialML corpus (Mani et al., 2008) indicate that using event structures improves the quality of disambiguated toponyms.

Introduction

Toponym disambiguation is the task of grounding ambiguous spatial locations in text (toponyms) by normalizing them to some structured representation (e.g., geo-coordinates, a database entry, or a location within a geographic ontology). This task proves to be quite difficult for some highly ambiguous locations. For example, there are over one thousand places named Springfield. To assess the level of location name ambiguity, we performed a two-tiered study. In the study, we used two gazetteers to count the number of locations with ambiguous names:

  • Geographic Names Information System (GNIS) provided by the U.S. Geological Survey (USGS), which contains over two million locations and facilities within the United States.

  • GEOnet Names Server (GNS) provided by the U.S. National Geospatial-Intelligence Agency (NGA), which contains over six million locations world-wide, excluding the United States.

We first determined that of more than two million unique location names in the two gazetteers, fewer than 20% are ambiguous. Then, to test the ambiguity of spatial locations in common use, we analyzed the SpatialML corpus (Mani et al., 2008), which contains 428 documents and 715 unique, manually-annotated location names. Of these names, more than 80% are ambiguous. For example, our database contains 44 matches for “America,” including América/Mexico, America/Guinea-Bissau, and the United States of America; 24 matches for “Palestine,” including Palestine/Texas and the Occupied Palestinian Territory; and 7 matches for “Baghdad,” including both the Iraqi city of Baghdad and the Iraqi Governorate of Baghdad that contains the city. For more details on this case study, see Table 1.

Table 1.
Case study on ambiguous names. (a) Globally ambiguous names collected using USGS and NGA gazetteers. (b) Ambiguous names in corpus collected on 715 unique names in 428 documents from SpatialML (Mani et al., 2008).

Duplicates    Entries      Percent
(a)
1             2,150,855    80.2%
2+            531,550      19.8%
5+            86,493       3.2%
10+           30,759       1.1%
50+           2,294        0.086%
100+          617          0.023%
1000+         5            0.0002%
(b)
1             119          16.6%
2+            596          83.4%
5+            438          61.3%
10+           310          43.4%
50+           83           11.6%
100+          16           2.2%
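
The tiers in Table 1 can be reproduced in spirit with a short counting routine. The sketch below is illustrative only and is not the authors' code: the function name `ambiguity_tiers`, the `(name, region)` entry format, the case-folding of names, and the toy entries are all assumptions made for the example. It counts how many gazetteer entries share each name, then reports, for each threshold, how many unique names have exactly one entry ("1") or at least that many entries ("2+", "5+", and so on), as a share of all unique names.

```python
from collections import Counter


def ambiguity_tiers(entries, thresholds=(1, 2, 5, 10, 50, 100, 1000)):
    """Count unique names per ambiguity tier.

    entries: iterable of (name, region) pairs, one per gazetteer entry
             (hypothetical format chosen for this sketch).
    Returns a dict mapping a tier label to (count, percent of unique names).
    """
    # Number of gazetteer entries that share each (case-folded) name.
    per_name = Counter(name.casefold() for name, _region in entries)
    total = len(per_name)

    tiers = {}
    for t in thresholds:
        if t == 1:
            # Unambiguous names: exactly one matching entry.
            label = "1"
            count = sum(1 for c in per_name.values() if c == 1)
        else:
            # Names with at least t matching entries.
            label = f"{t}+"
            count = sum(1 for c in per_name.values() if c >= t)
        percent = 100.0 * count / total if total else 0.0
        tiers[label] = (count, percent)
    return tiers


# Toy data for illustration only; not taken from GNIS or GNS.
toy_entries = [
    ("Springfield", "Illinois"),
    ("Springfield", "Missouri"),
    ("Springfield", "Massachusetts"),
    ("Paris", "France"),
    ("Paris", "Texas"),
    ("Dallas", "Texas"),
]

for label, (count, percent) in ambiguity_tiers(toy_entries, (1, 2, 3)).items():
    print(f"{label:>5}  {count:>3}  {percent:.1f}%")
```

Note that the "1" and "2+" tiers partition the unique names, which is why their percentages in Table 1 sum to 100%, while the higher tiers are nested subsets of "2+".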
