Geo-Tagging News Stories Using Contextual Modelling

Md Sadek Ferdous (University of Southampton, Southampton, United Kingdom), Soumyadeb Chowdhury (Singapore Institute of Technology, Singapore, Singapore) and Joemon M. Jose (University of Glasgow, Scotland, United Kingdom)
Copyright: © 2017 | Pages: 22
DOI: 10.4018/IJIRR.2017100104

Abstract

With the ever-increasing popularity of Location-based Services, geo-tagging a document - the process of identifying geographic locations (toponyms) in the document - has gained much attention in recent years. Several approaches have been proposed, and some are reported to achieve a high level of accuracy. The existing approaches perform well at the city or country level; unfortunately, their performance degrades when geo-tagging at the street or locality level within a specific city. Moreover, these approaches fail completely when no place is mentioned in a document. In this paper, an algorithm is presented to address these two limitations by introducing a model of contexts with respect to a news story. The algorithm revolves around the idea that a news story can be geo-tagged not only using the place(s) found in the news, but also using certain aspects of its context. An implementation of the proposed approach is presented and its performance is evaluated on a unique data set; the findings suggest an improvement over existing approaches.

Introduction

With the ever-increasing popularity of Location-based Services, geo-tagging a document - the process of identifying geographic locations (toponyms) in the document - has gained much attention in recent years. In such services, geographic locations act as the glue that binds together disparate document sets (such as textual contents, images and videos) from multiple data sources. Devices that produce multimedia documents such as images and videos can be equipped with additional sensors (such as GPS sensors) that geo-tag the document with geographic information such as latitude and longitude; this information is stored as metadata along with the corresponding document. Web services that accumulate such documents (e.g. YouTube and Flickr) can retrieve this information automatically. In addition, such services allow any user to manually tag a multimedia document with geographic locations in cases where the document has not been geo-tagged by its capturing device. Unfortunately, the geo-tagging procedure is rather cumbersome for textual documents and generally relies on manual human input. Several works have addressed this limitation, and some are reported to achieve a high level of accuracy: (Ding, 2000), (Amitay, 2004), (Garbin, 2005), (Lieberman, 2007), (Andogah, 2012) and (Ignazio, 2014).

As part of a large-scale project, we have been collecting news stories about a country from the country-specific RSS feeds of different online news websites on a daily basis for around a year. The main idea is to aggregate this data set with other modes of public data, such as social media posts from Twitter, multimedia data from image sharing websites such as Flickr, and data from wearable sensors such as lifeloggers and GPS trackers, to create a unique multi-modal (textual as well as multimedia) set of data about a particular geographic location. Such a data set will encode experiences from multiple user perspectives and has enormous potential to be exploited for public benefit. One of the core challenges in dealing with such a heterogeneous set of data is to define the parameters that can be used to link the data together for different use-case scenarios. Among several parameters, the spatio-temporal attribute pair is the simplest choice because it is present in all our data sets except the news stories.

News stories, which are mostly textual, carry a temporal attribute (in the form of a timestamp) that gives the time and date of publication, but they lack any accompanying metadata that exposes the spatial attribute, even though every news story generally has a geographic focus (Andogah, 2012). The lack of any spatial attribute makes it challenging to geo-tag a news story automatically. To geo-tag our collection of news stories, we have been looking for publicly available geo-tagging APIs. CLAVIN (CLAVIN, 2016) and CLIFF (Ignazio, 2014), (CLIFF, 2015) are two such APIs.
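The asymmetry described above, a timestamp but no spatial attribute, can be illustrated with a minimal sketch. It is not the authors' collection pipeline: the feed content, the function name `spatio_temporal_attributes` and the choice of geo extensions to check (GeoRSS-Simple and W3C Basic Geo) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed with a single item, invented for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News Feed</title>
    <item>
      <title>Road closed after burst water main</title>
      <link>https://example.org/news/1</link>
      <pubDate>Mon, 02 Jan 2017 09:30:00 GMT</pubDate>
      <description>Commuters faced delays this morning.</description>
    </item>
  </channel>
</rss>"""

# Element names of two common geo extensions for feeds
# (GeoRSS-Simple and W3C Basic Geo), in ElementTree's {namespace}tag form.
GEO_TAGS = (
    "{http://www.georss.org/georss}point",
    "{http://www.w3.org/2003/01/geo/wgs84_pos#}lat",
    "{http://www.w3.org/2003/01/geo/wgs84_pos#}long",
)


def spatio_temporal_attributes(rss_text):
    """Return, per feed item, its publication time and whether it carries geo metadata."""
    records = []
    for item in ET.fromstring(rss_text).iter("item"):
        pub_date = item.findtext("pubDate")
        records.append({
            "title": item.findtext("title"),
            # RSS 2.0 dates follow the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
            # True only if the item has an explicit geo element as a direct child.
            "has_location": any(item.find(tag) is not None for tag in GEO_TAGS),
        })
    return records


if __name__ == "__main__":
    for record in spatio_temporal_attributes(SAMPLE_RSS):
        print(record)
```

For an item like the one in this sample, the temporal half of the spatio-temporal pair can be read directly from the feed, whereas the spatial half has to be inferred from the story itself.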

After running CLAVIN and CLIFF on a subset of our news data set, we noticed the following shortcomings:
