Geocoding Tweets Approach Based on Conceptual Representations in the Context of the Knowledge Society


Imelda Escamilla (CIC, Instituto Politécnico Nacional, Mexico City, Mexico), Miguel Torres-Ruiz (CIC, Instituto Politécnico Nacional, Mexico City, Mexico), Marco Moreno-Ibarra (CIC, Instituto Politécnico Nacional, Mexico City, Mexico), Rolando Quintero (CIC, Instituto Politécnico Nacional, Mexico City, Mexico), Giovanni Guzmán (CIC, Instituto Politécnico Nacional, Mexico City, Mexico) and Vladimir Luna-Soto (CIC, Instituto Politécnico Nacional, Mexico City, Mexico)
Copyright: © 2016 | Pages: 18
DOI: 10.4018/IJSWIS.2016010103


This paper proposes an approach to geocode tweets published in Spanish. The tweets relate to traffic events in the urban context of Mexico City. They are generated by a particular phenomenon and are used to understand the behavior of the geographic entities involved. An application ontology was defined to disambiguate the information and verify its consistency. The core goal is to identify locations as well as the spatial relationships between the entities present in the events, using semantic and spatial analysis of the collected dataset. A visualization method for presenting the results is also proposed. The paper describes the methodology for discovering spatial patterns within traffic tweets, which provides useful information for making timely decisions and contributes to the Knowledge Society.

1. Introduction

Today's society holds knowledge as one of its most important values, and for that reason it is often called the Knowledge Society. The application of advanced software technologies in the context of the Knowledge Society is a bold contribution of the software engineering scientific community and a joint vision for applied humanistic computing.

In recent years, Twitter has become a popular microblogging platform, with over 400 million tweets posted daily according to tweespeed.com. It can contribute significantly to the Knowledge Society because it is quick to read (no more than 140 characters), dynamic (information is available in real time), accessible (from almost any device connected to the Internet), functional (pictures, videos and links to other content can be embedded), organized (hashtags represent subjects and posts are ordered by date of publication), interactive (users can view posts from other people, follow them, respond, share posts by retweeting or mark them as favorites), non-invasive (there is no instant-messaging chat) and open to anonymity (through nicknames or impersonal names) (Pérez et al., 2012; Kassens, 2012; Welch & Bonnan, 2012).

This has led to many research efforts that exploit this information on various topics, such as event detection (Agarwal et al., 2012; Atefeh & Khreich, 2015), health monitoring (Nielsen et al., 2015) and emergency detection (Seol et al., 2013), among others. Many of these applications can benefit from information about the location where events occur. Unfortunately, this information is scarce, because only 1% of tweets contain geo-tags (Takhteyev et al., 2012).

Extracting information from tweets presents several challenges: the information is completely unstructured and limited to 140 characters, tweets can contain grammatical errors and abbreviations, and each user has a personal writing style. As a result, the information can be incomplete, false or not credible (Ritter, 2012).

However, Gutierrez et al. (2015) and Oussalah et al. (2013) established that the content of tweets provides geographic information, because the texts commonly refer to locations. Tweet analysis allows us to identify and evaluate social and natural events. Geocoding methods are used to translate the geographic locations represented in the text (e.g., to detect and locate events in a geographic area). So far, these methods have focused on the point feature type (Quin et al., 2013; Hart & Zandbergen, 2013; Krumm & Horvitz, 2015), and there are no approaches oriented towards polygon representation.

Thus, this paper proposes a methodology for geocoding events that appear in tweets about traffic in Mexico City. The work consists of identifying events, geographic features and their spatial relationships, supported by conceptual representations, Natural Language Processing techniques and classification algorithms. Unlike other works, this paper does not address the detection of trending topics.
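
To make the three targets of the identification step concrete (events, geographic features and spatial relationships), the following minimal sketch tags them in a Spanish traffic tweet. It is only an illustration under stated assumptions: it uses plain dictionary lookup rather than the ontology, Natural Language Processing techniques and classifiers mentioned above, and the vocabularies, labels, function names and sample tweet are all hypothetical.

```python
import re
import unicodedata

# Hypothetical vocabularies for illustration only; they are not taken
# from the application ontology described in the paper.
EVENT_TERMS = {
    "choque": "accident",
    "manifestacion": "demonstration",
    "trafico": "congestion",
}
RELATION_TERMS = {
    "a la altura de": "near",
    "entre": "between",
    "hacia": "towards",
    "esquina con": "intersection",
}
PLACE_TERMS = ["insurgentes", "reforma", "viaducto", "periferico"]


def normalize(text):
    """Lowercase the text and strip accents so lookups are accent-insensitive."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_terms(text, vocabulary):
    """Return the vocabulary terms found in text, in order of appearance."""
    found = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), term))
    return [term for _, term in sorted(found)]


def extract_mentions(tweet):
    """Tag events, places and spatial relations mentioned in one tweet."""
    text = normalize(tweet)
    return {
        "events": [EVENT_TERMS[t] for t in find_terms(text, EVENT_TERMS)],
        "places": find_terms(text, PLACE_TERMS),
        "relations": [RELATION_TERMS[t] for t in find_terms(text, RELATION_TERMS)],
    }


# Hypothetical sample tweet, not taken from the collected dataset.
print(extract_mentions("Choque en Insurgentes a la altura de Viaducto"))
```

A lookup of this kind cannot disambiguate place names or check consistency, which is the role the paper assigns to the application ontology; the sketch only shows the kind of structured output that a later geocoding step would consume.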

This paper is organized as follows: Section 2 describes related work on geocoding based on short texts. Section 3 presents the proposed methodology. Section 4 presents the evaluation of the proposed method and its comparison with other approaches. Section 5 outlines the conclusions and future work.
