Geocoding Tweets Based on Semantic Web and Ontologies

Imelda Escamilla (CIC, Instituto Politécnico Nacional, Mexico City, Mexico), Miguel Torres Ruíz (Instituto Politécnico Nacional, Mexico), Marco Moreno Ibarra (Instituto Politécnico Nacional, Mexico), Vladimir Luna Soto (Instituto Politécnico Nacional, Mexico), Rolando Quintero (Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico) and Giovanni Guzmán (Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico)
DOI: 10.4018/978-1-5225-5042-6.ch014


The human ability to understand approximate references to locations, disambiguated by means of context and reasoning about spatial relationships, is key to describing spatial environments and sharing information about them. In this paper, we propose a geocoding approach that takes advantage of the spatial relationships contained in the text of tweets, using the semantic web, ontologies and spatial analysis. Microblog text has special characteristics (e.g., slang, abbreviations and acronyms) and thus represents a special variation of natural language. The main objective of this work is to associate the spatial relationships found in text with a spatial footprint in order to determine the location of the event described in each tweet. The feasibility of the proposal is demonstrated using a corpus of 200,000 tweets posted in Spanish related to traffic events in Mexico City.
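To make the idea of "spatial relationships contained in the text" more concrete, the following Python sketch extracts a few relation patterns from Spanish traffic tweets with regular expressions. It is an illustration under stated assumptions, not the method of this chapter: the relation labels (`between`, `near`, `at`, `along`), the Spanish trigger phrases, the heuristic that a place name is a run of capitalized words, and the names `SpatialMention` and `extract_spatial_mentions` are all hypothetical. A complete system would still need to resolve each extracted name to a spatial footprint.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical heuristic: a place name is a run of capitalized words,
# optionally joined by Spanish connectors (e.g. "Calzada de Tlalpan").
PLACE = r"[A-ZÁÉÍÓÚÑ]\w*(?:\s+(?:(?:de|del|la|las|los)\s+)?[A-ZÁÉÍÓÚÑ]\w*)*"

# Hypothetical relation labels and the Spanish phrases that trigger them.
PATTERNS = [
    ("between", re.compile(rf"\bentre\s+({PLACE})\s+y\s+({PLACE})")),  # "entre X y Y"
    ("near", re.compile(rf"\bcerca\s+de\s+({PLACE})")),                # "cerca de X"
    ("at", re.compile(rf"\ba\s+la\s+altura\s+de\s+({PLACE})")),        # "a la altura de X"
    ("along", re.compile(rf"\bsobre\s+({PLACE})")),                    # "sobre X"
]


class SpatialMention(NamedTuple):
    relation: str            # relation label taken from PATTERNS
    places: Tuple[str, ...]  # place names exactly as written in the tweet


def extract_spatial_mentions(text: str) -> List[SpatialMention]:
    """Return the spatial relations found in `text`, in order of appearance."""
    found = []
    for relation, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), SpatialMention(relation, match.groups())))
    found.sort(key=lambda item: item[0])  # order by position in the tweet
    return [mention for _, mention in found]


print(extract_spatial_mentions("Choque entre Reforma y Insurgentes"))
print(extract_spatial_mentions("Tráfico lento sobre Periférico a la altura de Viaducto"))
```

Regular expressions are used here only because they keep the example short; they miss lower-case or misspelled place names, which is exactly the kind of variation that microblog text exhibits.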


In recent years, the amount of data available on the social web has grown massively. Researchers have therefore developed approaches that leverage this social web data to tackle interesting challenges of the semantic web, such as learning ontologies from social media or crowdsourcing, extracting semantics from data collected by citizen science and participatory sensing initiatives, and better understanding and describing user activities. The rich data provided by the social web can be used to build the semantic web; this task ranges from learning basic semantic relationships (e.g., between entities) to employing more sophisticated methods to construct a complete knowledge graph or ontology. There are additional synergies between the social web and the semantic web. For example, content from the social web could be enriched and linked to the semantic web using named entity recognition and linking, as well as sentiment analysis (Hotho, Jäschke, & Lerman, 2017).

Currently, around 6,000 tweets are posted on Twitter every second on average, which corresponds to over 350,000 tweets per minute, 500 million tweets per day and around 200 billion tweets per year (Internet Live Stats, n.d.). This makes Twitter a tool that can contribute significantly to the semantic web, because it is quick to read (no more than 140 characters), dynamic (information is available in real time), accessible (from almost any device connected to the Internet), functional (it allows users to embed pictures, videos and links to other content), organized (with hashtags that represent subjects, and posts ordered by date of publication), interactive (users can view other people's posts, follow them, reply, share posts by retweeting, or mark them as favorites), non-invasive (unlike instant-messaging chat) and open to anonymity (through nicknames or impersonal user names) (Pérez et al., 2012; Duque et al., 2012; Gómez et al., 2012; Kassens, 2012; Wakefield et al., 2011; Welch & Bonnan, 2012).
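The rounded rates quoted above are mutually consistent. As a quick check, assuming a constant average of 6,000 tweets per second and a 365-day year (both simplifications), the other figures follow by multiplication; `tweet_volume` is a hypothetical helper name used only for this illustration.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
DAYS_PER_YEAR = 365             # ignores leap years


def tweet_volume(per_second: int) -> dict:
    """Scale an average per-second tweet rate to longer periods."""
    per_day = per_second * SECONDS_PER_DAY
    return {
        "per_minute": per_second * SECONDS_PER_MINUTE,
        "per_day": per_day,
        "per_year": per_day * DAYS_PER_YEAR,
    }


for period, count in tweet_volume(6_000).items():
    print(f"{period}: {count:,}")
```

This gives 360,000 tweets per minute, about 518 million per day and about 189 billion per year, in line with the rounded values cited.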

This has motivated many research efforts to exploit this information in areas such as event detection (Atefeh & Khreich, 2015; Tonon et al., 2017), health monitoring (Nielsen et al., 2015) and emergency detection (Seol et al., 2013), among others. Many of these applications could benefit from information about where the events occur, but unfortunately such information is scarce, because only 1% of tweets contain geotags (Takhteyev et al., 2012).
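The 1% figure refers to tweets that carry explicit coordinates. A minimal sketch of how such a share could be measured on a collected corpus follows, assuming each tweet is stored as a dictionary shaped like the Twitter API v1.1 Tweet object, in which the `coordinates` field is either null or a GeoJSON `Point` with `[longitude, latitude]`. The helper names and the sample data are hypothetical.

```python
from typing import Any, Iterable, Mapping


def is_geotagged(tweet: Mapping[str, Any]) -> bool:
    """True if the tweet carries an exact GeoJSON Point in `coordinates`."""
    coords = tweet.get("coordinates")
    return isinstance(coords, dict) and coords.get("type") == "Point"


def geotagged_fraction(tweets: Iterable[Mapping[str, Any]]) -> float:
    """Share of tweets with explicit coordinates (0.0 for an empty corpus)."""
    flags = [is_geotagged(t) for t in tweets]
    return sum(flags) / len(flags) if flags else 0.0


# Made-up sample: 99 tweets without coordinates and one geotagged tweet
# (longitude first, as in GeoJSON; the point is roughly central Mexico City).
sample = [{"text": "Tráfico lento", "coordinates": None}] * 99
sample.append({"text": "Choque", "coordinates": {"type": "Point", "coordinates": [-99.13, 19.43]}})
print(geotagged_fraction(sample))
```

Tweets that only carry a coarse `place` attribute are deliberately not counted here, since they do not pinpoint where an event occurred.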

Extracting information from tweets presents several challenges: the information is completely unstructured and limited to 140 characters, tweets can contain grammatical errors and abbreviations, and each user has their own writing style, so the information can be incomplete, false or not credible (Ritter, 2012).
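One common way to soften the abbreviation problem before any spatial analysis is dictionary-based lexical normalization. The sketch below is not taken from this chapter: the abbreviation dictionary is a small hypothetical example of forms typical of Spanish traffic tweets, and `normalize` is an illustrative name. A real dictionary would have to be built from the corpus itself.

```python
import re

# Hypothetical abbreviation dictionary (lower-case keys, no trailing period).
ABBREVIATIONS = {
    "av": "avenida",
    "esq": "esquina",
    "col": "colonia",
    "x": "por",
    "q": "que",
}

# A whole word, optionally followed by the period that often marks an abbreviation.
WORD = re.compile(r"\b(\w+)(\.?)")


def normalize(text: str) -> str:
    """Expand known abbreviations (dropping their period); keep other words as-is."""
    def expand(match):
        expansion = ABBREVIATIONS.get(match.group(1).lower())
        return expansion if expansion is not None else match.group(0)

    return WORD.sub(expand, text)


print(normalize("Choque en Av. Reforma esq. Insurgentes, tráfico lento x la zona"))
```

Running it turns the example into "Choque en avenida Reforma esquina Insurgentes, tráfico lento por la zona". Short, ambiguous keys such as "x" or "q" are risky in general and would need context checks in practice.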

However, Gutierrez et al. (2015) and Oussalah et al. (2013) established that the content of tweets provides geographic information, because the text commonly refers to locations. Analyzing tweets therefore allows us to learn about and evaluate social and natural events. Geocoding methods are used to translate the geographic locations mentioned in text into spatial references (e.g., to detect and locate events in a geographic area). These methods, however, have focused on the point feature type (Iversen et al., 2014; Hart & Zandbergen, 2013; Krumm & Horvitz, 2015), and there are no approaches oriented towards polygon representation.
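The difference between a point and a polygon representation can be illustrated with a toy example. The sketch assumes a hypothetical gazetteer that maps place names to polygon footprints in a local planar coordinate system in metres; the geometry is invented and is not Mexico City data, and `GAZETTEER`, `geocode` and the other helper names are illustrative. It computes the area (shoelace formula) and centroid of a footprint and applies a ray-casting point-in-polygon test. A point-based geocoder would keep only the centroid.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order, first vertex not repeated at the end

# Hypothetical gazetteer: invented footprints in a local metric coordinate system.
GAZETTEER: Dict[str, Polygon] = {
    "Segment A": [(0.0, 0.0), (400.0, 0.0), (400.0, 30.0), (0.0, 30.0)],
    "Square B": [(500.0, 0.0), (600.0, 0.0), (600.0, 100.0), (500.0, 100.0)],
}


def _edges(poly: Polygon):
    """Yield consecutive vertex pairs, closing the ring."""
    for i in range(len(poly)):
        yield poly[i], poly[(i + 1) % len(poly)]


def polygon_area(poly: Polygon) -> float:
    """Unsigned area via the shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in _edges(poly))) / 2.0


def polygon_centroid(poly: Polygon) -> Point:
    """Centroid of a simple (non-self-intersecting) polygon."""
    twice_area = cx = cy = 0.0
    for (x1, y1), (x2, y2) in _edges(poly):
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # C = (1 / 6A) * sum(...), and 6A = 3 * twice_area
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


def contains(poly: Polygon, p: Point) -> bool:
    """Ray-casting point-in-polygon test (boundary points not handled specially)."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in _edges(poly):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def geocode(place: str) -> dict:
    """Return a polygon footprint together with its point (centroid) approximation."""
    poly = GAZETTEER[place]
    return {"polygon": poly, "point": polygon_centroid(poly), "area_m2": polygon_area(poly)}


result = geocode("Segment A")
print(result["point"], result["area_m2"])
print(contains(result["polygon"], (350.0, 10.0)))
```

For "Segment A" the footprint covers 12,000 m² with its centroid at (200, 15). An event reported at (350, 10) lies inside the footprint even though it is about 150 m from the centroid, which is the kind of information a point-only representation cannot express.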
