Multilingual Analysis of Twitter News in Support of Mass Emergency Events

Ulrich Bügel (Fraunhofer IOSB, Karlsruhe, Germany) and Andrea Zielinski (Information Retrieval, Knowledge Management and Text Mining, Fraunhofer IOSB, Karlsruhe, Germany)
DOI: 10.4018/jiscrm.2013010105

Abstract

Social media are increasingly becoming a source for event-based early warning systems: they can help to detect natural disasters and support crisis management during or after disasters. In this article the authors study the problems of analyzing multilingual Twitter feeds for emergency events. Specifically, they consider tsunami, and earthquakes as one possible cause of tsunami. Twitter messages provide eyewitness information and help to obtain a better picture of the actual situation. Generally, local civil protection authorities and the population are likely to respond in their native language. Therefore, the present work focuses on English as “lingua franca” and on under-resourced Mediterranean languages in endangered zones, particularly those of Turkey, Greece, and Romania. The authors investigated ten earthquake events and defined four language-specific classifiers that can be used to detect earthquakes by filtering out irrelevant messages that do not relate to the event. The final goal is to extend this work to more Mediterranean languages and to classify and extract relevant information from tweets, translating the main keywords into English. Preliminary results indicate that such a filter has the potential to confirm forecast parameters of tsunami affecting coastal areas where no tide gauges exist, and that it could be integrated into seismographic sensor networks.

Introduction

Emergency information processing of social media can help to identify regions affected by natural hazards such as earthquakes or tsunami, given that the feeds are real-time and often contain location information (ca. 1.2% with exact coordinates; ca. 50% with a city or state derived from the user profile). Due to the massive growth of Twitter data and its increasing number of users, it is, however, a challenge to access and interpret the stream of data efficiently. In recent years, there have been major achievements in using such “weak” human sensors as a complement to seismic sensors in some early warning systems (see Sakaki, 2010; Guy, 2010), focusing on English and Japanese. At present, there is no similar alerting system for the Mediterranean region. We try to fill this gap within the European TRIDEC project (www.tridec-online.eu) by adapting state-of-the-art algorithms to the common Twitter languages in the endangered zones.

Social media often play a crucial role in disaster management during and after a crisis: citizens commonly use Twitter postings or SMS messages to report emergencies. The information contained in these messages can be relevant for crisis management (relief and medical care for those affected, repair of broken infrastructure, etc.), so there is a strong need to classify, cluster and extract such information effectively from large-scale, noisy and unstructured data. As the messages are very short (max. 140 characters), NLP analysis is particularly difficult.

A number of text mining tools have been applied to recognize tactical, actionable information in tweets (Verma, 2011), to find messages that contain real-world or real-event information (Becker, 2011; Naaman, 2011), or to extract Named Entities (Neubig, 2011) or other news content (Sankaranarayanan, 2009) for one single language (mostly Japanese or English).

In some cases, though, it is crucial to cross language boundaries, for instance when the epicenter is near the border of a country (e.g., Western Turkey and Greece), or when a Twitter user reports an event in his/her native language (e.g., Romanian) that needs to be translated into a different language (e.g., English, German, or Spanish).

Therefore, our long-term goal within TRIDEC is to support access to relevant information across languages, focusing on the translation of under-resourced Mediterranean languages such as Turkish, Greek, and Romanian into English.

The multilingual nature of the blogosphere was a major hindrance during the Haitian earthquake, where reports appeared in languages ranging from Japanese to English and Spanish. The work of Caragea (2011) is one of the few that deals with multilinguality, classifying either English or Spanish messages into one of 10 emergency classes.
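
As a rough illustration of what language-specific filtering of short messages can look like, the following Python sketch keeps only messages that contain an event keyword for their language. It is a simple keyword filter used here as a stand-in, not the classifiers defined by the authors, whose features and training data are not given in this preview. The keyword lists, the language codes, and the function names (`normalize`, `is_relevant`, `filter_messages`) are all assumptions made for the example.

```python
import unicodedata

# Hypothetical keyword lists chosen for illustration only; the article
# preview does not publish the vocabulary behind its classifiers.
KEYWORDS = {
    "en": ["earthquake", "quake", "tsunami"],
    "tr": ["deprem", "tsunami"],
    "el": ["σεισμός", "τσουνάμι"],
    "ro": ["cutremur", "tsunami"],
}


def normalize(text: str) -> str:
    """Case-fold and strip combining marks so accented variants match."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_relevant(text: str, lang: str) -> bool:
    """Return True if the message contains a keyword for the given language."""
    normalized = normalize(text)
    # Keywords go through the same normalization as the message text,
    # so both sides of the comparison are in the same form.
    return any(normalize(kw) in normalized for kw in KEYWORDS.get(lang, []))


def filter_messages(messages):
    """Keep only the (lang, text) pairs that pass the keyword filter."""
    return [(lang, text) for lang, text in messages if is_relevant(text, lang)]


if __name__ == "__main__":
    sample = [
        ("el", "Σεισμός τώρα στην Αθήνα"),
        ("tr", "İzmir'de DEPREM oldu"),
        ("en", "Good morning everyone"),
    ]
    for lang, text in filter_messages(sample):
        print(lang, text)
```

Plain substring matching is a deliberate simplification: it tolerates suffixes in agglutinative languages such as Turkish, but it will also pass figurative or unrelated uses of a keyword, which is one reason a trained classifier is preferable in practice.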
