Leveraging Temporal Markers to Detect Event from Microblogs

Soumaya Cherichi (University of Tunis, Higher Institute of Management, Tunis, Tunisia) and Rim Faiz (University of Carthage, Institute of Business Studies, Tunis, Tunisia)
Copyright: © 2017 | Pages: 14
DOI: 10.4018/IJKSR.2017070104

Abstract

One of the marvels of our time is the unprecedented development and use of technologies that support social interaction. Social mediating technologies have engendered radically new ways of sharing information and communicating, particularly during major events such as natural disasters (e.g., earthquakes and tsunamis) or the American presidential election. This paper is based on data obtained from Twitter because of its popularity and sheer data volume. This content can be combined and processed to detect events, entities, and popular moods to feed various new large-scale data-analysis applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract meaning from the stream. Taking all these difficulties into account, we propose a new event detection approach that combines linguistic features and Twitter-specific features. Finally, we present our system, which aims (1) to detect new events, (2) to recognize the temporal marker patterns of an event, and (3) to classify important events according to thematic pertinence, author pertinence, and tweet volume.
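The third objective, classifying important events by thematic pertinence, author pertinence, and tweet volume, can be illustrated with a minimal sketch. It assumes, purely for illustration, that each candidate event already carries a thematic score and an author score in [0, 1], and that the three criteria are combined by a linear weighted sum after min-max normalizing the volume. The class `CandidateEvent`, the function `rank_events`, and the default weights are hypothetical; they are not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    """A detected event with pre-computed criteria (hypothetical fields)."""
    label: str
    thematic_pertinence: float  # assumed in [0, 1]: match between tweets and event topic
    author_pertinence: float    # assumed in [0, 1]: e.g. credibility of the posting users
    tweet_volume: int           # number of tweets attached to the event


def rank_events(events, weights=(0.4, 0.3, 0.3)):
    """Return events sorted by a weighted sum of the three criteria, highest first.

    Tweet volume is min-max normalized across the candidates so that all
    three terms lie in [0, 1] before weighting.
    """
    if not events:
        return []
    w_theme, w_author, w_volume = weights
    volumes = [e.tweet_volume for e in events]
    lo, hi = min(volumes), max(volumes)
    span = (hi - lo) or 1  # avoid division by zero when all volumes are equal

    def score(e):
        volume_norm = (e.tweet_volume - lo) / span
        return (w_theme * e.thematic_pertinence
                + w_author * e.author_pertinence
                + w_volume * volume_norm)

    return sorted(events, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [
        CandidateEvent("earthquake", 0.9, 0.8, 1200),
        CandidateEvent("concert", 0.6, 0.5, 300),
        CandidateEvent("spam burst", 0.2, 0.1, 900),
    ]
    print([e.label for e in rank_events(candidates)])
    # ['earthquake', 'concert', 'spam burst']
```

With these illustrative weights, the high volume of the "spam burst" does not lift it above the "concert", because its thematic and author scores are low; in practice the weights would have to be tuned on labeled data.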

Introduction

Recent years have seen significant growth in online social networks and social media platforms, which has given rise to a huge volume of data in blogs and, more precisely, in microblogs. Twitter is an interesting example of the most recent type of social media. Major events and issues are shared and communicated on Twitter ahead of many other online and offline platforms. The amount of content that Twitter now generates has crossed the mark of one billion posts per week from around 200 million users, covering topics in politics, entertainment, and technology, and even natural disasters such as earthquakes and tsunamis. During the “Arab Spring Movement,” Twitter was used as an information source to coordinate protests and to raise awareness of the atrocities. In recent world events, social media data has been shown to be effective in detecting earthquakes (Sakaki et al., 2010), rumors, crises and disasters, and spam (Xianghan et al., 2015), and in identifying characteristics of information propagation (Ritter et al., 2011; Kunneman, 2016).

A system that can extract this information from Twitter and present an overview of upcoming popular events, such as sport matches, national holidays, and public demonstrations, is potentially of high value. Such functionality may be relevant not only to people interested in attending or learning about an event, but also in situations that require decision support to mobilize others to handle upcoming events, whether for commercial, safety, or security purposes.
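Temporal markers such as "tomorrow", "next Friday", or "8pm" are one cue that a tweet refers to a scheduled or upcoming event. The following sketch, offered under stated assumptions, matches a handful of hand-written English regular expressions. The pattern set and the functions `find_temporal_markers` and `has_temporal_marker` are hypothetical; they do not reproduce the temporal marker patterns used in this work.

```python
import re

# Hypothetical, deliberately small set of English temporal-marker patterns.
TEMPORAL_PATTERNS = {
    "relative_day": r"\b(?:today|tonight|tomorrow|tmrw)\b",
    "weekday": r"\b(?:(?:next|this)\s+)?(?:mon|tues|wednes|thurs|fri|satur|sun)day\b",
    "clock_time": r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b",
    "numeric_date": r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in TEMPORAL_PATTERNS.items()}


def find_temporal_markers(tweet):
    """Return (marker_type, matched_text) pairs in order of appearance."""
    hits = []
    for name, regex in COMPILED.items():
        for m in regex.finditer(tweet):
            hits.append((m.start(), name, m.group(0)))
    hits.sort()  # order by position in the tweet
    return [(name, text) for _, name, text in hits]


def has_temporal_marker(tweet):
    """True if the tweet contains at least one recognized temporal marker."""
    return bool(find_temporal_markers(tweet))


if __name__ == "__main__":
    print(find_temporal_markers("Big match tomorrow at 8pm, see u there!"))
    # [('relative_day', 'tomorrow'), ('clock_time', '8pm')]
```

A fixed pattern list like this is easy to inspect but brittle on misspellings and non-English tweets, which is one reason the approach described here also draws on Twitter-specific features.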

Recently, several systems (e.g., Sakaki et al., 2010) have been proposed to detect events from tweets, but most of them lack an analysis component. In the literature, several systems (e.g., Bansal, 2007; Mei et al., 2016) have been proposed to analyze events from blogs, but they may fail when processing tweets, which are short and noisy, and they do not exploit the rich information available on Twitter (e.g., users' networks). This motivates us to study the problem of event detection, an interesting and important task in such circumstances.

While most existing work either ignores the structured aspects of the information present in Twitter or transposes traditional natural language processing (NLP) approaches, such as parsing, to extract these structures, a peculiarity of this work is that it makes maximum use of Twitter's specificities (explicit and implicit links between tweets, redundant information, temporal and spatial co-occurrence, metadata, etc.) to rebuild these structures. Thus, although many tweets cannot be parsed because they are ungrammatical, a structure linking the events and entities mentioned in a tweet can often still be inferred through the correlation between that tweet and others on the same topic or about a related event.

Making sense of social media content is not trivial. Data streams from social media platforms usually exhibit:

  • Informal Use of Language: Twitter users produce and consume information in a very informal manner compared with traditional media. Tweets are limited to 140 characters, forcing users to use short forms to convey their message. Many routine words are shortened, such as “pls” for “please” and “forgt” for “forgot”; users also rely on slang words, abbreviations, and compound hashtags;

  • Noisy Information: While traditional event detection approaches assume that all documents are relevant, Twitter data typically contains a vast amount of noise, and not every tweet is related to an event.
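A common first step against informal language and noise is lexical normalization. The following minimal sketch assumes a tiny, hypothetical lexicon seeded with the examples given for informal language (“pls”, “forgt”), strips URLs and @mentions as a crude noise filter, and keeps hashtags as single tokens. It does not split compound hashtags, and `normalize_tweet` and `NORMALIZATION_LEXICON` are illustrative names, not part of the described system.

```python
import re

# Hypothetical lexicon seeded with the examples from the text; a real
# system would need a much larger, curated dictionary.
NORMALIZATION_LEXICON = {
    "pls": "please",
    "plz": "please",
    "forgt": "forgot",
    "u": "you",
    "tmrw": "tomorrow",
}

NOISE_RE = re.compile(r"https?://\S+|@\w+", re.IGNORECASE)  # URLs and @mentions
TOKEN_RE = re.compile(r"#?\w+|[^\w\s]")  # hashtags/words, or single punctuation marks


def normalize_tweet(text):
    """Drop URLs/@mentions, lower-case, tokenize, and expand known informal forms."""
    cleaned = NOISE_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(cleaned.lower())
    return [NORMALIZATION_LEXICON.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    print(normalize_tweet("Pls RT @bob I forgt the link http://t.co/x #quake"))
    # ['please', 'rt', 'i', 'forgot', 'the', 'link', '#quake']
```

Dictionary lookup only covers forms seen in advance; unseen abbreviations pass through unchanged, so such a step reduces noise rather than eliminating it.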
