Big Data Contextual Analytics Study on Arabic Tweets Summarization

Fatimah Al-Ibrahim (King Saud University, Riyadh, Saudi Arabia) and Zakarya A. Alzamil (King Saud University, Riyadh, Saudi Arabia)
Copyright: © 2019 | Pages: 17
DOI: 10.4018/IJKSS.2019100102

Abstract

Twitter is both a source of information and a free space where people express their opinions on diverse topics. Its use is increasing rapidly and generates a massive amount of data of many types and forms, which makes it hard to search manually for tweets relevant to a specific topic because of the many irrelevant tweets. Much research has addressed context understanding for English tweets; however, although active Arabic Twitter users number in the hundreds of millions, very few studies have investigated the automatic summarization of Arabic tweets. This article proposes a multi-conversational Arabic tweet summarization approach that introduces a new concept of tweet classification based on an influence factor. The approach analyzes Arabic tweets and provides a readable, informative, precise, concise, and diversified summary. Evaluated with precision, recall, and F-measure, the system shows good results compared with related Arabic summarization studies.

Introduction

Social media, particularly Twitter, is both a source of information and a free space where people express their opinions on diverse topics such as science, society, the economy, politics, and medicine. The use of microblogging platforms such as Twitter has increased rapidly and generates a massive amount of data of many types and forms every day. Such data is considered a form of big data because it cannot be perceived, acquired, managed, and processed by traditional database management systems or traditional software tools within a tolerable time (Chen et al., 2014). Big data is a term that encompasses different types of large, complex datasets that are hard to process with conventional data processing systems (Samuel et al., 2015). Big data has been characterized by three properties: volume, velocity, and variety (the 3V model). Volume is the size of the dataset, which keeps growing; velocity is the speed at which data is generated, analyzed, and delivered, which must be rapid and timely; and variety refers to the different types of data from different sources, including unstructured, semi-structured, and structured data (Chen et al., 2014; Tsai et al., 2015). The 3V model has been extended to a 4V model by adding a value property, which refers to discovering value, e.g., meaningful information, in the dataset (Chen et al., 2014; Tsai et al., 2015).

Understanding big data within a certain context is a challenge in social media. For instance, searching Twitter by a certain hashtag may produce a huge list of tweets that includes advertisements and tweets irrelevant to that hashtag. Finding objectively relevant tweets under the desired hashtag therefore requires understanding the context of that hashtag. An automated system is needed to analyze the tweets, extract the set of tweets most relevant to the desired hashtag, organize them into a paragraph based on a set of predefined criteria, and display them as a short, accurate extractive summary within a short time.
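
The steps named above (filter, rank by relevance, assemble a short extractive summary) can be illustrated with a minimal sketch. It is not the approach proposed in this article, which classifies tweets by an influence factor; it assumes a plain word-frequency relevance score instead. The function names, the `AD_MARKERS` word list, and the sample tweets are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical marker words for advertisements (an assumption, not from the article).
AD_MARKERS = {"discount", "sale", "offer"}


def tokenize(text):
    """Lowercase word tokens; \\w is Unicode-aware, so Arabic letters match too."""
    return re.findall(r"\w+", text.lower())


def summarize(tweets, hashtag, k=2):
    """Return up to k tweets that best match the vocabulary shared under a hashtag."""
    tag = hashtag.lstrip("#").lower()

    # Keep tweets that mention the tag word and contain no advertisement marker.
    candidates = []
    for tweet in tweets:
        tokens = tokenize(tweet)
        if tag in tokens and not AD_MARKERS & set(tokens):
            candidates.append((tweet, tokens))

    # Document frequency: the number of candidate tweets each word appears in.
    df = Counter(w for _, tokens in candidates for w in set(tokens))

    def score(tokens):
        # Average document frequency of the tweet's distinct words, tag excluded.
        words = set(tokens) - {tag}
        return sum(df[w] for w in words) / len(words) if words else 0.0

    # Highest-scoring tweets first; ties keep their original order.
    ranked = sorted(candidates, key=lambda c: score(c[1]), reverse=True)

    # Join the selected tweets into one short paragraph.
    return " ".join(tweet for tweet, _ in ranked[:k])


tweets = [
    "#riyadh metro opens new line today",
    "#riyadh huge discount on phones",
    "new metro line opens in #riyadh",
    "#riyadh weather is hot",
    "#jeddah metro plans announced",
]
print(summarize(tweets, "#riyadh", k=2))
```

In this sketch the advertisement and the tweet under another hashtag are filtered out, and the two tweets that share the most vocabulary with the other candidates are selected. A realistic system for Arabic would also need normalization and stemming, which are omitted here.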

Much research has investigated techniques and approaches to analyze and understand the big data generated by Twitter and other social media platforms, and to extract meaningful and valuable information that can be used for different purposes. For example, several studies concentrate on contextual big data analysis (Vivaldi & Da Cunha, 2013; Zingla, 2015; Belkaroui & Faiz, 2015; El-Fishawy et al., 2014) and use different classification approaches and automatic mining techniques to extract a summary, with respect to a specific topic, from a large collection of articles published on the web or of tweets posted on Twitter. In addition, a number of studies concentrate on sentiment analysis techniques (Mane et al., 2014; Bouchlaghem et al., 2016; Chen & Li, 2017) that use automatic mining techniques to extract users’ opinions from their published texts, tweets, articles, etc., and so determine their attitudes toward particular topics. Moreover, different social network analysis techniques have been proposed for other purposes (Ho & Do, 2018; Kiss & Buzás, 2015; Wang et al., 2017).
