Introduction
Social media, particularly Twitter, is a source of information as well as a free space for people to express their opinions on diverse topics such as science, social issues, economics, politics, and medicine. The use of microblogging platforms such as Twitter has increased rapidly and now generates massive amounts of data of many types and forms every day. Such data is considered a form of big data because it cannot be perceived, acquired, managed, and processed by traditional database management systems and/or traditional software tools within a tolerable time (Chen et al., 2014). Big data is a term that encompasses different types of large, complicated datasets that are hard to process with conventional data processing systems (Samuel et al., 2015). Big data has been characterized by three properties: volume, velocity, and variety (the 3V model). Volume is the size of the dataset, which keeps growing; velocity is the speed at which data is generated, analyzed, and delivered, which must be rapid and timely; and variety refers to the various types of data from different sources, including unstructured, semi-structured, and structured data (Chen et al., 2014; Tsai et al., 2015). The 3V model was later extended to a 4V model by adding a fourth property, value, which refers to discovering meaningful information in the dataset (Chen et al., 2014; Tsai et al., 2015).
Understanding big data within a certain context is a challenge in social media. For instance, searching Twitter for tweets with a certain hashtag may produce a huge list of tweets, including irrelevant tweets posted under that hashtag as well as advertisements. This suggests that finding objectively relevant tweets for a desired hashtag requires understanding the context of that hashtag. Therefore, an automated system is needed to analyze the tweets, extract the set of tweets most relevant to the desired hashtag, organize them into a paragraph based on a set of predefined criteria, and display them as a short, accurate extractive summary within a short time.
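The pipeline outlined above (filter the tweets under a hashtag, select the most relevant ones, organize them into a paragraph) can be sketched as follows. This is a minimal illustration under our own assumptions, not the system proposed in this article: the advertisement filter `AD_CUES` is a hypothetical keyword list, the function name `summarize_hashtag` is illustrative, and relevance is scored with a simple centroid-based cosine similarity, which is only one of many possible criteria.

```python
import math
import re
from collections import Counter

# Hypothetical cues for dropping advertisement-like tweets; a real system would
# need a trained classifier or a curated list rather than this toy set.
AD_CUES = {"buy", "sale", "discount", "promo", "offer"}

TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase word tokens, keeping '#' and '@' prefixes."""
    return TOKEN_RE.findall(text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize_hashtag(tweets, hashtag, k=3):
    """Return a short extractive summary of the tweets carrying `hashtag`.

    Relevance is the cosine similarity between a tweet's term counts and the
    centroid (summed term counts) of all candidate tweets, an assumed criterion.
    """
    tag = hashtag.lower()
    candidates, seen = [], set()
    # Step 1: keep non-duplicate tweets that contain the hashtag and do not
    # look promotional.
    for text in tweets:
        tokens = tokenize(text)
        key = " ".join(tokens)
        if tag not in tokens or AD_CUES.intersection(tokens) or key in seen:
            continue
        seen.add(key)
        candidates.append((text, Counter(w for w in tokens if w != tag)))
    if not candidates:
        return ""
    # Step 2: build the centroid of all candidate term counts.
    centroid = Counter()
    for _, counts in candidates:
        centroid.update(counts)
    # Step 3: rank by relevance, keep the top k, restore the original order.
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(candidates[i][1], centroid),
                    reverse=True)
    chosen = sorted(ranked[:k])
    # Step 4: organize the selected tweets into a single paragraph.
    return " ".join(candidates[i][0] for i in chosen)


tweets = [
    "New #AI model summarizes research papers",
    "Big discount on laptops today #AI",
    "#AI research papers on model summarization",
    "Unrelated tweet about lunch",
    "#AI model research progress",
]
print(summarize_hashtag(tweets, "#AI", k=2))
```

Ranking against the centroid favours tweets that share vocabulary with most other tweets under the same hashtag, a cheap proxy for topical relevance; the predefined criteria mentioned above could replace or reweight this score.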
Much research has investigated different techniques and approaches to analyze and understand the big data generated by Twitter and/or other social media platforms, and to extract meaningful and valuable information that can be used for different purposes. For example, several studies concentrate on contextual big data analysis (Vivaldi & Da Cunha, 2013; Zingla, 2015; Belkaroui & Faiz, 2015; El-Fishawy et al., 2014), using different classification approaches and automatic mining techniques to extract a summary, on a specific topic, from a huge collection of articles published on the web or from a collection of tweets posted on Twitter. In addition, a number of studies concentrate on sentiment analysis techniques (Mane et al., 2014; Bouchlaghem et al., 2016; Chen & Li, 2017) that use automatic mining techniques to determine users' attitudes toward particular topics, in order to extract users' opinions from their published texts, tweets, articles, etc. Moreover, various social network analysis techniques have been proposed for other purposes (Ho & Do, 2018; Kiss & Buzás, 2015; Wang et al., 2017).