A Study of Big Data Processing for Sentiments Analysis

Dinesh Chander, Hari Singh, Abhinav Kirti Gupta
Copyright: © 2021 | Pages: 38
DOI: 10.4018/978-1-7998-3444-1.ch001


Data processing has become an important field in today's big data-dominated world. Data is being generated at a tremendous pace from many different sources, and its nature has shifted from batch data to streaming data; consequently, data processing methodologies have also changed. Traditional SQL is no longer capable of dealing with this big data. This chapter describes the nature of big data and the various tools, techniques, and technologies used to handle it. The chapter also describes the need to shift big data onto the cloud and the challenges of big data processing in the cloud, the migration from data processing to data analytics, the tools used in data analytics, and the issues and challenges in data processing and analytics. The chapter then touches on an important application area of streaming data, sentiment analysis, and explores it through test case demonstrations and results.
Chapter Preview

Data Processing

Over the last decade, the rapid development of Internet-enabled services such as social media, the Internet of Things, and cloud-based services has led to a tremendous growth of data, termed big data. This data has become very difficult to handle and manage for further processing (Jin et al., 2015). It has been estimated that around 2.5 quintillion bytes of new data are generated per day, and this figure is expected to rise in the near future as the number of Internet users grows at an unprecedented rate. This exponential growth of data has posed many challenges for researchers, academia, and industry across the globe. Moreover, big data is largely unstructured, and its volume, velocity, veracity, and variety (the 4Vs) make it even more challenging to manage and process (Mishra, R. K., & Mishra, R. K., 2018). This sudden explosion of data into terabytes, petabytes, and exabytes could not be handled by traditional SQL databases, which led to the emergence of new tools and techniques to process big data (Storey, V. C., & Song, I. Y., 2017).

Figure 1. Big data chain


Big data processing and analysis have become crucial for better decision making, knowledge discovery, business intelligence, and actionable insights. Figure 1 represents the big data chain, i.e., the path from data collection to decision making (Janssen, M., van der Voort, H., & Wahyudi, A., 2017). Big data is collected in raw form from various sources of interest and must be prepared before it can be processed. Next, quality data sets are prepared through data cleansing and standardization. After that, data processing takes place, which includes transformation, aggregation, and pattern generation. Once data processing is complete, various reports are generated and analyzed to support decision making, knowledge discovery, and the identification of insights or trends. Analysis of data can be classified as descriptive, diagnostic, predictive, or prescriptive (Perwej, Y., 2017).
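The stages of the big data chain can be sketched as a small Python pipeline. This is a minimal, single-machine illustration under stated assumptions, not the chapter's implementation: the records, field names, and the per-region aggregation are hypothetical, and a real big data chain would run each stage on distributed tools over far larger data.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from different sources
# (inconsistent casing, stray whitespace, missing values).
RAW_RECORDS = [
    {"source": "web", "region": " North ", "amount": "120"},
    {"source": "app", "region": "north", "amount": "80"},
    {"source": "web", "region": "SOUTH", "amount": None},
    {"source": "iot", "region": "south", "amount": "45"},
]

def collect(records):
    """Stage 1: data collection (here the raw records are simply copied)."""
    return [dict(r) for r in records]

def cleanse(records):
    """Stage 2: cleansing and standardization.
    Drops records with a missing amount and normalizes region names."""
    cleaned = []
    for r in records:
        if r.get("amount") is None:
            continue
        cleaned.append({
            "source": r["source"],
            "region": r["region"].strip().lower(),
            "amount": float(r["amount"]),
        })
    return cleaned

def process(records):
    """Stage 3: processing, here an aggregation of amounts per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def report(totals):
    """Stage 4: a simple descriptive report, largest totals first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    summary = report(process(cleanse(collect(RAW_RECORDS))))
    for region, total in summary:
        print(f"{region}: {total:.1f}")
```

Running the script prints `north: 200.0` and `south: 45.0`; the record with a missing amount is discarded during cleansing. Keeping each stage as a separate function mirrors the chain in Figure 1, so one stage can be replaced without touching the others.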

This book chapter presents various tools, techniques, and technologies for data processing and analytics. Later, the use of streaming data for sentiment analysis is demonstrated through executable test cases. Sentiment analysis is performed on live tweets in Python, using the Tweepy library to access the Twitter API, and the results obtained are presented through plots.
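To make tweet-level sentiment scoring concrete, the following Python sketch labels short texts as positive, negative, or neutral using a simple word-count lexicon. It is an illustrative stand-in, not the method used in the chapter: the lexicon and sample tweets are hypothetical, and no Twitter API or Tweepy calls are shown. In a live setup, the tweet texts would come from the stream rather than from a hard-coded list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicon for illustration only;
# practical systems use much larger lexicons or trained classifiers.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}

def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def polarity(text):
    """Score = number of positive words minus number of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def label(text):
    """Map the polarity score to a sentiment class."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    # Placeholder texts standing in for streamed tweets (not real data).
    tweets = [
        "I love this phone, the camera is great!",
        "Terrible service, I hate waiting.",
        "The match starts at 8 pm.",
    ]
    print(Counter(label(t) for t in tweets))
```

The per-label counts are the kind of aggregate that can then be plotted. Simple lexicon counting ignores negation and sarcasm ("not good" still scores as positive), which is why practical systems usually rely on richer lexicons or trained classifiers.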

A survey of various sentiment analysis methods used by researchers is also presented. This may help in identifying the best-performing method and possibly in proposing a new one.


Failure of Traditional SQL in Handling Big Data

The volume of data is expected to grow by 50% per year, and data production by 2020 will be 50 times larger than it was in 2009. This rapid increase in volume requires powerful tools and techniques to process big data (Yaqoob, I., Hashem, I. A. T., Gani, A., Mokhtar, S., Ahmed, E., Anuar, N. B., & Vasilakos, A. V., 2016). Conventional tools such as SQL databases are unable to process it because of the high volume, velocity, and veracity of the data. With such diversification of data, the ACID properties (Atomicity, Consistency, Isolation, and Durability) of databases are very difficult to guarantee using conventional tools, and the desired results are difficult to produce within a reasonable time.

Secondly, most of the data is generated in semi-structured or unstructured formats such as images, text, audio, video, and e-mail. Traditional tools are mainly designed to deal with structured data only. Therefore, new and advanced technologies have been devised to cope with processing big data in batches. The next section discusses Hadoop-based technologies for handling this increasing amount of data.
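Batch processing of the kind Hadoop performs follows the MapReduce model: a map step emits key/value pairs, a shuffle step groups values by key, and a reduce step combines each group. The Python sketch below simulates those three phases in a single process to count words in a hypothetical pair of input splits. It does not use Hadoop itself, which distributes the same phases across the nodes of a cluster.

```python
from collections import defaultdict
from itertools import chain

# Single-process simulation of MapReduce; Hadoop would run the map and
# reduce tasks in parallel on many machines.

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.lower().split()]

def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the grouped values for each key (here, sum them)."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(splits):
    """Run map over every split, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle_phase(mapped))

if __name__ == "__main__":
    splits = ["big data needs new tools", "new tools process big data"]
    print(word_count(splits))
```

For the two sample splits, `big`, `data`, `new`, and `tools` each receive a count of 2, while `needs` and `process` receive 1.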
