Sentiment Analysis of Tweets Using Naïve Bayes, KNN, and Decision Tree

Kadda Zerrouki (Higher School of Computer Science May 8, 1945, ESI Sidi Bel Abbes, Algeria), Reda Mohamed Hamou (GeCoDe Labs, University of Saida Dr Moulay Tahar, Algeria) and Abdellatif Rahmoun (Higher School of Computer Science May 8, 1945, ESI Sidi Bel Abbes, Algeria)
Copyright: © 2020 | Pages: 15
DOI: 10.4018/IJOCI.2020100103

Abstract

Using social media to analyze public perception of a product, an event, or a person has gained momentum in recent times. Out of a wide array of social networks, the authors chose Twitter for their analysis because the opinions expressed there are concise and bear a distinctive polarity. Sentiment analysis is an approach to analyzing data and retrieving the sentiment it embodies. The paper discusses in detail three supervised machine learning algorithms, Naïve Bayes, k-nearest neighbor (KNN), and decision tree, and compares their overall accuracy, precision, recall, F-measure, number of tweets correctly classified, number of tweets incorrectly classified, and execution time.

Introduction

Twitter is a popular microblogging service where users create status messages (called “tweets”). These tweets sometimes express opinions about different topics. We propose a method to automatically extract sentiment (positive or negative) from a tweet.

Sentiment analysis is the process of determining a user's opinion about a topic or about the text under consideration. It is also known as opinion mining. In other words, it determines whether a piece of writing is positive or negative.

Sentiment analysis works on datasets that consist of emotions, attitudes, or assessments, and it takes into account the way a human thinks (Feldman, 2013). Identifying the positive and negative aspects of a sentence is a very difficult task. The features used to classify a sentence should include strongly polarized adjectives that summarize the review. Such content is also written in varied styles that neither users nor firms can easily interpret, which makes it difficult to classify.

This task has received a lot of interest from the research community in recent years. The work has examined how sentiment can be classified from texts of different genres and languages, in the context of various applications, using knowledge-based, semi-supervised, and supervised methods (Liu, 2011). The results of these analyses have shown that different types of text require specialized methods for sentiment analysis. For example, sentiment is not conveyed in the same manner in newspaper articles as in blogs, reviews, forums, or other types of user-generated content (Balahur, Steinberger, Kabadjov, Zavarella, Van Der Goot, Halkia, & Belyaeva, 2013).

The sentiment found within comments, feedback, or critiques provides useful indicators for many different purposes and can be categorized by polarity (Kalaivani & Shunmuganathan, 2013). By polarity, we mean whether a review is positive or negative overall. For example:

  • Positive Sentiment in a Subjective Sentence: “I loved the movie Mary Kom”. This sentence expresses positive sentiment about the movie Mary Kom, which we can determine from the sentiment threshold value of the word “loved”: it has a positive numerical threshold value;

  • Negative Sentiment in a Subjective Sentence: “Phata poster nikla hero is a flop movie”. This sentence expresses negative sentiment about the movie “Phata poster nikla hero”, which we can determine from the sentiment threshold value of the word “flop”: it has a negative numerical threshold value.

Sentiment analysis is carried out at three different levels: document level, sentence level, and entity level (Kiritchenko, Zhu, & Mohammad, 2014).
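
The word-level polarity idea behind the two example sentences can be illustrated with a minimal sketch. It is not the method evaluated in this paper: the lexicon, its numerical values, the `polarity` function, and the "neutral" fallback label are all assumptions made for illustration.

```python
# Hypothetical word scores chosen for illustration only; a real
# sentiment lexicon would be far larger and empirically derived.
LEXICON = {"loved": 1.0, "good": 0.5, "flop": -1.0, "bad": -0.5}


def polarity(sentence, lexicon=LEXICON):
    """Sum the scores of known words and map the total to a label."""
    # Lowercase each token and strip common ASCII punctuation.
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    score = sum(lexicon.get(w, 0.0) for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    # Assumption: sentences with no scored words are labeled neutral.
    return "neutral", score


print(polarity("I loved the movie Mary Kom"))
print(polarity("Phata poster nikla hero is a flop movie"))
```

With this toy lexicon, the first sentence is labeled positive because of “loved” and the second negative because of “flop”. Words outside the lexicon contribute nothing to the score.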

One difficulty in sentiment analysis is that an opinion word treated as positive in one situation may be considered negative in another. The degree of positivity or negativity also has a great impact on opinions: for example, “good” and “very good” cannot be treated as the same. Traditional text processing assumes that a small change between two pieces of text does not change the meaning of the sentences (Kalaivani & Shunmuganathan, 2013); the latest text mining, however, leaves room for more advanced analysis that measures the intensity of a word. This is the point at which the accuracy and efficiency of different algorithms can be measured (Fan, Wallace, Rich, & Zhang, 2006).
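
The difference between “good” and “very good” can be made concrete with a small sketch in which an intensifier scales the score of the word that follows it. The base scores, the multipliers, and the `intensity_score` function are hypothetical and serve only to illustrate what measuring the intensity of a word could look like.

```python
# Hypothetical base scores and intensifier multipliers.
BASE = {"good": 1.0, "bad": -1.0}
INTENSIFIERS = {"very": 2.0, "slightly": 0.5}


def intensity_score(text):
    """Score a phrase, letting intensifiers scale the next scored word."""
    score = 0.0
    weight = 1.0
    for word in text.lower().split():
        if word in INTENSIFIERS:
            # Accumulate the multiplier until a non-intensifier appears.
            weight *= INTENSIFIERS[word]
        else:
            score += weight * BASE.get(word, 0.0)
            weight = 1.0  # the multiplier applies to one word only
    return score


print(intensity_score("good"))
print(intensity_score("very good"))
```

Under these assumed values, “very good” receives twice the score of “good”, so the two phrases are no longer treated as the same.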

In this paper, we use three supervised machine learning algorithms for sentiment analysis: Naïve Bayes, k-nearest neighbor (KNN), and decision tree. For each, we calculate accuracy, precision and recall (for both the positive and the negative corpora), F-measure, the number of tweets correctly classified, the number of tweets incorrectly classified, and execution time.
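
The evaluation measures listed above can be computed from true and predicted labels as in the following sketch. The `evaluate` function, the toy tweets and labels, and the `keyword_classifier` stand-in are assumptions for illustration. The stand-in is not one of the three algorithms compared in the paper, and the numbers it produces are not results from the paper.

```python
import time


def evaluate(y_true, y_pred, labels=("positive", "negative")):
    """Return accuracy, correct/incorrect counts, and per-class scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    report = {
        "correct": correct,
        "incorrect": len(pairs) - correct,
        "accuracy": correct / len(pairs),
    }
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return report


def keyword_classifier(tweet):
    """Hypothetical stand-in classifier, used only to produce labels."""
    return "negative" if "flop" in tweet.lower() else "positive"


# Toy data invented for this example.
tweets = ["I loved the movie", "What a flop", "Great acting", "Boring and slow"]
y_true = ["positive", "negative", "positive", "negative"]

start = time.perf_counter()
y_pred = [keyword_classifier(t) for t in tweets]
elapsed = time.perf_counter() - start  # execution time in seconds

report = evaluate(y_true, y_pred)
print(report)
print(f"execution time: {elapsed:.6f} s")
```

Precision and recall are reported separately for the positive and the negative class, and F-measure is taken here as their harmonic mean. Any classifier that returns one label per tweet could be substituted for the stand-in.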
