A Machine Learning-Based Lexicon Approach for Sentiment Analysis

Tirath Prasad Sahu (NIT Raipur, Raipur, India) and Sarang Khandekar (NIT Raipur, Raipur, India)
Copyright: © 2020 | Pages: 15
DOI: 10.4018/IJTHI.2020040102

Abstract

Sentiment analysis can be very useful for extracting information from text documents. Its main idea is to determine what people think about a particular subject, as expressed in online reviews, e.g., product reviews, movie reviews, etc. Sentiment analysis is the process by which these reviews are classified as positive or negative. The web contains a huge number of reviews that can be analyzed to extract meaningful information. This article presents the use of lexicon resources for sentiment analysis of different publicly available reviews. First, polarity shifts in reviews caused by negations are handled. Intensifiers, punctuation and acronyms are also taken into consideration during the processing phase. Second, opinion-bearing words are extracted and used to compute a score. Third, machine learning algorithms are applied, and the experimental results show that the proposed model is effective in identifying the sentiments of reviews and opinions.
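The processing phase mentions acronyms, intensifiers and punctuation. The following is only a minimal sketch of how such cues might adjust a lexicon score, under stated assumptions: the acronym table, the intensifier weights, the lexicon values and the 10% boost per exclamation mark are all hypothetical choices for illustration, not the lists or rules used in the article.

```python
import re

# Hypothetical toy resources for illustration only; not the article's lists.
ACRONYMS = {"gr8": "great", "lol": "laughing out loud", "imo": "in my opinion"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "really": 1.3}
LEXICON = {"great": 3, "good": 3, "bad": -3, "boring": -2}


def tokenize(text):
    # Lowercase and split into word tokens and exclamation marks.
    return re.findall(r"[a-z0-9']+|!", text.lower())


def expand_acronyms(tokens):
    # Replace each known acronym by its expansion (possibly several words).
    out = []
    for tok in tokens:
        out.extend(ACRONYMS.get(tok, tok).split())
    return out


def score(text):
    tokens = expand_acronyms(tokenize(text))
    total = 0.0
    multiplier = 1.0
    for tok in tokens:
        if tok in INTENSIFIERS:
            multiplier = INTENSIFIERS[tok]  # applies to the next opinion word
        elif tok in LEXICON:
            total += multiplier * LEXICON[tok]
            multiplier = 1.0
    # Each '!' adds 10% to the magnitude of the overall score (an assumption).
    total *= 1 + 0.1 * tokens.count("!")
    return total


print(score("The movie was very gr8!"))  # about 4.95
```

Here "gr8" is first expanded to "great", "very" scales its score from 3 to 4.5, and the single exclamation mark raises the total by 10%.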

1. Introduction

Sentiment analysis is becoming a trending and popular research topic. It is a special case of text classification that aims to categorize opinions by polarity, e.g., positive or negative. People can express their opinions on the web in the form of reviews, which can be drawn from different domains such as product reviews, movie reviews and tweets from Twitter. In essence, sentiment analysis determines what a person thinks about a particular product, movie, or topic discussed on Twitter.

In recent years, many people have been expressing their opinions on the web. Every day, millions of people express what they think in the form of reviews on platforms such as blogs, forums and social networking sites like Twitter. These platforms help people connect with others whom they do not even know and learn their opinions. In effect, these platforms serve as an intermediary between end users and service providers. From the end user's perspective, they are useful for getting an idea about a product. From the service provider's point of view, they help improve the quality of products and services.

The amount of data on the web is increasing every day as many people use it as a platform to express their reviews and opinions. It therefore becomes very difficult for people to make quick decisions based on these reviews and opinions, and they are unsure which reviews they can trust and which they cannot. To address this problem, an automatic system is needed that makes the analysis, summarization and classification of reviews easy. The general approach for sentiment analysis is the Bag-of-Words (BOW) approach (Dave, Lawrence, & Pennock, 2003). In this approach, a document is represented as a bag of words to form a feature vector, which is then used in the classification process. However, the BOW approach fails to produce the desired results because it does not capture word order or semantic relations. Several works in sentiment analysis have tried to improve BOW by combining it with linguistic knowledge (Dave et al., 2003; Gamon, 2004; Kennedy & Inkpen, 2006; Na, Sui, Khoo, Chan, & Zhou, 2004; Ng, Dasgupta, & Arifin, 2006; Pang, Lee, & Vaithyanathan, 2002; Whitelaw, Garg, & Argamon, 2005; Xia, Zong, & Li, 2011). However, they failed to improve the classification accuracy.
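To make the word-order limitation of BOW concrete, here is a small sketch using plain word counts over a shared vocabulary (raw counts are one common BOW variant; weighting schemes such as TF-IDF are not shown). The two example sentences are invented for illustration.

```python
from collections import Counter


def bag_of_words(doc, vocabulary):
    # Count occurrences of each vocabulary word; word order is discarded.
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocabulary]


docs = ["the plot is good not bad", "the plot is bad not good"]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [bag_of_words(d, vocab) for d in docs]

print(vocab)                     # ['bad', 'good', 'is', 'not', 'plot', 'the']
print(vectors[0] == vectors[1])  # True: opposite meanings, identical vectors
```

Because both sentences contain exactly the same words, their BOW vectors are identical even though one is positive and the other negative, which is why extra linguistic knowledge is needed.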

Polarity shift is an important issue in sentiment analysis; for example, the negation in "not good" reverses the positive polarity of "good". Many approaches have been suggested in the literature to overcome the polarity shift problem (Councill, McDonald, & Velikovich, 2010; Das & Chen, 2001; Ikeda, Takamura, Ratinov, & Okumura, 2008; Li & Huang, 2009; Li, Lee, Chen, Huang, & Zhou, 2010; Wilson, Wiebe, & Hoffmann, 2009). However, most of them require either complex linguistic knowledge or extra human annotations. Such a high dependency on external resources makes these systems difficult to use widely in practice.
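One widely used, resource-light heuristic for negation-induced polarity shift is to flip the sign of opinion words that fall within a short window after a negation word. The sketch below assumes a window of three tokens that is closed early by punctuation, a small hypothetical negation list and toy lexicon values; it is not necessarily the exact rule used in the article.

```python
# Toy lexicon with illustrative values (not taken from SentiWordNet or AFINN-111).
LEXICON = {"good": 3, "great": 3, "bad": -3, "boring": -2}
NEGATIONS = {"not", "no", "never", "cannot"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def negation_aware_score(tokens, window=3):
    """Sum lexicon scores, flipping the sign of words inside a negation scope.

    A negation word opens a scope covering the next `window` tokens; a
    punctuation token closes the scope early.
    """
    total = 0
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokens:
        if tok in NEGATIONS:
            remaining = window
            continue
        if tok in PUNCTUATION:
            remaining = 0
            continue
        value = LEXICON.get(tok, 0)
        if remaining > 0:
            value = -value  # polarity shift: "not good" counts as negative
            remaining -= 1
        total += value
    return total


print(negation_aware_score("the movie is not good".split()))  # -3
print(negation_aware_score("not bad , really good".split()))  # 6
```

In the second example, "bad" is flipped to +3, the comma ends the negation scope, and "good" keeps its positive score.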

Sentiment analysis can be carried out at different levels, such as the aspect, sentence and document levels. The proposed method performs sentiment analysis at the sentence level, using the lexical resources SentiWordNet 3.0 (Baccianella, Esuli, & Sebastiani, 2010) and AFINN-111 (Zol & Mulay, 2015), to decide the overall polarity of the reviews based on feature vectors and machine learning algorithms.
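A minimal sketch of how lexicon scores can feed a machine learning classifier follows, under stated assumptions: the lexicon values are toy numbers standing in for a word-level lexicon such as AFINN-111 (they are not copied from it or from SentiWordNet 3.0), the four-element feature vector is a hypothetical design, and a from-scratch perceptron stands in for whichever learners the article actually evaluates.

```python
# Toy stand-in for a word-level lexicon such as AFINN-111 (values illustrative).
LEXICON = {"good": 3, "great": 3, "love": 3, "bad": -3, "awful": -3, "boring": -2}


def features(review):
    # Feature vector: [sum of positive scores, sum of negative scores,
    #                  number of positive words, number of negative words]
    pos = neg = n_pos = n_neg = 0
    for tok in review.lower().split():
        v = LEXICON.get(tok, 0)
        if v > 0:
            pos += v
            n_pos += 1
        elif v < 0:
            neg += v
            n_neg += 1
    return [pos, neg, n_pos, n_neg]


def train_perceptron(X, y, epochs=20, lr=0.1):
    # Labels are +1 (positive) or -1 (negative); w[-1] holds the bias term.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if label * activation <= 0:  # misclassified or on the boundary
                for i, xi in enumerate(x):
                    w[i] += lr * label * xi
                w[-1] += lr * label
    return w


def predict(w, x):
    activation = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if activation > 0 else -1


train = [
    ("a great and good movie", 1),
    ("i love it", 1),
    ("awful and boring plot", -1),
    ("bad acting", -1),
]
X = [features(text) for text, _ in train]
y = [label for _, label in train]
w = train_perceptron(X, y)

print(predict(w, features("good plot")))  # 1
print(predict(w, features("boring")))     # -1
```

Summarizing each review as a few lexicon-derived numbers keeps the feature vector small regardless of vocabulary size; any standard classifier could replace the perceptron here.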
