1. Introduction
Sentiment analysis, also called opinion mining, applies natural language processing techniques to systematically recognize, extract, quantify, and study subjective information. It is extensively applied to customer reviews, survey responses, online and social media posts, and product reviews in online stores, as well as in building AI-based bots or assistants. Sentiment analysis classifies an opinion as positive or negative. Lexicon-based and machine-learning (ML) based approaches are applied to identify the sentiment of a sentence. The former uses a vocabulary of pre-defined positive and negative words, while the latter uses training and testing data to identify positive and negative words. Sentiment analysis can also be applied to classify emotions based on subjective parameters (Liu, 2010). It is also known as emotion AI and serves a variety of purposes in different fields, such as analyzing sentiment in emails, comments and survey feedback. It plays an important role in the domain of Artificial Intelligence (Mäntylä et al., 2018; Poria et al., 2018).
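As an illustration of the lexicon-based approach, the following minimal sketch scores a sentence by counting matches against a small vocabulary. The word lists, the function name `lexicon_sentiment`, and the "neutral" label for ties are assumptions made for this example; they are not the lexicon or the procedure used in this paper.

```python
import re

# Hypothetical toy lexicon; a real system would use a curated vocabulary.
POSITIVE_WORDS = {"good", "great", "excellent", "enjoyable", "love"}
NEGATIVE_WORDS = {"bad", "poor", "boring", "terrible", "hate"}


def lexicon_sentiment(text: str) -> str:
    """Label text by comparing counts of positive and negative lexicon words."""
    # Lowercase the text and pull out alphabetic words.
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(1 for word in words if word in POSITIVE_WORDS)
    negative_hits = sum(1 for word in words if word in NEGATIVE_WORDS)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    # Assumption: ties (including no lexicon words at all) are reported as neutral.
    return "neutral"


if __name__ == "__main__":
    print(lexicon_sentiment("A great movie with an excellent cast"))
    print(lexicon_sentiment("The plot was boring and the acting terrible"))
```

A lexicon of this kind needs no training data, but it cannot handle negation ("not good") or words missing from the vocabulary, which is one reason ML-based approaches are also used.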
The textual datasets used for sentiment analysis are first subjected to pre-processing. Most datasets require the removal or fixing of missing, null or redundant values. The data pre-processing step includes sampling, cleaning and transformation of the data. The type of pre-processing a particular dataset needs depends on the type of dataset (textual, image or numerical). In the proposed approach, the dataset is textual.
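The cleaning part of pre-processing can be sketched for a textual dataset as follows. This is a minimal example under stated assumptions: the function name `clean_reviews` is hypothetical, reviews are held in a plain Python list, and lowercasing plus whitespace normalization stand in for the transformation step. The paper's actual pre-processing pipeline is not shown in this section.

```python
from typing import Iterable, List, Optional


def clean_reviews(reviews: Iterable[Optional[str]]) -> List[str]:
    """Drop missing, empty and duplicate reviews and normalize the rest."""
    seen = set()
    cleaned = []
    for review in reviews:
        # Missing (null) values are removed.
        if review is None:
            continue
        # Transformation: collapse runs of whitespace and lowercase the text.
        text = " ".join(review.split()).lower()
        # Reviews that are empty after normalization carry no information.
        if not text:
            continue
        # Redundant values: keep only the first copy of each review.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    raw = ["Great movie!", None, "   ", "great   movie!", "Bad plot."]
    print(clean_reviews(raw))
```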
Movies are one of the finest forms of entertainment, and it is very common for people to watch movies and share their opinions on social media platforms. By analyzing movie reviews, the positive and negative opinions about a movie can be found. Thus, sentiment analysis can help in gauging public opinion of that movie. Twitter is another platform where a huge volume of user opinions is posted every day, and these opinions can be about any generic content. A few recent research articles focus on detecting hateful words in tweets. A number of emotional labels are widely used in tweets and are given in Figure 1.
Figure 1. Labels used to classify the sentiments of the comments
The remainder of this paper is organized as follows: Section 2 details the terminologies, tasks, levels and open challenges in sentiment analysis; Section 3 presents a detailed analysis of the literature in this field; Section 4 explains the step-by-step procedure for implementing review analysis using SVM; Section 5 tabulates the experimental outcomes and compares them with the results of existing works; Section 6 concludes the research work.
2. Terminologies
- Natural language processing (NLP): It is applied in sentiment analysis to review marketing strategies and has reshaped the business approach. The steps of applying NLP (Chowdhury, 2003) in analyzing a review include tokenization, part-of-speech (POS) tagging, text lemmatization, stop word identification, etc.
- Tokenization: Tokenization (Webster & Kit, 1992) is the splitting of a phrase, sentence, paragraph or entire text document into smaller units or terms. Each of these smaller units is called a token. Tokenization is important because the meaning of the text can be interpreted by analyzing the words present in it. Tokenization is a critical step in NLP, and model building is not possible without first applying it (Pentheroudakis et al., 2006).
- Bag-of-words: It is a way of representing text data as a group of words, disregarding word order. The bag-of-words model is applied in language and document classification (Voorhees, 1999).
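Tokenization and the bag-of-words representation defined above can be sketched in a few lines. This is a minimal illustration: the function names `tokenize` and `bag_of_words` and the simple regular-expression tokenizer are assumptions for this example, not the tokenizer used in this paper.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Represent text as token counts, ignoring word order."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    review = "The movie was good, really good!"
    print(tokenize(review))
    print(bag_of_words(review))
```

Because the representation keeps only counts, the reviews "good, not bad" and "bad, not good" map to the same bag of words, which is a known limitation of this model.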
A general model for sentiment analysis is given in Figure 2.
Figure 2. General model of sentiment analysis