Review of Sentiment Detection: Techniques and Challenges

Smiley Gupta (N.C. College of Engineering, Israna, India) and Jagtar Singh (N.C. College of Engineering, Israna, India)
Copyright: © 2019 | Pages: 10
DOI: 10.4018/IJDAI.2019010105

Abstract

A large volume of user-generated data is produced every day, especially on social media platforms such as Twitter, where people express their opinions and emotions about individuals or entities. This content is too voluminous to analyze manually, so it calls for an intelligent automated system that mines the opinions and organizes them by polarity; this process is known as sentiment analysis. This review studies the concept of sentiment analysis, compares its techniques, and discusses the challenges in the field that should be considered for future work.

Methodology For Sentiment Analysis

Figure 1. Procedure of sentiment analysis
  1. Data Collection: First, user-generated data is collected from social networking sites, forums, and blogs. Twitter is one of the most frequently used sources; tweets are short, originally limited to 140 characters. The data is unstructured and written in varied styles and contexts, with slang, acronyms, and the like, which makes manual analysis of the text difficult.

  2. Data Preprocessing: Data preprocessing means cleaning and filtering the unstructured data before analysis. In this step, non-textual content and content that is irrelevant to the area of study are identified and removed. Cleaning involves removing URLs, removing punctuation, converting case, and stemming.

  3. Feature Selection: The features commonly used in sentiment analysis include:

    • Term presence and frequency: Term presence records whether an individual word or n-gram occurs in the text, while term frequency counts how many times the term occurs.

    • Parts of speech (POS): These features count the verbs, adverbs, nouns, and other parts of speech in a sentence or document.

    • Opinion words and phrases: These are words and phrases that express opinions, such as ‘good’ or ‘bad’ and ‘like’ or ‘hate.’

    • Negation: A negation word can reverse the polarity and meaning of an opinion. For example, “not good” is equivalent to “bad.”

  4. Sentiment Classification Algorithm
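Steps 2 and 3 of the procedure above (data preprocessing and feature selection) can be illustrated with a short Python sketch. It is not the authors' implementation. The function names, the tiny opinion lexicon, the negation list, and the suffix-stripping "stemmer" are all assumptions made for illustration; a real system would more likely use a curated sentiment lexicon and an established stemmer such as Porter's. POS features are omitted because they need a trained tagger.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny lexicons (assumptions for illustration only).
OPINION_WORDS = {"good": 1, "like": 1, "bad": -1, "hate": -1}
NEGATIONS = {"not", "no", "never"}

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)
SUFFIXES = ("ing", "ed", "s")  # naive stand-in for a real stemmer


def naive_stem(token):
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Step 2: remove URLs, strip punctuation, lowercase, then stem."""
    text = URL_PATTERN.sub(" ", text)  # URLs go first, before punctuation is lost
    text = text.translate(PUNCT_TABLE).lower()
    return [naive_stem(tok) for tok in text.split()]


def term_features(tokens, n=2):
    """Step 3: term frequency, term presence, and word n-grams."""
    frequency = Counter(tokens)   # how often each term occurs
    presence = set(frequency)     # whether a term occurs at all
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return frequency, presence, ngrams


def lexicon_polarity(tokens):
    """Sum opinion-word scores; a negation flips only the very next token."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in OPINION_WORDS:
            value = OPINION_WORDS[tok]
            score += -value if negate else value
        negate = False  # negation scope ends after one token
    return score


if __name__ == "__main__":
    tokens = preprocess("The movie was not good! See https://example.com/review")
    frequency, presence, bigrams = term_features(tokens)
    print(tokens)
    print(frequency.most_common(3))
    print(lexicon_polarity(tokens))
```

With this lexicon, "not good" scores the same as "bad", matching the negation example in step 3. Limiting negation to the immediately following token is a simplification: a phrase such as "not very good" would be scored as positive, so a wider negation window would be needed in practice.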
