A Hybridized GA-Based Feature Selection for Text Sentiment Analysis

Gyananjaya Tripathy, Aakanksha Sharaff
Copyright: © 2023 |Pages: 13
DOI: 10.4018/978-1-7998-9220-5.ch112

Abstract

Recent research has described the effectiveness of various sentiment classification techniques, ranging from simple lexicon-based methods to more complex machine learning techniques. The authors develop an integrated framework that bridges the gap between dictionary-based methods and machine learning methods to achieve better accuracy and more flexibility. To solve the scalability problem that occurs as the feature set grows, a hybrid genetic algorithm (GA)-based dimensionality reduction method is proposed. With this novel approach, the authors reduce the size of the feature set while still reaching high accuracy. The authors compare the proposed feature reduction method with the widely used principal component analysis and singular value decomposition-based feature reduction algorithms. In addition, the proposed sentiment analysis model is evaluated on other metrics, including precision, recall, F1 score, and feature size.

Introduction

The advancement of internet technology has changed the way society lives. Different social forums are commonly used to share helpful information and new ideas for advertisement and service improvement. Social platforms are monitored from various perspectives. These include compiling business marketing strategies for products and promotional services, observing harmful actions to detect and reduce cyber-attacks, and sentiment analysis to analyze human responses and feedback (Saberi & Saad, 2017). Sentiment analysis, often referred to as opinion mining, extracts and classifies sentiments from text using Natural Language Processing (NLP), mathematical, or Machine Learning (ML) methods. ML methods use various approaches and a dataset on which a model can be trained to distinguish and identify sentiments (Fiok et al., 2021). Researchers have widely studied the field of sentiment analysis over the past few years, and different methods have been developed and tested. The most common approach is ML, which requires a robust dataset to train on and learn the relationship between various aspects and sentiments.

Sentiment analysis assesses written or spoken language to determine whether an expression is negative, positive, or neutral, and to what extent. Current market analysis tools can process large volumes of customer feedback consistently and accurately. Collectively, sentiment analysis captures customers’ opinions on various topics, including purchases, the provision of services, or the presentation of promotions (Alsaeedi & Khan, 2019). Sentiment analysis is often applied to reviews. Reviews can be taken from various sources for various purposes, such as product reviews, political reviews, and community reviews. When customers give feedback on a product, further questions arise: Is the product usable? Is this product satisfactory? Is this product worth the money? Some helpful information always comes out of reviews, whether the feedback is positive or negative (Birjali et al., 2021). Sentiments need to be learned from these practical answers. Semantic orientation estimates the subjectivity and opinion in the text data. Rule-based analysis searches for particular words in a text and categorizes them based on positivity and negativity.

The proposed work presents a hybrid sentiment analysis process applied to an Amazon review dataset. The dataset contains several responses, split equally between positive and negative labels. The authors have developed an integrated novel algorithm based on the Genetic Algorithm (GA) to reduce the feature set (Iqbal et al., 2019). Iqbal et al. (2019) explained a feature selection method using GA in which fitness is evaluated with a sentiment score, whereas in the proposed model the fitness of each solution is evaluated using the accuracy score of each feature subset. A Support Vector Machine (SVM) (Preeti et al., 2020) is used to check the validity of the words with respect to the label to find an effective solution. This evolutionary process of selecting the right features improves accuracy and scalability. This customized method offers a 45% reduced feature set with better accuracy. To demonstrate the feasibility of the proposed method, the authors conducted a detailed comparison with other reduction strategies, namely Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). Compared with these two algorithms, the proposed model provides up to 14.5% higher accuracy than PCA and 16.2% higher accuracy than SVD when the Naïve Bayes learning process is used with these feature reduction strategies. In terms of the number of features across all three feature reduction strategies, the proposed method gives a 13% better result than PCA and a 10% better result than SVD. With a smaller feature set, the proposed system outperforms the other two algorithms.
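The idea of scoring a feature subset by classification accuracy can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier below is an assumed stand-in for the SVM, and the names `select`, `centroid_accuracy`, `fitness`, and the toy data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

def select(row: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Keep only the features whose mask bit is 1."""
    return [value for value, keep in zip(row, mask) if keep]

def centroid_accuracy(train_X, train_y, test_X, test_y) -> float:
    """Accuracy of a nearest-centroid classifier (assumed stand-in for the SVM)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        # Column-wise mean of all training rows with this label.
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Choose the label whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    correct = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return correct / len(test_y)

def fitness(mask, train_X, train_y, test_X, test_y) -> float:
    """Fitness of a feature subset = accuracy obtained using only those features."""
    if not any(mask):
        return 0.0  # an empty subset cannot classify anything
    return centroid_accuracy(
        [select(r, mask) for r in train_X], train_y,
        [select(r, mask) for r in test_X], test_y,
    )

# Toy data (hypothetical): feature 0 carries the label, feature 1 is noise.
train_X = [[1, 5], [1, 0], [0, 5], [0, 0]]
train_y = [1, 1, 0, 0]
test_X = [[1, 0], [0, 5]]
test_y = [1, 0]

for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, fitness(mask, train_X, train_y, test_X, test_y))
```

In this toy setting the subset containing only the informative feature scores as well as the full feature set, which is the behaviour a GA-based selector tries to exploit.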

The main contributions to the proposed work are as follows:

Key Terms in this Chapter

Chromosome: A set of parameters that encodes a candidate solution to the problem the Genetic Algorithm is trying to solve.

Mutation: An operator that alters one or more genes of a chromosome from their original state. As a result, the solution may change completely from the previous solution.

Classification: The technique used to separate data into categories, here on the basis of positivity and negativity.

Crossover: A genetic operator that recombines chromosomes from one generation to produce the next. Doing so can yield high-quality offspring.

Feature Optimization: A technique for dimensionality reduction. As an optimized feature reduction, it selects only the features that have the most impact on the target variable.

Fitness Calculation: The core part of the algorithm. It is an objective function that scores each candidate solution in order to find the optimal one. The fitness values are used to select new parents for mating.

Population: A set of candidate solutions that converges towards the best solution over a certain number of iterations.
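The terms above fit together in a generic GA loop for feature selection, sketched below. It is an illustration under stated assumptions, not the chapter's algorithm: chromosomes are binary masks over features, selection keeps the fitter half of the population, and `toy_fitness` is a hypothetical placeholder for a classifier-accuracy fitness. All function names and parameter values are assumptions.

```python
import random
from typing import Callable, List

Chromosome = List[int]  # one bit per feature: 1 = keep, 0 = drop

def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Single-point crossover: prefix from parent a, suffix from parent b."""
    point = rng.randint(1, len(a) - 1)  # requires at least two genes
    return a[:point] + b[point:]

def mutate(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each gene independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in c]

def run_ga(
    n_features: int,
    fitness: Callable[[Chromosome], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Chromosome:
    rng = random.Random(seed)
    # Population: random binary chromosomes.
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Fitness calculation ranks the population; the fitter half become parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: carry the current best forward unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), mutation_rate, rng))
        population = children
        best = max(population + [best], key=fitness)
    return best

def toy_fitness(c: Chromosome) -> float:
    """Hypothetical fitness: features 0 and 2 are useful, every kept feature costs 0.1."""
    return c[0] + c[2] - 0.1 * sum(c)

best = run_ga(n_features=6, fitness=toy_fitness)
print(best, toy_fitness(best))
```

In a real pipeline, `toy_fitness` would be replaced by a function that trains a classifier on the selected features and returns its accuracy; the small per-feature penalty here only mimics the preference for a smaller feature set.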
