Natural Language Processing in Online Reviews

Gunjan Ansari, Shilpi Gupta, Niraj Singhal

Source Title: Natural Language Processing for Global and Local Business

DOI: 10.4018/978-1-7998-4240-8.ch003

Abstract

The analysis of online data posted on various e-commerce sites is required to improve consumer experience and thus enhance global business. The increase in the volume of social media content in recent years has led to the problem of overfitting in review classification. Thus, there arises a need to select relevant features to reduce computational cost and improve classifier performance. This chapter investigates various statistical feature selection methods that are time-efficient but result in the selection of a few redundant features. To overcome this issue, wrapper methods such as sequential feature selection (SFS) and recursive feature elimination (RFE) are employed to select an optimal feature set. The empirical analysis was conducted on a movie review dataset using three different classifiers, and the results show that SVM could achieve an F-measure of 96% with only 8% of the features, selected using the RFE method.

Introduction

With the rise of various e-commerce sites, 72% of buyers rely on online reviews before purchasing any product or service. Online review statistics show that 85% of consumers prefer to buy products from sites with reviews, and that users trust customer reviews 12 times more than the descriptions given by product manufacturers. Reviews are the third most significant factor used by Google for ranking e-commerce sites. Facebook review statistics reveal that four out of five users rely on local businesses that have positive reviews. However, one negative review may adversely impact 35% of customers. Twitter statistics showed that reviews shared through tweets in 2019 increased sales on e-commerce sites by 6.46% (Galov et al., 2020).

With the remarkable rise in social media content in the past few years, there arises a need to analyze this online data to enhance the user's experience, which will in turn improve the local and global business of e-commerce sites. Owing to the availability of annotated datasets of product, movie, and restaurant reviews, among others, researchers have in recent years been developing various supervised learning approaches for extracting useful patterns from online content. Although supervised learning approaches are found to be quite useful, they suffer from the curse of dimensionality because the vast amount of online content generates a very large feature space. Selecting relevant and non-redundant features from the extracted features has been shown to achieve promising results in terms of accuracy and time.

The chapter will provide a theoretical and empirical study of different filter-based (Yang & Pederson, 1997; Chandrashekhar & Sahin, 2014) and wrapper-based (Zheng et al., 2003) feature selection methods for improving classification. The filter-based feature selection methods rank each feature based on the correlation between the feature and the class using various statistical tests. The top-ranked features are then selected for training the classification model. Although the filter-based methods are computationally fast, they result in the selection of redundant features. To overcome this drawback, wrapper-based feature selection methods such as Recursive Feature Elimination and Sequential Feature Selection are employed in this study. They evaluate each feature subset based on the performance of the classifier trained on it. The features selected by wrapper methods are more relevant and less redundant than those selected by filter methods, thus leading to better performance of the classifier.
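The contrast between the two families can be illustrated on a toy problem. The sketch below is not the chapter's experimental setup: the four token sets, the chi-square test as the filter statistic, the leave-one-out 1-nearest-neighbour classifier (swapped in for the classifiers used in the chapter), and the stop-when-no-improvement rule are all assumptions chosen to keep the example small and dependency-free.

```python
def chi2_scores(docs, labels):
    """Filter step: chi-square score of each term (presence/absence) against a binary label."""
    n, n_pos = len(docs), sum(labels)
    scores = {}
    for term in sorted({t for d in docs for t in d}):
        # 2x2 contingency table: term present/absent vs. positive/negative class
        a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
        b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
        c, d_ = n_pos - a, (n - n_pos) - b
        denom = (a + b) * (c + d_) * (a + c) * (b + d_)
        scores[term] = n * (a * d_ - b * c) ** 2 / denom if denom else 0.0
    return scores


def loo_accuracy(docs, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Hamming distance)."""
    vecs = [[int(f in d) for f in features] for d in docs]
    correct = 0
    for i, v in enumerate(vecs):
        others = [j for j in range(len(vecs)) if j != i]
        nearest = min(others, key=lambda j: sum(x != y for x, y in zip(v, vecs[j])))
        correct += labels[nearest] == labels[i]
    return correct / len(docs)


def forward_select(docs, labels, candidates):
    """Wrapper step: greedily add the feature that most improves classifier accuracy."""
    selected, best = [], 0.0
    while True:
        trials = [(loo_accuracy(docs, labels, selected + [f]), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        score, feat = max(trials, key=lambda t: t[0])
        if score <= best:  # assumed stopping rule: no strict improvement
            break
        selected.append(feat)
        best = score
    return selected, best


# Hypothetical toy reviews as token sets; label 1 = positive, 0 = negative.
docs = [{"great", "movie", "plot"}, {"great", "acting", "movie"},
        {"boring", "movie", "plot"}, {"boring", "acting", "movie"}]
labels = [1, 1, 0, 0]

scores = chi2_scores(docs, labels)
filter_top2 = sorted(scores, key=lambda t: (-scores[t], t))[:2]
wrapper_choice, wrapper_acc = forward_select(docs, labels, sorted(scores))
print(filter_top2)                  # ['boring', 'great']
print(wrapper_choice, wrapper_acc)  # ['boring'] 1.0
```

In this toy data "great" and "boring" carry the same information, so the filter ranks both at the top, while the wrapper stops after one of them because adding the other does not improve the classifier.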

The first section of the chapter will introduce elementary Natural Language Processing (NLP) tasks related to online review classification. It will also give an insight into a few tools used for scraping data (Mitchell, 2015) from online review sites. The reviews posted on these sites are generally noisy and contain misspelt words, abbreviations, etc. To handle these issues, pre-processing of reviews (Kowsari et al., 2019) is required, which converts the raw data into a format appropriate for a machine learning model. A few parsing techniques, such as Parts-of-Speech (PoS) tagging and dependency parsing, are the primary tasks required for extracting opinion from a review in applications such as Sentiment Analysis (Liu, 2012) and Named Entity Recognition (Hanafiah & Quix, 2014).
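As a rough illustration of what such pre-processing might involve, the sketch below lowercases a review, strips HTML tags and URLs, expands a few abbreviations, and removes stop words. The lookup tables `ABBREVIATIONS` and `STOPWORDS` and the function name `preprocess` are hypothetical and far smaller than anything a real pipeline would use; the exact steps applied in the chapter are not given in this preview.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"gr8": "great", "u": "you", "luv": "love"}
STOPWORDS = {"the", "a", "is", "was", "it", "this", "and", "you", "will"}


def preprocess(review):
    """Turn a raw review string into a list of cleaned tokens."""
    text = review.lower()
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    tokens = re.findall(r"[a-z0-9']+", text)    # keep word-like tokens, discard punctuation
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]  # expand known abbreviations
    return [t for t in tokens if t not in STOPWORDS]    # remove stop words


raw = "This movie was GR8!!! <br/> U will luv it http://example.com"
print(preprocess(raw))  # ['movie', 'great', 'love']
```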

After pre-processing of reviews, each review document needs to be represented as a feature vector before any machine learning model can be built. The section will also provide a review of elementary feature representation models used in various applications of text classification (Ahuja et al., 2019), such as Term Frequency (TF) or Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) (Qaiser & Ali, 2018). Although these schemes are easy to implement, their drawback is that they ignore the position of a feature and its semantic relationship with other features in the given review document. This issue can be resolved by using a model (Uchida et al., 2018) that converts each document of the given corpus into a low-dimensional embedding vector using deep learning and neural network-based techniques. The Doc2vec model for representing feature vectors will also be covered in the section.
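A minimal sketch of the two elementary representations follows. It assumes raw counts for term frequency and the unsmoothed weighting idf(t) = log(N / df(t)), which is only one of several common TF-IDF variants and may differ from the one used in the chapter; the three-document corpus is made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocab):
    """Raw term counts of one document, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def tf_idf(corpus):
    """TF-IDF vectors with tf = raw count and idf = log(N / df) (an assumed variant)."""
    vocab = sorted({t for doc in corpus for t in doc})
    n = len(corpus)
    df = {t: sum(1 for doc in corpus if t in doc) for t in vocab}
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = [[c * idf[t] for c, t in zip(bag_of_words(doc, vocab), vocab)]
               for doc in corpus]
    return vocab, vectors


# Hypothetical pre-processed reviews.
corpus = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
vocab, vectors = tf_idf(corpus)
print(vocab)                            # ['bad', 'good', 'movie', 'plot']
print(bag_of_words(corpus[2], vocab))   # [0, 2, 0, 1]

# Word order is lost: both orderings map to the same vector.
print(bag_of_words(["movie", "good"], vocab) == bag_of_words(["good", "movie"], vocab))  # True
```

The last line shows the limitation noted above: two documents containing the same words in a different order receive identical vectors.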

Key Terms in this Chapter

Unsupervised Learning: In unsupervised machine learning algorithms, the model learns from unlabeled data instances by finding the similarity or association between them.

Filter-Based Feature Selection: It filters irrelevant features from the extracted features on the basis of their association with the output class.

Wrapper-Based Feature Selection: This method selects the most useful and non-redundant features from the extracted features on the basis of their performance on the classifier.

Supervised Learning: It is a machine learning approach in which the model learns from an ample amount of available labeled data to predict the class of unseen instances.

Deep Learning: It is a subarea of machine learning, where the models are built using multiple layers of artificial neural networks for learning useful patterns from raw data.

Feature Selection: It is used to select appropriate features from the available data for improving efficiency of machine learning algorithms.

Semi-Supervised Learning: It is a machine learning algorithm in which the machine learns from both labeled and unlabeled instances to build a model for predicting the class of unlabeled instances.
