Estimating Importance From Web Reviews Through Textual Description and Metrics Extraction

Roney Lira de Sales Santos, Carlos Augusto de Sa, Rogerio Figueredo de Sousa, Rafael Torres Anchiêta, Ricardo de Andrade Lira Rabelo, Raimundo Santos Moura
Copyright: © 2021 | Pages: 26
DOI: 10.4018/978-1-7998-4240-8.ch007

Abstract

The evolution of e-commerce has increased the amount of information available, making manual analysis of reviews almost impossible. Given this volume of information, automatic methods of knowledge extraction and data mining have become necessary. Currently, to facilitate the analysis of reviews, some websites offer filters such as helpfulness votes or star ratings. However, these filters are not a good practice because they may exclude reviews that have only recently been submitted to the voting process. One possible solution is to filter the reviews based on their textual descriptions, author information, and other measures. This chapter proposes approaches to estimate the importance of reviews about products and services using fuzzy systems and artificial neural networks. The results were encouraging: the approaches performed better when detecting the most important reviews, achieving an F-measure of approximately 82%.

Introduction

Nowadays, web users commonly search for reviews when they are interested in purchasing a product or service. Likewise, companies that manufacture products or provide services are interested in customers' opinions and feedback, mainly to guide marketing actions and decision-making.

One of the main sources of this kind of data is e-commerce, which includes sites for buying and selling products and providing services. E-commerce is one of the main activities on the internet, exceeding 12 million stores worldwide (Digital Commerce 360, 2014). Research in Natural Language Processing (NLP) has tried to extract useful information from unstructured data, since around 95% of relevant information originates in unstructured form, mainly as text such as emails, surveys, and posts on social networks and forums. Every day, 2.5 quintillion bytes of data are created, so much so that 90% of the data in the world today was created in the last two years alone (Santos et al., 2015). This volume of data makes manual analysis impossible and requires automatic methods to analyze the data (Liu, 2010).

According to Liu (2010), this interest has always existed. However, with the growth of data on the web, there are new ways to share opinions and make information available. As the web became popular, people and companies gained new ways to deliver and collect opinions. Recently, social networks have increased the number of places available to store content generated by customers about products or services. Consumer reviews are therefore important to the success or failure of a product or service: a satisfied customer will probably speak positively about a purchased product to people close to them, while a dissatisfied customer will leave a negative review.

Because users publish a large number of reviews, reviews are usually sorted by star rating, recency, or relevance, but the reviews shown first are not always the most important or useful opinions for a particular user. On some buying and selling websites, users can vote on whether they consider a review useful when searching for a product or service. However, polarity information from the review is not always sufficient, and other problems may arise, as highlighted by Li et al. (2013): newer reviews that have not yet received votes are unlikely to be read and voted on. Thus, providing the most important reviews requires considering factors such as the textual description, the richness of the vocabulary, and the quality of the author. In this way, new users can analyze a small set of reviews for decision-making.

The approach of Sousa et al. (2015) presented one possible solution to these problems: filtering reviews based on features such as author reputation and textual description measures. Their approach estimates the importance degree of reviews about products or services written by web users, revealing which reviews were most relevant to the user’s final evaluation. Their work used Natural Language Processing (NLP) techniques and Fuzzy Systems (FS). The main aim of this chapter is to present an approach that also estimates the importance degree of reviews using NLP techniques, but changes the computational model to an Artificial Neural Network (ANN). Moreover, this work proposes adaptations to two input variables proposed by Sousa et al. (2015): author reputation and vocabulary richness.
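To make the idea of a vocabulary-based input variable concrete, the sketch below computes a type-token ratio (distinct words divided by total words) for a review. This is an illustrative assumption only: the text above does not specify how vocabulary richness is measured, and the type-token ratio is a generic lexical-variety measure swapped in here for demonstration. The function name `type_token_ratio` and the sample reviews are hypothetical.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct words / total words, or 0.0 for text with no words.

    Hypothetical stand-in for a vocabulary richness measure; it captures
    lexical variety only, not correctness.
    """
    # \w+ matches runs of letters, digits, and underscores (Unicode-aware),
    # so accented characters are kept inside tokens.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Made-up sample reviews, used only to show how the score could rank text.
reviews = [
    "good good good phone",
    "The battery lasts two days and the camera is sharp",
]

# Reviews with more varied vocabulary come first.
ranked = sorted(reviews, key=type_token_ratio, reverse=True)
```

A ratio like this is sensitive to review length (short texts tend to score high), so a real system would likely normalize it or combine it with other features such as author reputation before using it as a model input.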

This work makes the following contributions:

  • A manually annotated corpus, to be used in the experiments of this work and in other work with the same approach or intention;

  • Measures to define author reputation and vocabulary richness;

  • Easier filtering of reviews for users and companies, based not only on helpfulness and recency but also on the textual description.

Key Terms in this Chapter

Importance Degree: A measure that defines which object of study is more important than another.

Review: A description of a user’s opinion about a product or service, which may be positive, negative, or neutral.

Vocabulary Richness: A measure of the correctness and lexical variety of a sentence or document.

Neural Network: A set of algorithms inspired by the human brain, usually used to recognize patterns.

Opinion Mining: The task of identifying and extracting subjective information from texts.

Helpfulness: The quality of being helpful in some subject.

Corpus: A dataset about a specific subject, widely used in computational linguistics.
