Discovering Attribute-Specific Features From Online Reviews: What Is the Gap Between Automated Tools and Human Cognition?

Xiaonan Jing (Purdue University, West Lafayette, USA), Penghao Wang (Purdue University, West Lafayette, USA) and Julia M. Rayz (Purdue University, West Lafayette, USA)
DOI: 10.4018/IJSSCI.2018040101


This article describes how online reviews play an important role in data-driven decision making. Much effort has been invested in determining the overall sentiment carried by the reviews. However, the overall ratings of the reviews often do not represent opinions toward specific attributes of a product. An ideal opinion mining tool should aim at finding both the product attributes and their corresponding opinions. The authors propose an approach for extracting attribute-specific features from online reviews using a Word2Vec model combined with clustering. Two experiments are described in this paper: the first tests the performance of the Word2Vec model on extracting product aspect words, and the second addresses how recognizable the extracted features are to human cognition. A new metric called the “split value”, which is based on cluster similarity and diversity, is introduced to examine the consistency of the clustering algorithm. The authors' experiments suggest that meaningful clusters, which provide insights into the product attributes and sentiments, can be extracted from the reviews.

1. Introduction

The adoption of Web 2.0 triggered the development of many review websites. The term Web 2.0, defined as “Web as Platform” by Tim O’Reilly (2005), denotes a digital era in which knowledge and information sharing are highly active and which has reshaped the way businesses operate (Erragcha & Romdhane, 2014). Online reviews have become key to evaluating products, as many believe that “a product’s true assessed value is the result of consumer opinion often conveyed via word of mouth” (Kannan, Goyal, & Jacob, 2013). Customer reviews not only help businesses learn the strengths and weaknesses of their products, but also help users compare products and decide between them. Furthermore, review websites such as Yelp and TripAdvisor provide detailed product evaluations, including attribute-specific ratings. The large amount of data makes it difficult for information seekers to assemble the information that interests them and make the right decisions. The general strategies that most people use are to assess the statistics of the overall ratings or to read several of the reviews voted most helpful. However, neither strategy draws its conclusion from the contents of all the available data. The emerging needs of businesses have defined the future digital world as Web 3.0, which aims to create meanings and interpretations from the vast amount of human-generated data shared under Web 2.0 (Barassi & Trere, 2012). To cope with future business needs and to maximize the business value of online reviews, a tool is needed that can thoroughly analyze users’ opinions across a large number of reviews.

Opinion mining, also referred to as sentiment analysis, has been studied on different levels, with many approaches discovering document-level or sentence-level opinions recorded in the corpora. However, these approaches are not sufficient for corpora that need phrase-level analysis or for documents that contain diverse opinions within a single sentence (Liu & Zhang, 2012). In the case of online reviews, the overall rating does not necessarily reveal the writer’s opinions of specific aspects mentioned in a review. For instance, a review of a highly rated product could describe small flaws, which would not be discovered unless phrase-level analysis is performed. To obtain aspect-based opinions, product attribute words and their corresponding sentiments need to be extracted from the online reviews and classified; this serves as the first step of aspect-based sentiment analysis. The quality of the features extracted in this step can largely affect the subsequent analysis that determines the opinions. Therefore, for opinion mining tools to produce accurate results, they should rely on high-quality features extracted from raw text.

The purpose of this study is to test state-of-the-art methods for extracting attribute-specific features from online reviews. Based on the results reported in many recent papers, we hypothesized that the Word2Vec model (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013a) could be useful for extracting feature words for aspect-based opinion mining tasks. The Word2Vec model is a state-of-the-art method for learning word embeddings from text data. The name Word2Vec does not represent a single algorithm; instead, it refers to the process of representing words in a vector space. Mikolov et al. (2013a) presented two models: the Continuous Bag of Words (CBOW) model, which uses the context words to predict the target word, and the Skip-gram model, which uses the target word to predict the surrounding words. These models enabled vector arithmetic between word vectors for discovering analogical relationships between word pairs. For instance, the word pairs found to share the relationship that holds between “France” and “Paris” (learned by the Skip-gram model on the Google News dataset) included “Italy” and “Rome”, “Japan” and “Tokyo”, etc., as reported by Mikolov et al. (2013b).
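
The difference between the two training setups, and the idea behind the analogy arithmetic, can be illustrated with a minimal Python sketch. This is not the implementation used in this study or by Mikolov et al.: the function names, the window size, the example sentence, and the two-dimensional toy vectors are all assumptions made for illustration, and the vectors are written by hand rather than learned from data.

```python
from math import sqrt


def skipgram_pairs(tokens, window=2):
    """Skip-gram view: each target word is paired with every neighbor it should predict."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=2):
    """CBOW view: the neighbors together form the input used to predict the target word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def analogy(vectors, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


# Hand-made toy vectors (an assumption, NOT learned embeddings):
# the first axis encodes the country, the second marks "is a capital".
TOY_VECTORS = {
    "France": [1.0, 0.0], "Paris": [1.0, 1.0],
    "Italy": [2.0, 0.0], "Rome": [2.0, 1.0],
    "Japan": [3.0, 0.0], "Tokyo": [3.0, 1.0],
}

if __name__ == "__main__":
    sentence = ["the", "food", "was", "great"]
    print(skipgram_pairs(sentence, window=1))
    print(cbow_pairs(sentence, window=1))
    print(analogy(TOY_VECTORS, "France", "Paris", "Italy"))
```

In the toy space, the offset from “France” to “Paris” is the same as the offset from “Italy” to “Rome”, so adding that offset to “Italy” lands on “Rome”. With learned embeddings the match is only approximate, which is why the nearest neighbor by cosine similarity is taken.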

This paper focuses on the performance of the Word2Vec model in extracting different aspect words and on the model's potential for generating cognitively meaningful word clusters. The goal is to introduce an effective and accurate approach for extracting high-quality features from online reviews, which could provide insights for future web intelligence. We are also interested in comparing the results produced by a computational method with human judgments.
