Extracting Hierarchy of Coherent User-Concerns to Discover Intricate User Behavior from User Reviews

Ligaj Pradhan (University of Alabama at Birmingham, Birmingham, AL, USA), Chengcui Zhang (Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL, USA) and Steven Bethard (University of Arizona, Tucson, AZ, USA)
DOI: 10.4018/IJMDEM.2016100104

Abstract

Intricate user behaviors can be understood by discovering user interests from their reviews. Topic modeling techniques have been extensively explored to discover latent user interests from user reviews. However, a topic extracted by topic modeling techniques can be a mixture of several quite different concepts and is thus less interpretable. In this paper, the authors present a method that uses topic modeling techniques to discover a large number of topics and applies hierarchical clustering to generate a much smaller number of interpretable User-Concerns. These User-Concerns are then compared with topics generated by Latent Dirichlet Allocation (LDA) and the Pachinko Allocation Model (PAM) and shown to be more coherent and interpretable. The authors cut the linkage tree formed during the hierarchical clustering at different levels to generate a hierarchy of User-Concerns. They also discuss how collaborative-filtering-based recommendation systems can be enriched by infusing additional user-behavioral knowledge from such a hierarchy.

Introduction

Users write reviews about items with regard to the aspects that are important to them at the time of writing. In this paper we refer to these aspects as ‘User-Concerns’. Such concerns could be aspects of the item itself or aspects peculiar to the users. Various techniques such as topic modeling have been used to uncover such hidden information by discovering topics from user reviews (Bauman & Tuzhilin, 2014; Huang, 2014; Shenoy & Aras, 2013). Discovering such User-Concerns could be vital for understanding what the user looks for in the item. Similarly, arranging such User-Concerns into hierarchies that capture their interrelationships may allow us to visualize the closeness between various User-Concerns and between the users who hold them. As such, in this paper, we aim to automatically discover and label such User-Concerns and to arrange them into hierarchies that allow us to visualize their interrelationships and relative distances.

Probabilistic Topic Models (PTMs) such as Latent Dirichlet Allocation (LDA) are basic and widely used techniques for discovering hidden topics in documents (Blei, Ng, & Jordan, 2003). PTMs generally consider documents (reviews) to contain several topics, with each topic appearing in a different proportion in each review. Similarly, each topic is represented by a distribution over all the words in the review set. Such models can generate a set of topics for user reviews that captures the hidden themes of each review and their proportions. Hierarchical variants of such probabilistic topic models, such as the Pachinko Allocation Model (PAM), also generate a nested hierarchy of topics (Li & McCallum, 2006). Although the discovered topic-word distribution may be intuitively meaningful, it might be challenging to accurately interpret the meaning of each topic (Mei, Shen, & Zhai, 2007). Moreover, when the number of topics to be generated is small, PTMs are forced to fit all the concepts into a small set of topics, which tends to make the topics too general (Zavitsanos, Paliouras, Vouros, & Petridis, 2007). Hence PTMs might not be very suitable for applications that intend to uncover a small number of clear and coherent User-Concerns from documents such as user reviews, as the generated topics tend to be rather noisy and less coherent because of mixed knowledge. In order to capture clearer topics, PTMs require a larger number of topics, which can capture the finer-grained and more focused concepts present in the user reviews. However, such an elaborate list of topics might be inconvenient for a human who expects to quickly understand the generic User-Concerns in the available user reviews. Hence, in this paper we aim to discover a relatively small number of User-Concerns that are generic and at the same time clearer and more coherent than those produced by conventional PTMs.

We start by exploiting LDA to generate a large number of topics, e.g., 200. Then we use the word mover’s distance (Kusner, Sun, Kolkin, & Weinberger, 2015) to compute semantic distances between the generated topics and perform agglomerative clustering to generate a small number of clusters, e.g., 15. We also train a word2vec model to generate a 200-dimensional vector for each word in the review set available to us. Finally, we represent each cluster with a few central words, i.e., the words that are nearest to the cluster centroid computed using the trained word2vec model. Each such group of words represents a User-Concern, similar to a topic generated by PTMs. We compare the User-Concerns discovered using our approach with those discovered by standard LDA and by a hierarchical topic modeling approach called PAM. Our experimental results show that the User-Concerns discovered using our method are more human-interpretable and semantically more coherent.
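The steps outlined above can be illustrated with a minimal, self-contained Python sketch. It is not the implementation used in this paper, and it rests on several assumptions: the toy 2-dimensional vectors in EMB stand in for trained 200-dimensional word2vec vectors, the four hand-written entries in TOPICS stand in for the top words of LDA topics, the exact word mover's distance is replaced by the relaxed lower bound described by Kusner et al. (2015), average linkage is assumed for the agglomerative clustering, and the cluster centroid is assumed to be the weight-averaged word vector. All function and variable names are hypothetical.

```python
import math

# Hypothetical 2-D "embeddings"; stand-ins for trained word2vec vectors.
EMB = {
    "pizza": (1.0, 0.0), "pasta": (0.9, 0.1), "burger": (0.8, 0.2),
    "waiter": (0.0, 1.0), "service": (0.1, 0.9), "staff": (0.2, 0.8),
}

# Hypothetical topics: top words with normalized weights (stand-ins for LDA output).
TOPICS = [
    {"pizza": 0.6, "pasta": 0.4},
    {"pasta": 0.5, "burger": 0.5},
    {"waiter": 0.7, "service": 0.3},
    {"service": 0.5, "staff": 0.5},
]


def one_sided(src, dst):
    """Cost when every word in src sends all its weight to its nearest word in dst."""
    return sum(
        w * min(math.dist(EMB[s], EMB[d]) for d in dst)
        for s, w in src.items()
    )


def relaxed_wmd(t1, t2):
    """Relaxed word mover's distance: a lower bound on the exact distance."""
    return max(one_sided(t1, t2), one_sided(t2, t1))


def agglomerate(topics, k):
    """Average-linkage agglomerative clustering of topic indices into k clusters."""
    clusters = [[i] for i in range(len(topics))]

    def avg_dist(a, b):
        total = sum(relaxed_wmd(topics[i], topics[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge it.
        _, i, j = min(
            (avg_dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def central_words(cluster, topics, n=2):
    """Return the n words closest to the weighted centroid of a cluster."""
    weights = {}
    for i in cluster:
        for word, p in topics[i].items():
            weights[word] = weights.get(word, 0.0) + p
    total = sum(weights.values())
    dim = len(next(iter(EMB.values())))
    centroid = tuple(
        sum(weights[w] * EMB[w][d] for w in weights) / total for d in range(dim)
    )
    return sorted(weights, key=lambda w: math.dist(EMB[w], centroid))[:n]


if __name__ == "__main__":
    for cluster in agglomerate(TOPICS, 2):
        print(cluster, central_words(cluster, TOPICS))
```

In this toy setting the two food-related topics and the two service-related topics end up in separate clusters, and each cluster is labeled by its most central words. Calling agglomerate with different values of k corresponds to cutting the merge tree at different levels, which is one simple way to read off a hierarchy of clusters.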
