Spammer Group Detection Using Machine Learning Technology for Observation of New Spammer Behavioral Features

Li-Chen Cheng, Hsiao-Wei Hu, Chia-Chi Wu
Copyright: © 2021 | Pages: 16
DOI: 10.4018/JGIM.2021030104

Abstract

Recently, the rapid growth in the number of customer reviews on e-commerce platforms and in the amount of user-generated content has begun to have a profound impact on customer purchasing decisions. To counter the negative impact of social media marketing, some firms have begun hiring people to generate fake reviews which either promote their own products or damage their competitors' reputations. This study proposes a framework, which takes advantage of both supervised and unsupervised learning techniques, for observing the behavior of spammers. Based on the behavior of participants on web forums, the authors build a post-reply network. The main focus is on the behavior-related features of the reviews, their propagation, and their popularity. The primary objective of this study is to build an effective online spammer detection model; the method detailed in this work can be used to improve the performance of spammer detection models. An experiment is carried out with a real dataset, the results of which indicate that these new features are important for identifying spammers. Finally, random walk clustering is applied to investigate the post-reply network. Some interesting and important features are observed in the interactions within a group of spammers, which could be the subject of further research.
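
As an illustration of two of the ideas named in the abstract, the sketch below builds a post-reply network from forum data and clusters it with a random-walk-based community detection method (Walktrap, via python-igraph). This is a minimal sketch under assumptions, not the authors' implementation; the column names and toy data are invented for the example, and the resulting clusters would then be examined for the group-level behavioral features the article discusses.

```python
# Minimal sketch: post-reply network + random-walk (Walktrap) clustering.
# Column names and toy data are illustrative assumptions, not the paper's dataset.
import igraph as ig
import pandas as pd

posts = pd.DataFrame({
    "post_id":  [1, 2, 3, 4, 5],
    "author":   ["u1", "u2", "u1", "u3", "u2"],
    "reply_to": [None, 1, 2, 1, 4],   # id of the post being replied to
})

# One edge (replier -> original poster) per reply; forum users are vertices.
id_to_author = dict(zip(posts.post_id, posts.author))
edges = [
    (row.author, id_to_author[int(row.reply_to)])
    for row in posts.itertuples()
    if pd.notna(row.reply_to)
]
g = ig.Graph.TupleList(edges, directed=True)

# Walktrap: short random walks tend to stay inside densely connected groups,
# so the resulting clusters are candidate reviewer groups worth inspecting.
ug = g.as_undirected()
clusters = ug.community_walktrap(steps=4).as_clustering()
for idx, members in enumerate(clusters):
    print(f"group {idx}:", [ug.vs[m]["name"] for m in members])
```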

1. Introduction

The recent emergence of social media as a means of social communication has had profound effects on general communication structures and the interactions between businesses, communities and individuals. Social media gives organizations the opportunity to target a wider audience and establish connections within a short span of time using limited resources (Chen, De, & Hu, 2015). These changes have also meant that organizations now have to consider new ways of marketing their products and services (Trapp, 2016).

The development of social media has led to rapid growth in the amount of user-generated content, which has not only had a large impact on purchasing behavior but also affects public perception of products and services, and thus the business development landscape. Naturally, this has drawn the attention of researchers and marketers. Online consumer reviews have proven particularly influential in shaping the purchase decisions of potential customers. Positive reviews can ensure the success of a product, while negative reviews can doom it to failure (Zhang, Zhou, Kehoe, & Kilic, 2016).

Most of the research on social media marketing has focused on the opportunities and advantages of these developments. Relatively little work has been done examining the negative ramifications (Shirish, 2018). The negative impact of social media marketing is illustrated by a BBC report on fake web reviews involving Samsung. The article made clear that Samsung was paying people to write negative reviews about HTC products on several web forums in Taiwan. This was judged to violate fair trade practices and resulted in Samsung having to pay a fine of approximately 350,000 USD to Taiwan's Fair Trade Commission (FTC). The case only came to light in 2013, when a hacker released confidential marketing documents obtained from Samsung Taiwan (Elmer-DeWitt, 2013).

It has been shown that this was not an isolated incident: other firms, in an effort to cultivate a positive company image and improve sales, have taken steps to manufacture positive (fake) reviews of their products and services (Wang, Day, & Lin, 2016). In short, fake reviews are a growing problem that seriously undermines consumer trust in the review system.

Although fake reviews are skillfully crafted to avoid detection, advances in machine learning technology are opening the door to automated detection (Jindal & Liu, 2008). Zhang, Zhou, Kehoe, and Kilic (2016) examined the predictive features that an automated system could use to distinguish fake reviews from genuine ones. They categorized these predictive features as either verbal or nonverbal. Verbal features are those extracted from the text of the review, and they dominate the set of predictive features used in existing fake review detection models. Nonverbal features, in contrast, are defined by the review posting behaviors and the social interactions of reviewers with other reviewers on social media, especially on online review platforms. To date, the focus in the detection of fake content has been on verbal features. Ott, Choi, Cardie, and Hancock (2011) built a prediction model using content-related features. Xie, Wang, Lin, and Yu (2012) focused on identifying fake quantitative social information such as fake product rankings and ratings. Mukherjee, Liu, and Glance (2012) carried out experiments to identify fake reviews posted on Yelp. Past studies have shown that it is very hard to detect spammers (in this case, the people who write fake reviews) from content features alone because of the subtle way such reviews (opinions) are produced. This has motivated many researchers to develop machine-learning methods that examine the nonverbal aspects of posted reviews based on the reviewers' behavior-related characteristics (Lim, Nguyen, Jindal, Liu, & Lauw, 2010; Li, Huang, Yang, & Zhu, 2011). For example, it has been found that fake reviews can be distinguished by their temporal patterns (Xie et al., 2012).
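
To make the verbal/nonverbal distinction concrete, the sketch below combines n-gram text features with simple behavior-related features in a single supervised classifier. This is an illustrative sketch only, not the model proposed in the paper; the feature choices, toy data, and labels are assumptions.

```python
# Illustrative sketch of the verbal/nonverbal feature split:
# n-gram text features plus behavior-related features, one supervised classifier.
# All feature names, toy reviews, and labels are assumptions for the example.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["great phone, buy it now", "terrible product avoid",
           "best price ever amazing", "battery died after a week"]
labels = np.array([1, 1, 1, 0])   # 1 = fake, 0 = genuine (toy labels)

# Verbal features: word n-grams extracted from the review text.
vec = TfidfVectorizer(ngram_range=(1, 2))
X_verbal = vec.fit_transform(reviews)

# Nonverbal (behavioral) features per review, e.g. the author's posts per day
# and the author's mean time between posts in hours.
X_behav = csr_matrix(np.array([[25, 0.5],
                               [30, 0.2],
                               [28, 0.4],
                               [1, 48.0]]))

X = hstack([X_verbal, X_behav])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```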

According to Mukherjee, Venkataraman et al. (2013b), behavioral features are far more effective than linguistic n-grams in terms of detection performance. When examining nonverbal features, it is important to observe patterns in the way spammers work. Spammers are usually paid according to the number of reviews they post, so many of the fake reviews they produce (in particular, replies) do not necessarily even express an opinion about the product under discussion. Such posts are often meant only to keep the discussion alive or to attract attention to the threads pertaining to the objectives of their campaign.
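
As a hedged illustration of the kind of behavior-related signals described above (high posting volume and short, content-free replies that merely keep a thread alive), the following sketch computes a few per-author features from reply data. The column names and the 20-character threshold are assumptions made for the example, not the paper's feature definitions.

```python
# Small sketch of behavior-related (nonverbal) features per author:
# reply volume, thread coverage, and the share of short "bump"-style replies.
# Column names and the 20-character threshold are illustrative assumptions.
import pandas as pd

replies = pd.DataFrame({
    "author": ["u1", "u1", "u1", "u2", "u2"],
    "thread": ["t1", "t1", "t2", "t1", "t3"],
    "text":   ["+1", "push", "agree!",
               "The camera is noticeably better than last year's model.",
               "Battery easily lasts two days for me."],
})

features = (
    replies.assign(is_short=replies["text"].str.len() < 20)
           .groupby("author")
           .agg(n_replies=("text", "size"),
                n_threads=("thread", "nunique"),
                short_ratio=("is_short", "mean"))
)
print(features)   # high reply counts with a high short_ratio flag candidates
```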
