Automatic Moderation of User-Generated Content


Issa Annamoradnejad, Jafar Habibi
Copyright: © 2023 | Pages: 12
DOI: 10.4018/978-1-7998-9220-5.ch079

Abstract

In recent years, the most popular websites, such as Facebook and Wikipedia, have depended on user involvement and content generation for their popularity and growth. Due to the vast scale of these systems in terms of users and posts, manual verification of new content by administrators or official moderators is not feasible, and these systems require scalable solutions. In this research, the authors show the emerging need for automated moderation of user-generated content using the latest machine learning and data science methods. This chapter presents novel ideas for building a real-world recommendation system that can assist users and moderators in identifying issues in existing questions, sharing new high-quality questions, reducing the time needed to perform moderation actions, and improving the overall quality of the system.

Introduction

In recent years, the most popular websites, such as Facebook and Wikipedia, have emphasized user-generated content, ease of use, participatory culture, and interoperability for end users (characteristics collectively known as Web 2.0). These websites do not generate content systematically themselves; they depend on user involvement and content generation for their popularity and growth. Due to the vast scale of these systems in terms of users and posts, manual verification of new content by administrators or official moderators is not feasible, and these systems require scalable solutions. The current strategy is crowdsourcing, which usually consists of initial reports by the community on the activities of users and a final decision by official moderators or experienced users. For example, in community question-answering websites, if a post violates the general rules of the system, other users can flag it through a reporting system, and the flag is entered into a review queue for further processing.
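To make this workflow concrete, the report-and-review loop can be sketched as a simple flag queue. This is a minimal sketch: the class and method names below are hypothetical illustrations of the process described above, not any particular platform's implementation.

from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Post:
    post_id: int
    body: str
    flags: list = field(default_factory=list)

class ReviewQueue:
    """Hypothetical sketch of a crowdsourced moderation queue."""

    def __init__(self, flag_threshold: int = 1):
        self.flag_threshold = flag_threshold
        self._queue = deque()

    def flag(self, post: Post, reporter: str, reason: str) -> None:
        # A community member reports a rule-violating post.
        post.flags.append((reporter, reason))
        # Once enough reports accumulate, the post enters the review queue
        # exactly once, where it waits for an official moderator.
        if len(post.flags) == self.flag_threshold:
            self._queue.append(post)

    def next_for_review(self) -> Optional[Post]:
        # Moderators process reports in arrival order, so handling latency
        # grows with the volume of new reports.
        return self._queue.popleft() if self._queue else None

queue = ReviewQueue(flag_threshold=2)
post = Post(post_id=1, body="off-topic promotional content")
queue.flag(post, reporter="user_a", reason="spam")
queue.flag(post, reporter="user_b", reason="spam")
pending = queue.next_for_review()  # now waits on a human decision

Everything past the flag threshold is manual, which is precisely the bottleneck that the automated approaches discussed below aim to remove.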

The crowdsourcing strategy has serious problems given the speed with which new posts spread and their high impact (Ipeirotis et al., 2010; Paolacci et al., 2010). The first problem, common to all of these systems, is the slow handling of reports. In general, a community member has to read a new post, notice its unlawful content, file a flag report, and then wait for moderators to process the report in a review queue. This process is performed manually by moderators and users, a costly and time-consuming effort that sometimes results in subjective and biased decisions. In addition, some content may never be reported by the community, either because it is shared privately inside a closed network or because it is not noticed by a large number of readers. This can turn platforms into safe private places for illegal activity, such as terrorism. Finally, users may wrongly report another user's content because of personal disagreements, which both slows the handling of legitimate reports and can harm the targeted user (for example, through automated locking mechanisms triggered by excessive reports against an account).

Given the need to enforce community rules and the significant problems of crowdsourcing, solutions for automatic and fast detection of user violations can resolve the problems mentioned above, save time and money, reduce the subjectivity of decisions, increase content quality, and create a safe place for civil debate. In addition, the same automated models can be used to build constructive recommender systems that help users create or edit content.

In this research, by addressing these problems of manual handling, the authors show the emerging need for automated moderation of user-generated content using the latest machine learning and data science methods. Recent works that proposed case-by-case solutions are reviewed, and a novel taxonomy of moderation actions is provided, built from the answers to a new questionnaire. In addition, the authors propose an automated system for recommending the type of edits required to improve content on a community Q&A website such as Stack Overflow. Determining the type of edits a question requires helps the asker fix its problems, reach more readers, and receive answers; a more precise question also helps readers understand the context of the problem and provide faster and more accurate answers. Since the proposed approach uses only the question data, and not previous user achievements or future community feedback on the question (such as upvotes and comments), it can serve as a recommender system for new users and question drafts. The model extracts features through three separate feature-extraction components, whose outputs are fed into feature-engineering steps; the final classifier is trained with a gradient boosting algorithm.
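The chapter does not reproduce the exact features or hyperparameters here, so the following is a minimal sketch under stated assumptions: each of the three feature-extraction components is stood in for by a TF-IDF vectorizer over one question field (title, body, and tags), their outputs are concatenated, and scikit-learn's GradientBoostingClassifier serves as the gradient-boosting model. The field split and the edit-type labels are illustrative assumptions, not the authors' published configuration.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical training data: question fields plus the type of edit each
# question later received (labels invented for illustration).
data = pd.DataFrame({
    "title": ["How to sort a list in Python?",
              "code dosnt work plz help",
              "Segfault in C pointer arithmetic"],
    "body": ["I have a list of integers ...",
             "here is my code ...",
             "The following snippet crashes ..."],
    "tags": ["python sorting", "java", "c pointers"],
    "edit_type": ["no_edit", "formatting", "no_edit"],
})

# Three separate feature-extraction components, one per question field,
# whose outputs are concatenated before classification.
features = ColumnTransformer([
    ("title", TfidfVectorizer(), "title"),
    ("body", TfidfVectorizer(), "body"),
    ("tags", TfidfVectorizer(token_pattern=r"\S+"), "tags"),
])

# Final multi-class classifier trained with gradient boosting.
model = Pipeline([
    ("features", features),
    ("clf", GradientBoostingClassifier()),
])

model.fit(data[["title", "body", "tags"]], data["edit_type"])

# Because only question text is used (no votes, comments, or user history),
# the trained model can score a brand-new question draft:
draft = pd.DataFrame({"title": ["help"], "body": ["my code ..."],
                      "tags": ["python"]})
print(model.predict(draft))

Since the pipeline fits entirely on question text, the same trained model can score a draft before it is ever published, which is what makes such a recommender usable for new users.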

This chapter presents novel ideas for building a real-world recommendation system that can assist users and moderators in identifying issues in existing questions, sharing new high-quality questions, reducing the time needed to perform moderation actions, and improving the overall quality of the system.

Key Terms in this Chapter

Community QA Websites: Online question-answering platforms where users can ask or answer questions, contribute to improving other users’ public content, or engage in moderation actions.

Explainable AI: A set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms.

Fake News: False or misleading information presented as news, often with the aim of damaging the reputation of a person or entity or of making money through advertising revenue.

Transfer Learning: A machine learning method where a model developed for one task is reused as the starting point for a model on a second task (see the sketch after this list).

Web 2.0: Websites that emphasize user-generated content, ease of use, participatory culture and interoperability for end users.

Sock Puppetry: Creating multiple user accounts with real or fake identities in order to attack or promote others, or to engage in other manipulation contrary to the accepted practices of the community.

Cyber-Bullying: Any form of bullying or harassment carried out through electronic means, commonly by posting rumors, threats, sexual remarks, or negative personal information about a victim, or by disrupting a civil conversation.
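As a concrete illustration of the transfer-learning definition above, the following minimal sketch (assuming PyTorch and torchvision, which the chapter does not prescribe) reuses an ImageNet-pretrained ResNet-18 as the starting point for a hypothetical five-class task.

import torch
import torch.nn as nn
from torchvision import models

# Load a model developed for one task (ImageNet classification).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so its learned weights are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a second, hypothetical 5-class task.
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)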
