Web Mining-Based Method for Cyberbullying Detection

DOI: 10.4018/978-1-5225-5249-9.ch004

Abstract

In this chapter, the authors present a method for the automatic detection of cyberbullying entries based on a Web mining technique, in particular, on an extended SO-PMI-IR method that calculates the relevance of new input documents to training documents. The method uses seed words from three categories to calculate a semantic orientation score and then maximizes the relevance across categories. The method outperformed previously proposed Web-mining-based methods in both laboratory and real-world conditions. The developed system was deployed and tested in practice. After a year of testing, the authors noticed a drop of over 30 percentage points in its performance. They hypothesize about the reasons for the drop. To regain the lost performance and sustain it in the future, the authors propose additional improvements, including automatic acquisition and filtering of seed words. Experimentally selected optimal improvements regained much of the lost performance.

Introduction

Web-mining-based methods have been widely acknowledged in document classification, particularly in sentiment and affect analysis (Turney, 2002; Pang & Lee, 2008; Ptaszynski et al., 2009), and even in such novel fields as machine ethics (Komuda et al., 2010). Their advantage lies mostly in their high effectiveness and, above all, efficiency. Web-mining methods usually require only minimal training data, instead performing mass-scale searches over available Web data such as the Internet (Ptaszynski et al., 2009) or large-scale local corpora (Komuda, Rzepka & Araki, 2013), and thus achieve satisfactory results with minimal human effort.

Here, we present our attempt at applying a Web mining method to the detection of cyberbullying. This attempt was motivated by the main goal of our research, namely, to provide help to the members of the Internet Patrol, whose work has so far been carried out mostly manually. It takes much time and effort to find harmful entries (entries containing information and expressions aimed at harming other users) among the vast amount of content appearing on countless electronic Bulletin Board System (BBS) pages. Moreover, the task places a great psychological burden on the net-patrol members.

To solve the above problem and decrease the burden on net-patrol members, we first proposed a preliminary method for detecting harmful entries automatically (Matsuba et al., 2011). In this method we extended Turney's (2002) SO-PMI-IR relevance-calculation method to measure the relevance of a document to harmful contents. With a small number of seed words we were able to detect large numbers of candidate harmful entries with satisfactory effectiveness.
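The core idea of an SO-PMI-IR-style score can be sketched as follows. This is a minimal illustration, not the chapter's exact formulation: the `hits` function (returning a Web or corpus hit count for a query), the seed words, and the `NEAR` query syntax are all assumptions for the sake of the example.

```python
import math

def so_pmi_ir(hits, phrase, harmful_seeds, harmless_seeds, eps=0.01):
    """Turney-style SO-PMI-IR score adapted to harmfulness.

    `hits(query)` is assumed to return a hit count from a Web search
    engine or a local corpus index. A positive score suggests the
    phrase co-occurs more strongly with harmful seed words than with
    harmless ones; `eps` smooths zero counts.
    """
    co_harmful = sum(hits(f'"{phrase}" NEAR "{s}"') for s in harmful_seeds)
    co_harmless = sum(hits(f'"{phrase}" NEAR "{s}"') for s in harmless_seeds)
    base_harmful = sum(hits(f'"{s}"') for s in harmful_seeds)
    base_harmless = sum(hits(f'"{s}"') for s in harmless_seeds)
    return math.log2(((co_harmful + eps) * (base_harmless + eps)) /
                     ((co_harmless + eps) * (base_harmful + eps)))
```

In practice the score would be computed for phrases extracted from a new entry and aggregated over the document; the seed words bias the score toward or away from harmfulness.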

The initial method was demonstrated to determine harmful entries with an accuracy of 83% on test data of which about half contained harmful entries. However, it was not verified how well the method would perform in real-life conditions, where cyberbullying entries and normal contents do not appear in equal numbers.

In the research described in this chapter, initially introduced by Nitta et al. (2013), we first proposed an optimization of the original method by maximizing the relevance scores. We divided the seed words into multiple categories and calculated the maximal relevance value for the seed words of each category. By computing this maximized score, representing the semantic orientation of harmfulness of an Internet entry, the method detected harmful entries more effectively than in the previous attempts. Moreover, we additionally evaluated our method on data sets with real-life ratios of harmful contents, to verify the usability of the method in the most realistic way.
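The category-based maximization described above can be sketched as follows. The structure here is a simplification under assumptions: the `relevance` function (e.g., an SO-PMI-IR-style score between a document and a seed word) and the way per-category maxima are combined into a single document score are illustrative, and the chapter's actual combination may differ.

```python
def max_relevance_score(relevance, document, seed_categories):
    """Combine per-category maxima into one harmfulness score.

    `seed_categories` maps a category name to its list of seed words.
    For each category we take the maximum relevance of the document
    to any seed word in that category, then sum the maxima across
    categories (one plausible combination; others are possible).
    """
    return sum(max(relevance(document, seed) for seed in seeds)
               for seeds in seed_categories.values())
```

Taking the maximum within a category lets a single strongly matching seed word dominate, so a document only needs to resemble one representative expression per category to score highly.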

After confirming the performance, the method was deployed and tested “in the field” for over one year. Unfortunately, when we re-evaluated the method after that time, we found that its performance had dropped significantly, by over 30 percentage points compared to its original performance measured in 2013.

In the original evaluation process, applied also in this research and previously in Matsuba et al. (2011), Nitta et al. (2013) assigned a harmfulness score to all sentences from their dataset containing harmful and non-harmful entries, sorted the entries in decreasing order of score, and checked the performance for the top n entries, increasing n (further referred to as the “threshold window”) by 50 entries at a time. This way they were able to observe the fluctuation in performance as more data was included in the evaluation. As evaluation measures they used standard Precision and Recall. The originally confirmed performance reached 91% Precision for low Recall values. Moreover, the original performance from late 2013 remained high (around 70% Precision) up to nearly 50% Recall, after which it decreased due to a drop in Precision for higher thresholds. However, when the evaluation experiment was repeated in early 2015, the overall results dropped greatly: Precision did not exceed 60% over the whole threshold span. This made us introduce a number of further improvements to regain the lost performance and ensure it is sustained at a sufficient level.
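The threshold-window evaluation described above can be sketched as follows. This is an illustrative reconstruction of the protocol, not the authors' code: entries are ranked by harmfulness score, and Precision and Recall are computed for the top n entries as n grows by a fixed step (50 in the chapter; smaller here for the example).

```python
def threshold_window_eval(scored, step=50):
    """Evaluate ranked output with a growing threshold window.

    `scored` is a list of (score, is_harmful) pairs, where is_harmful
    is 1 for a gold-standard harmful entry and 0 otherwise. Returns a
    list of (n, precision, recall) tuples for each window size n.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_harmful = sum(label for _, label in ranked)
    results = []
    for n in range(step, len(ranked) + 1, step):
        true_positives = sum(label for _, label in ranked[:n])
        results.append((n,
                        true_positives / n,              # Precision
                        true_positives / total_harmful)) # Recall
    return results
```

Plotting Precision against the window size (or against Recall) makes the fluctuation described above visible: Precision is typically highest for small windows and degrades as lower-scoring entries enter the window.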
