Traditional Classifiers vs. Deep Learning for Cyberbullying Detection

DOI: 10.4018/978-1-5225-5249-9.ch006


In this chapter, the authors present their approach to cyberbullying detection using various traditional classifiers as well as a deep learning approach. Research has tackled the problem of cyberbullying detection in recent years. However, due to the complexity of the language used in cyberbullying, the results obtained with traditional classifiers have remained only mildly satisfying. In this chapter, the authors apply a number of traditional classifiers, also used in previous research, to obtain an objective view of the extent to which each of them is suited to the task. They also propose a novel method for automatic cyberbullying detection based on convolutional neural networks and increased feature density. Experiments performed on actual cyberbullying data showed a major advantage of the presented approach over all previous methods, including the two best-performing methods so far, based on SO-PMI-IR and a brute-force search algorithm, presented in the previous two chapters.
Proposed Methods

Below we describe the details of the applied methods. First, we describe the basics of data preprocessing and feature extraction. Next, we briefly explain all classifiers, with their settings and the modifications applied in the experiments, including the proposed model based on CNN.

Data Preprocessing

The sentences from the original dataset used in this research (Ptaszynski et al., 2010, 2015a, 2015b, 2016; Nitta et al., 2013) were preprocessed in the following ways:

  • Tokenization: All words, punctuation marks, etc. are separated by spaces (later: TOK).

  • Lemmatization: Like the above but the words are represented in their generic (dictionary) forms, or “lemmas” (later: LEM).

  • Parts of Speech: Words are replaced with their representative parts of speech (later: POS).

  • Tokens With POS: Both words and POS information are included in one element (later: TOK+POS).

  • Lemmas With POS: Like the above but with lemmas instead of words (later: LEM+POS).

  • Tokens With Named Entity Recognition: Words are encoded together with information on which named entities (personal names, organizations, numerical expressions, etc.) appear in the sentence. The NER information is annotated by CaboCha (later: TOK+NER).

  • Lemmas With NER: Like the above but with lemmas (later: LEM+NER).

  • Chunking: Larger sub-parts of sentences separated syntactically, such as noun phrases, verb phrases, predicates, etc., but without dependency relations (later: CHNK).

  • Dependency Structure: Same as above, but with information regarding syntactical relations between chunks (later: DEP).

  • Chunking With NER: Information on named entities is encoded in chunks (later: CHNK+NER).

  • Dependency Structure With Named Entities: Both dependency relations and named entities are included in each element (later: DEP+NER).
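The token-level representations listed above can be illustrated with a minimal sketch. It assumes a sentence that has already been analyzed, with each word carrying a surface form, a lemma, a POS tag, and an optional named-entity tag. The sample sentence, the `Token` type, the `represent` function, and the `|` separator are all hypothetical choices made for illustration; they are not taken from the chapter, and the chunk-based representations (CHNK, DEP) are not covered.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str  # word as it appears in the sentence
    lemma: str    # dictionary form
    pos: str      # part-of-speech tag
    ner: str      # named-entity tag; "O" means "not a named entity"


# Hypothetical, hand-annotated sentence (not from the original dataset).
SENTENCE = [
    Token("Taro", "Taro", "N", "PERSON"),
    Token("sent", "send", "V", "O"),
    Token("nasty", "nasty", "ADJ", "O"),
    Token("messages", "message", "N", "O"),
]


def with_ner(word: str, ner: str) -> str:
    """Attach the named-entity tag to a word when one is present."""
    return f"{word}|{ner}" if ner != "O" else word


# One builder per representation; the "|" separator is an assumption.
BUILDERS = {
    "TOK": lambda t: t.surface,
    "LEM": lambda t: t.lemma,
    "POS": lambda t: t.pos,
    "TOK+POS": lambda t: f"{t.surface}|{t.pos}",
    "LEM+POS": lambda t: f"{t.lemma}|{t.pos}",
    "TOK+NER": lambda t: with_ner(t.surface, t.ner),
    "LEM+NER": lambda t: with_ner(t.lemma, t.ner),
}


def represent(tokens: list[Token], mode: str) -> str:
    """Render a sentence as space-separated elements in the given mode."""
    build = BUILDERS[mode]
    return " ".join(build(t) for t in tokens)


if __name__ == "__main__":
    for mode in BUILDERS:
        print(f"{mode:8s} {represent(SENTENCE, mode)}")
```

Each mode yields the same sentence at a different level of generalization, from fully specific (TOK+POS) to highly generalized (POS).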

Five examples of preprocessing are presented in Table 2 in Chapter 5. Theoretically, the more generalized a sentence is, the fewer unique patterns it will contain, but the patterns it does produce will be more frequent. For example, in the sentence from Table 2 in Chapter 5, a simple phrase kimochi_ii hi (“pleasant day”) is represented as the POS pattern ADJ N. We can safely assume that ADJ N patterns will occur more often than kimochi_ii hi, because many word combinations can be represented by this pattern. We compared the classification results of each classifier across the different preprocessing methods to find out whether it is better to represent sentences in a more generalized or a more specific way. Generalization is also closely related to the notion of Feature Density, which we propose to use to optimize the proposed method.
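The effect of generalization on pattern counts can be sketched with a toy example. The three phrases below are hypothetical English glosses (only "pleasant day" echoes the example above), and counting adjacent pairs is a simplification chosen for illustration, not the pattern extraction procedure used in the chapter.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (word, POS) pairs.
CORPUS = [
    [("pleasant", "ADJ"), ("day", "N")],
    [("cold", "ADJ"), ("day", "N")],
    [("good", "ADJ"), ("person", "N")],
]


def bigram_counts(corpus, index):
    """Count adjacent pairs using words (index=0) or POS tags (index=1)."""
    counts = Counter()
    for sentence in corpus:
        items = [pair[index] for pair in sentence]
        counts.update(zip(items, items[1:]))
    return counts


tok_patterns = bigram_counts(CORPUS, 0)
pos_patterns = bigram_counts(CORPUS, 1)

# Word level: three distinct patterns, each seen once.
print(len(tok_patterns), max(tok_patterns.values()))  # 3 1
# POS level: one distinct pattern (ADJ N), seen three times.
print(len(pos_patterns), max(pos_patterns.values()))  # 1 3
```

The word-level representation keeps every phrase distinct, while the POS-level representation collapses all three into a single, more frequent ADJ N pattern.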
