Auto-Detection of Human Factor Contents on Social Media Posts Using Word2vec and Long Short-Term Memory (LSTM)

Chika Yinka-Banjo (University of Lagos, Nigeria), Gafar Lekan Raji (University of Lagos, Nigeria) and Ifeanyi Precious Ohalete (Alex-Ekwueme Federal University of Ndufu-Alike, Ikwo, Nigeria)
DOI: 10.4018/978-1-7998-1279-1.ch001


The threat that cyberbullying poses to mental health in our society cannot be overemphasized. Victims are reported to have suffered poor academic performance, depression, and suicidal thoughts. There is a need for an efficient and effective solution to this problem within the academic environment. In this research, long short-term memory (LSTM), a popular deep learning model known for its performance on sequential data, was combined with the Word2Vec embedding technique to create a model that classifies social media posts as containing cyberbullying content or not. The resulting model correctly classified over 80% of the test dataset, compared with about 74.9% accuracy for the existing model.
Chapter Preview


Cyberbullying (also referred to as cyber-victimization) is a term used to describe actions by individuals (the bully) targeted at other individuals (the bullied) to threaten, blackmail, embarrass, annoy, or hurt the victim through digital media or cyber-technology. Cyberbullying can also be defined as the deliberate, repeated use of some form of electronic technology to carry out bullying behavior. The advantages of social media have been well documented, and most people have leveraged social media to their benefit. However, some individuals (mostly young adults) take advantage of social media to commit cybercrimes such as swindling other people and cyberbullying.

To curb cyberbullying, traditional mechanisms such as blacklisting certain words and appointing individuals to review the content of posts were deployed to check how people behave and engage with others on social media. However, these mechanisms have not been effective on social networking sites because of the dynamic nature of the content generated there.
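To make the weakness of word blacklisting concrete, here is a minimal sketch in Python. The word list, the function name `is_flagged`, and the sample posts are all hypothetical and chosen for illustration only; none of them come from the chapter.

```python
import re

# Hypothetical blacklist, for illustration only; not taken from the chapter.
BLACKLIST = {"idiot", "loser"}


def is_flagged(post: str) -> bool:
    """Return True if any blacklisted word appears as a whole token."""
    # Lowercase the post and keep only runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", post.lower())
    return any(token in BLACKLIST for token in tokens)


print(is_flagged("You are such a loser"))   # exact match on a listed word
print(is_flagged("You are such a l0ser"))   # obfuscated spelling is not matched
print(is_flagged("Nobody would miss you"))  # hurtful, but contains no listed word
```

Only the first post is flagged. A static list misses both trivially altered spellings and hurtful messages that use no listed word, which is one way to read the "dynamic nature" of social media content described above.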

Chatzakou et al. (2017) opined that the work of an effective cyberbullying detection system can be broken down into the following stages:

  1. Filtering and detecting bullying content in the messages within a tweet

  2. Determining the severity of the bullying incident

  3. Identifying every individual involved

  4. Assigning roles to each of the individuals involved

  5. Predicting the events that result from the cyberbullying incident

The effectiveness of such a system depends largely on how effectively it can filter and classify the content of tweets. This research proposes the use of long short-term memory (LSTM) to classify tweets as containing cyberbullying content or not.

We propose to tackle cyberbullying, especially on Twitter, by casting it as a sentiment analysis problem, a subfield of natural language processing (NLP). To make our model as inclusive as possible, we work with a diverse corpus of tweets. Our dataset contains 1,600,000 tweets covering various social topics such as business, travel, sports, racism, and sexism. We adopted Word2Vec as our word embedding technique.
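The general shape of an embedding-plus-LSTM classifier can be sketched in plain Python. This is not the authors' implementation: the preview does not give the architecture, so the class name `TinyLSTMClassifier`, the dimensions, and the toy vocabulary are assumptions. The embeddings are random stand-ins for trained Word2Vec vectors, and all weights are untrained, so the score it produces is meaningless except as a demonstration of the data flow.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class TinyLSTMClassifier:
    """Forward pass only: embed each token, run one LSTM layer, score the last state."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim
        # ASSUMPTION: random vectors stand in for trained Word2Vec embeddings.
        self.embeddings = {
            word: [rng.uniform(-1, 1) for _ in range(embed_dim)] for word in vocab
        }
        # One weight matrix per gate, applied to the concatenation [h_prev ; x].
        in_dim = hidden_dim + embed_dim
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * hidden_dim for gate in self.GATES}
        # Output layer mapping the final hidden state to a single logit.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def _gate(self, name, z, activation):
        return [activation(dot(row, z) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden state and cell state."""
        z = h + x  # list concatenation [h_prev ; x]
        i = self._gate("input", z, sigmoid)
        f = self._gate("forget", z, sigmoid)
        o = self._gate("output", z, sigmoid)
        g = self._gate("candidate", z, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, tweet):
        """Return a score in (0, 1); with untrained weights it carries no meaning."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        unknown = [0.0] * self.embed_dim  # out-of-vocabulary tokens map to zeros
        for token in tweet.lower().split():
            h, c = self.step(self.embeddings.get(token, unknown), h, c)
        return sigmoid(dot(self.w_out, h))


# Toy vocabulary and tweet, both hypothetical.
vocab = ["you", "are", "a", "loser", "great", "game", "today"]
model = TinyLSTMClassifier(vocab)
score = model.predict_proba("you are a loser")
print(f"untrained score: {score:.3f}")
```

In a real system the embedding table would be loaded from a Word2Vec model trained on the tweet corpus, and the gate and output weights would be learned from labeled tweets, typically with a deep learning framework rather than hand-written loops.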


Aim and Objectives

The aim of this research is to develop a model that can filter and effectively identify tweets containing cyberbullying content.

The objectives are:

  1. To develop a model that can check a message for cyberbullying content before it is posted

  2. To build a model that learns from the dataset and effectively classifies new inputs as containing cyberbullying content or not, using long short-term memory (LSTM)


Twitter has been said to have a telling impact on its users, especially teenagers and young adults. This is reflected in the number of cyberbullying cases reported daily from the platform. The number of affected individuals is said to be increasing, drawing attention to its negative impact (Dani, Li, & Liu, 2017). Detecting online bullying and then taking preventive measures is the main course of action for combating it.

Cyberbullying detection has received considerable research attention in recent years, due in part to the proliferation of social media and its detrimental effect on the mental health of young people. A recent study by Agrawal and Awekar (2018) revealed that between 10% and 40% of internet users have been victims of cyberbullying at one time or another.
