Sentiment Analysis of the Harry Potter Series Using a Lexicon-Based Approach

Md Habib Al Mamun, Pantea Keikhosrokiani, Moussa Pourya Asl, Nur Ain Nasuha Anuar, Nurfarah Hadira Abdul Hadi, Thasnim Humida
DOI: 10.4018/978-1-7998-9594-7.ch011

Abstract

The objective of this chapter is to conduct a sentiment analysis of the Harry Potter novel series written by British author J.K. Rowling. The text of the series was collected from GitHub as an R package provided by Bradley Boehmke. The chapter analyzes the text in R to explore dominant sentiments using a lexicon-based approach from natural language processing (NLP). The results revealed that Professor Slughorn scored the most positive sentiment among the main characters with heroic qualities; Death Eaters had the most negative sentiment among the anti-hero characters; negative sentiment in the text around the anti-hero characters increased significantly as the series progressed, while positive sentiment around the hero characters remained constant; among the novels, The Deathly Hallows contained the most negative sentiment; among the houses of Hogwarts School of Witchcraft and Wizardry, Hufflepuff had the most positive sentiment; and each book of the series appeared negative until its final chapter, which always ended with positive sentiment.

Introduction

Computational analysis of literary works is still considered a major challenge in the fields of digital literary studies, computational linguistics, machine learning, and neurocognitive poetics (Jacobs, 2015; Nalisnick & Baird, 2013; Ying et al., 2021, 2022). Sentiment analysis is a flourishing area at the intersection of linguistics and computer science. It is used to discover the sentiment contained in a text, which can be assessed as positive or negative (Malik et al., 2021; Taboada, 2016), and it is a key means of assessing the emotional information encoded in a literary text. Although remarkable progress has been made in sentiment analysis over the last two decades (Liu, 2015), that progress has occurred mostly in social media analysis for business purposes, and little research can be found on literary works. Arthur M. Jacobs studied sentiment analysis of poetic texts such as Shakespeare’s sonnets, focusing on predicting aesthetic emotions (Jacobs et al., 2017). He also carried out sentiment analysis of novels such as the Harry Potter book series and computed emotional and personality profiles of the protagonists (Jacobs, 2019). However, this type of work is rarely seen in sentiment analysis research. The authors chose this research topic in view of the lack of sentiment analysis research in digital literary studies.

In general, there are two approaches to extracting sentiment automatically: a) the lexicon-based approach, which is unsupervised, and b) the machine learning approach, which is supervised (Taboada et al., 2011). Both approaches have their pros and cons. Lexicon-based approaches have an advantage over machine learning approaches in that they do not require labeled data to predict unseen instances (Sazzed & Jayarathna, 2021). Within natural language processing (NLP), the lexicon-based approach classifies the sentiment polarity of a text using a sentiment lexicon (Nasukawa & Yi, 2003). The machine learning approach entails constructing classifiers from texts that are usually labeled (Pang et al., 2002), while the lexicon-based approach computes the orientation of a document from the words or phrases it contains (Turney, 2002).
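To make the lexicon-based idea concrete, the following is a minimal sketch in Python rather than the R workflow the chapter used. It assumes a tiny hypothetical lexicon (`TOY_LEXICON`) whose words and scores are invented for illustration and are not entries from any published dictionary; the function names are likewise assumptions. The orientation of a text is taken to be the sum of the scores of its words, and no labeled training data is needed.

```python
import re

# Hypothetical toy lexicon: words and scores are illustrative only,
# not copied from any published sentiment dictionary.
TOY_LEXICON = {
    "brave": 2,
    "happy": 3,
    "friend": 1,
    "dark": -2,
    "fear": -2,
    "cruel": -3,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; unknown words contribute 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def polarity_label(text, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse polarity label."""
    score = polarity_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("A happy and brave friend.", "A dark and cruel night."):
        print(sentence, polarity_score(sentence), polarity_label(sentence))
```

This sketch ignores negation and context ("no fear" still counts "fear" as negative), which is one of the known limitations of simple word-level lexicon scoring.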

The study employed a lexicon-based approach to extract sentiment from the Harry Potter series. Although a variety of dictionaries exist for evaluating the sentiment or opinion in texts, the study uses three of them: AFINN, BING, and NRC. The text of the series was collected from GitHub as an R package provided by Bradley Boehmke. The study identified three categories of characters (main characters, hero characters, and villain characters) in the Harry Potter data set for conducting the sentiment analysis. The study proposes a framework adopted from the knowledge discovery in databases (KDD) method. The R programming language was used for preprocessing and analyzing the texts of the Harry Potter data set. As a fundamental requirement of text mining, some tidy data tools were used in this analysis.

The objective of this paper is to propose a framework for conducting a sentiment analysis of the Harry Potter book series written by British author J.K. Rowling. The paper extracts sentiment from various contexts: a) frequency of words in the Harry Potter series, b) frequency of characters, c) prominent characters in each book, d) frequency of sentiment words, e) sentiment analysis for each book, f) character-based sentiment analysis, g) sentiment of Hogwarts houses, and h) sentiment of the Harry Potter series by page.
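Two of the contexts listed above, word frequency and sentiment by page, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the chapter's R code: the `POSITIVE` and `NEGATIVE` word sets are hypothetical stand-ins for a binary sentiment dictionary, and a fixed-size chunk of tokens (`chunk_size`) is used as a rough stand-in for a page, since the preview does not say how pages were delimited.

```python
import re
from collections import Counter

# Hypothetical binary word lists, invented for illustration only.
POSITIVE = {"brave", "happy", "love"}
NEGATIVE = {"dark", "fear", "cruel"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, n=3):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


def net_sentiment_by_chunk(text, chunk_size=5):
    """Split tokens into fixed-size chunks and return, for each chunk,
    the count of positive words minus the count of negative words."""
    tokens = tokenize(text)
    scores = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        positives = sum(token in POSITIVE for token in chunk)
        negatives = sum(token in NEGATIVE for token in chunk)
        scores.append(positives - negatives)
    return scores


if __name__ == "__main__":
    sample = "dark fear cruel night fell happy brave love won today"
    print(word_frequencies("harry saw ron and harry laughed", n=1))
    print(net_sentiment_by_chunk(sample, chunk_size=5))
```

Plotting the per-chunk scores in order gives a sentiment trajectory across a book; the chunk size controls how smooth or noisy that trajectory looks.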
