Web Mining-Based Method for Cyberbullying Detection

DOI: 10.4018/978-1-5225-5249-9.ch004

Abstract

In this chapter, the authors present a method for the automatic detection of cyberbullying entries based on a Web mining technique, in particular, on an extended SO-PMI-IR method that calculates the relevance of new input documents to training documents. The method uses seed words from three categories to calculate a semantic orientation score and then maximizes the relevance across categories. The method outperformed previously proposed Web-mining-based methods in both laboratory and real-world conditions. The developed system was deployed and tested in practice. After a year of testing, the authors noticed a drop of over 30 percentage points in its performance. They hypothesize about the reasons for the drop. To regain the lost performance and sustain it in the future, the authors propose additional improvements, including automatic acquisition and filtering of seed words. Experimentally selected optimal improvements regained much of the lost performance.

Introduction

Web-mining-based methods have been widely acknowledged in document classification, particularly in sentiment and affect analysis (Turney, 2002; Pang & Lee, 2008; Ptaszynski et al., 2009), and even in such novel fields as machine ethics (Komuda et al., 2010). Their advantage lies mostly in their high effectiveness and, above all, efficiency. Web-mining methods usually require only minimal training data, instead performing mass-scale searches over available Web data such as the Internet (Ptaszynski et al., 2009) or large-scale local corpora (Komuda, Rzepka & Araki, 2013), and thus achieve satisfactory results with minimal human effort.

Here, we present our attempt at applying a Web mining method to the detection of cyberbullying. This attempt was motivated by the main goal of our research, namely, to provide help to the members of the Internet Patrol, whose work has so far been carried out mostly manually. It takes much time and effort to find harmful entries (entries containing information and expressions aimed at harming other users) among the vast amount of content appearing on countless electronic Bulletin Board System (BBS) pages. Moreover, the task places a great psychological burden on the net-patrol members.

To solve the above problem and decrease the burden on net-patrol members, we first proposed a preliminary method for detecting harmful entries automatically (Matsuba et al., 2011). In this method we extended Turney's (2002) SO-PMI-IR relevance-calculation method to measure the relevance of a document to harmful contents. With a small number of seed words we were able to detect large numbers of candidate harmful entries with satisfactory effectiveness.
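The core idea of an SO-PMI-IR-style score can be sketched as follows. This is a minimal illustration, not the chapter's exact formulation: the `hits` function (returning a Web or corpus hit count for a query), the seed words, and the `NEAR` query syntax are all assumptions for the sake of the example.

```python
import math

def so_pmi_ir(hits, phrase, harmful_seeds, harmless_seeds, eps=0.01):
    """Turney-style SO-PMI-IR score adapted to harmfulness.

    `hits(query)` is assumed to return a hit count from a Web search
    engine or a local corpus index. A positive score suggests the
    phrase co-occurs more strongly with harmful seed words than with
    harmless ones; `eps` smooths zero counts.
    """
    co_harmful = sum(hits(f'"{phrase}" NEAR "{s}"') for s in harmful_seeds)
    co_harmless = sum(hits(f'"{phrase}" NEAR "{s}"') for s in harmless_seeds)
    base_harmful = sum(hits(f'"{s}"') for s in harmful_seeds)
    base_harmless = sum(hits(f'"{s}"') for s in harmless_seeds)
    return math.log2(((co_harmful + eps) * (base_harmless + eps)) /
                     ((co_harmless + eps) * (base_harmful + eps)))
```

In practice the score would be computed for phrases extracted from a new entry and aggregated over the document; the seed words bias the score toward or away from harmfulness.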

The initial method was demonstrated to determine harmful entries with an accuracy of 83% on test data of which about half contained harmful entries. However, it was not verified how well the method would perform in real-life conditions, where cyberbullying entries and normal contents do not appear in equal numbers.

In the research described in this chapter, initially introduced by Nitta et al. (2013), we first proposed an optimization of the original method by maximizing the relevance scores. We divided the seed words into multiple categories and calculated the maximal relevance value for the seed words of each category. By computing this maximized score, representing the semantic orientation of harmfulness of an Internet entry, the method detected harmful entries more effectively than in the previous attempts. Moreover, we additionally evaluated our method on data sets with real-life ratios of harmful contents, to verify the usability of the method in the most realistic way.
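The category-based maximization described above can be sketched as follows. The structure here is a simplification under assumptions: the `relevance` function (e.g., an SO-PMI-IR-style score between a document and a seed word) and the way per-category maxima are combined into a single document score are illustrative, and the chapter's actual combination may differ.

```python
def max_relevance_score(relevance, document, seed_categories):
    """Combine per-category maxima into one harmfulness score.

    `seed_categories` maps a category name to its list of seed words.
    For each category we take the maximum relevance of the document
    to any seed word in that category, then sum the maxima across
    categories (one plausible combination; others are possible).
    """
    return sum(max(relevance(document, seed) for seed in seeds)
               for seeds in seed_categories.values())
```

Taking the maximum within a category lets a single strongly matching seed word dominate, so a document only needs to resemble one representative expression per category to score highly.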

After confirming the performance, the method was deployed and tested “in the field” for over one year. Unfortunately, when we re-evaluated the method after that time, we found that its performance had dropped significantly, by over 30 percentage points compared to its original performance measured in 2013.

In the original evaluation process, applied also in this research and previously in Matsuba et al. (2011), Nitta et al. (2013) assigned a harmfulness score to all sentences from their dataset containing harmful and non-harmful entries, sorted the entries in decreasing order of score, and checked the performance for the top n entries, increasing n (further referred to as the “threshold window”) by 50 entries at a time. This way they were able to observe the fluctuation in performance as more data was included in the evaluation. As evaluation measures they used standard Precision and Recall. The originally confirmed performance reached 91% Precision for low Recall values. Moreover, the original performance from late 2013 remained high (around 70% Precision) up to nearly 50% Recall, after which it decreased due to a drop in Precision for higher thresholds. However, when the evaluation experiment was repeated in early 2015, the overall results dropped greatly: Precision did not exceed 60% over the whole threshold span. This made us introduce a number of further improvements to regain the lost performance and ensure it is sustained at a sufficient level.
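The threshold-window evaluation described above can be sketched as follows. This is an illustrative reconstruction of the protocol, not the authors' code: entries are ranked by harmfulness score, and Precision and Recall are computed for the top n entries as n grows by a fixed step (50 in the chapter; smaller here for the example).

```python
def threshold_window_eval(scored, step=50):
    """Evaluate ranked output with a growing threshold window.

    `scored` is a list of (score, is_harmful) pairs, where is_harmful
    is 1 for a gold-standard harmful entry and 0 otherwise. Returns a
    list of (n, precision, recall) tuples for each window size n.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_harmful = sum(label for _, label in ranked)
    results = []
    for n in range(step, len(ranked) + 1, step):
        true_positives = sum(label for _, label in ranked[:n])
        results.append((n,
                        true_positives / n,              # Precision
                        true_positives / total_harmful)) # Recall
    return results
```

Plotting Precision against the window size (or against Recall) makes the fluctuation described above visible: Precision is typically highest for small windows and degrades as lower-scoring entries enter the window.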
