From Attack to Defense: Strengthening DNN Text Classification Against Adversarial Examples

Marwan Omar
DOI: 10.4018/979-8-3693-1906-2.ch010

Abstract

Adversarial examples in the textual domain have recently attracted considerable research attention, yet the detection of such examples remains notably under-investigated. In this chapter, the authors propose an approach for detecting adversarial examples in natural language processing (NLP), inspired by the local outlier factor (LOF) algorithm. An empirical evaluation on real-world datasets, using classifiers based on long short-term memory (LSTM), convolutional neural networks (CNN), and transformer architectures, shows that the proposed technique outperforms recent state-of-the-art detection methods, namely DISP and FGWS, achieving an F1 detection score of up to 94.8%.
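The abstract does not spell out the detection pipeline, but the core idea of LOF-based detection can be illustrated with a minimal sketch: fit a local outlier factor model on embeddings of clean inputs, then flag test-time inputs whose local density deviates from that reference set. The embed step, dimensions, and thresholds below are assumptions for illustration (using scikit-learn's LocalOutlierFactor), not the chapter's exact implementation.

```python
# Minimal sketch: LOF-based detection of adversarial text inputs.
# Assumes an external embed() step that maps sentences to fixed-size vectors
# (e.g., the [CLS] representation of the target classifier); the random
# stand-in embeddings below are placeholders only.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def fit_lof_detector(clean_embeddings, n_neighbors=20):
    """Fit LOF in novelty mode on embeddings of clean (benign) sentences."""
    detector = LocalOutlierFactor(n_neighbors=n_neighbors, novelty=True)
    detector.fit(clean_embeddings)
    return detector

def is_adversarial(detector, embedding):
    """Flag an input whose local density deviates from the clean data."""
    # predict() returns -1 for outliers (suspected adversarial), +1 for inliers.
    return detector.predict(embedding.reshape(1, -1))[0] == -1

# Usage with stand-in embeddings (replace with real sentence vectors):
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 768))          # embeddings of benign sentences
detector = fit_lof_detector(clean)
suspect = rng.normal(loc=3.0, size=768)      # an off-manifold embedding
print(is_adversarial(detector, suspect))     # likely True
```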
Chapter Preview

Problem Statement

In the dynamic landscape of Natural Language Processing (NLP), adversarial attacks present a pressing challenge. Adversarial examples, which are inputs manipulated to elicit incorrect outputs from machine learning models, jeopardize the reliability of NLP applications (Goodfellow et al., 2020; Goodfellow et al., 2015). The disruptive potential of these attacks is not merely theoretical but carries significant implications for real-world systems, particularly as NLP is increasingly deployed in critical decision-making roles.
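To make the notion of "inputs manipulated to elicit incorrect outputs" concrete, the sketch below shows a greedy word-level synonym substitution that flips a classifier's prediction while leaving the sentence's meaning essentially intact. The classify() function and SYNONYMS table are hypothetical stand-ins, not any attack method described in the chapter.

```python
# Illustrative threat model: a single synonym swap that changes the label.
SYNONYMS = {"great": ["fine", "decent"], "terrible": ["poor", "awful"]}

def classify(sentence):
    # Stand-in for a trained sentiment model: returns "pos" or "neg".
    return "pos" if "great" in sentence or "fine" in sentence else "neg"

def find_adversarial_example(sentence):
    """Greedy single-word substitution that changes the predicted label."""
    original_label = classify(sentence)
    words = sentence.split()
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            perturbed = " ".join(words[:i] + [synonym] + words[i + 1:])
            if classify(perturbed) != original_label:
                return perturbed  # a candidate adversarial example
    return None

print(find_adversarial_example("the movie was great"))
# -> "the movie was decent": the meaning is preserved, but the toy
#    classifier's prediction flips from "pos" to "neg".
```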
