Defending Deep Learning Models Against Adversarial Attacks

Nag Mani, Melody Moh, Teng-Sheng Moh
DOI: 10.4018/IJSSCI.2021010105

Abstract

Deep learning (DL) has been used globally in almost every sector of technology and society. Despite its huge success, DL models and applications have been susceptible to adversarial attacks, impacting the accuracy and integrity of these models. Many state-of-the-art models are vulnerable to attacks by well-crafted adversarial examples, which are perturbed versions of clean data with a small amount of added noise, imperceptible to the human eye, that can quite easily fool the targeted model. This paper introduces six of the most effective gradient-based adversarial attacks on the ResNet image recognition model and demonstrates the limitations of the traditional adversarial retraining technique. The authors then present a novel ensemble defense strategy based on adversarial retraining. The proposed method is capable of withstanding the six adversarial attacks on the CIFAR-10 dataset with accuracy of at least 89.31% and as high as 96.24%. The authors believe the design methodologies and experiments demonstrated are widely applicable to other domains of machine learning, DL, and computational intelligence security.

1. Introduction

Global research in academia and industry has promoted the adoption of deep learning applications in every aspect of life, from smart home devices such as Amazon Echo, Google Home, and Facebook Portal, to industrial applications such as drone deliveries, warehouse automation, medical imaging, and self-driving vehicles. The adoption of these devices in both personal and industrial settings has been accelerated by advancements in deep learning. Transforming perception into smart responses and actions in real time is possible thanks to faster and more accurate image recognition models. For instance, smartphones use face detection and recognition to authenticate the correct user. Tesla uses deep learning to build self-driving features such as object detection, semantic segmentation, lane detection, pedestrian detection, and traffic sign recognition, enabling smart decisions in real-time situations. Smart surveillance cameras are equipped with face and activity recognition that identifies and records any abnormal activity or entry.

However, with the wide-scale adoption of Internet of Things (IoT) devices, these systems are exposed to a multitude of vulnerabilities. One such vulnerability is adversarial examples: carefully crafted inputs aimed at fooling the model and undermining its real-world performance. These attacks are hard to detect, as they are usually imperceptible to humans, as shown in Figure 1, yet they can easily degrade the model’s accuracy. The adversaries are asymmetric in nature and are created in specific ways to compromise the integrity of deep learning models. They have posed major risks for deploying deep learning in safety-critical applications, such as home security, medical imaging, and autonomous vehicles (Mani & Moh, 2019).

Figure 1. Clean (left) vs. adversarial (right) images generated using the CWL2 attack
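To make the attack mechanics concrete, below is a minimal sketch of a single-step, gradient-sign (FGSM-style) perturbation in PyTorch. The model choice, epsilon value, input shapes, and function names are illustrative assumptions, not the exact configuration used in the experiments reported here.

import torch
import torch.nn.functional as F
import torchvision.models as models

def fgsm_perturb(model, image, label, epsilon=8 / 255):
    # Take one step in the direction of the sign of the loss gradient,
    # then clip back to the valid pixel range [0, 1].
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Usage sketch: a ResNet34 classifier and a random CIFAR-10-sized input.
model = models.resnet34(num_classes=10).eval()
x = torch.rand(1, 3, 32, 32)    # stand-in image with pixel values in [0, 1]
y = torch.tensor([3])           # assumed ground-truth class index
x_adv = fgsm_perturb(model, x, y)
print((x_adv - x).abs().max())  # perturbation is bounded by epsilon

The same gradient information drives the iterative and optimization-based attacks evaluated in this paper (BIM, ILLC, DeepFool, and Carlini-Wagner); they differ mainly in how many steps are taken and how the perturbation size is constrained.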

This paper uses multiple gradient-based attacks to showcase their effectiveness against the target model. Defense against these attacks has been a well-researched topic. Previous work on adversarial retraining showed its effectiveness against different gradient-based attacks (Kurakin et al., 2016). The previously proposed approach did add some robustness to the model, yet the decrease in accuracy under adversarial attack still exceeded 25%. This is significant when considering the application of these models in safety-critical environments. This paper explores the same idea of adversarial retraining to build more resilient models that can withstand adversarial attacks with high confidence. Early results have been presented (Mani et al., 2019). The main contributions of this paper are as follows:

  • Provided experimental results of six different gradient-based adversarial attacks, including FGSM (Szegedy et al., 2014), BIM (Gu & Rigazio, 2014), ILLC (Gu & Rigazio, 2014), DeepFool (Moosavi-Dezfooli et al., 2016), and Carlini-Wagner L2 and L∞ (Carlini & Wagner, 2017), on the ResNet34 model using the CIFAR-10 dataset;

  • Demonstrated that the previously proposed adversarial retraining technique (Goodfellow et al., 2015) has limited effectiveness and fails to provide transferable security against more sophisticated attacks;

  • Showed that the proposed adversarial retraining technique, coupled with the ensemble method, is capable of withstanding even sophisticated attacks such as the Carlini-Wagner attacks (Carlini & Wagner, 2017), achieving a model accuracy of at least 89.31%, and up to 96.24% against the DeepFool attack; a minimal sketch of this adversarial-retraining-plus-ensemble idea follows this list.
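As a rough illustration of the adversarial-retraining-plus-ensemble idea behind the third contribution, the sketch below retrains several models on mini-batches augmented with adversarially perturbed copies and averages their predictions at inference time. The single-step FGSM perturbation, the number of ensemble members, and the hyperparameters are assumptions made for illustration; the attack mix and training schedule used in the paper's experiments may differ.

import torch
import torch.nn.functional as F
import torchvision

def fgsm(model, x, y, eps=8 / 255):
    # Single-step, sign-of-gradient perturbation used here as the (assumed) attack.
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def retrain_on_adversarial_batch(model, optimizer, x, y):
    # One update on a mini-batch augmented with its adversarial counterpart.
    x_adv = fgsm(model, x, y)
    inputs, targets = torch.cat([x, x_adv]), torch.cat([y, y])
    optimizer.zero_grad()
    F.cross_entropy(model(inputs), targets).backward()
    optimizer.step()

def ensemble_predict(members, x):
    # Average softmax scores over the retrained ensemble members.
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=1) for m in members]).mean(dim=0)
    return probs.argmax(dim=1)

# Usage sketch: three ResNet34 members on CIFAR-10-sized tensors.
members = [torchvision.models.resnet34(num_classes=10) for _ in range(3)]
optimizers = [torch.optim.SGD(m.parameters(), lr=0.01, momentum=0.9) for m in members]
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
for m, opt in zip(members, optimizers):
    retrain_on_adversarial_batch(m, opt, x, y)
print(ensemble_predict([m.eval() for m in members], x))

Averaging softmax scores over independently retrained members is one simple way to combine the ensemble; majority voting over predicted labels would be an equally reasonable choice.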
