Abstract
Deep reinforcement learning (DRL) has shown remarkable results across various tasks. However, recent studies highlight the susceptibility of DRL to targeted adversarial disruptions. Furthermore, discrepancies between simulated settings and real-world applications often make it challenging to transfer DRL policies, particularly in situations where safety is essential. Several solutions have been proposed to address these issues and enhance the robustness of DRL. This chapter delves into the significance of adversarial attack and defense strategies in machine learning, emphasizing the unique challenges of adversarial DRL settings. It also presents an overview of recent advancements, DRL foundations, adversarial Markov decision process models, and comparisons among different attacks and defenses. The chapter further evaluates the effectiveness of various attacks and the efficacy of multiple defense mechanisms using simulation data, specifically focusing on policy success rates and average rewards. Potential limitations and prospects for future research are also explored.
Introduction
DRL has demonstrated effectiveness in numerous applications such as task scheduling (Wang, 2023a; Wang et al., 2022, 2018), refining manufacturing processes (Wang et al., n.d.; Yun et al., 2023), robotic operations (Wang, 2022), and knowledge reasoning (Wang, 2023b, 2023c). Yet, it is susceptible to targeted disruptions during its learning or evaluation stages (Grosse et al., 2017). The disparity between virtual simulations and practical real-world scenarios complicates the transfer of learned policies, which is even more challenging in safety-critical domains such as autonomous driving and robotic operation (He et al., 2022a, 2022b; Wang et al., 2021). It is therefore important to improve the robustness of DRL approaches.
Szegedy et al. (2013) demonstrated the existence of “blind spots” in neural networks: specific input perturbations can mislead a model entirely. Building on this, Papernot et al. (2016) designed a framework for understanding the potential weak points of a system, covering both the neural network model and the dataset. Fig. 1 illustrates vulnerable points in machine learning (ML) systems.
Figure 1. Vulnerable points in ML systems
Table 1 lists common attack strategies, including FGSM, C&W, and PGD; a minimal code sketch of FGSM and PGD follows the table. Table 2 lists common defense strategies, including adversarial training, data compression, and gradient masking.
Table 1. Adversarial attack strategies in ML
| Methods | Details |
| --- | --- |
| FGSM attack | FGSM (Goodfellow et al., 2014) is an attack driven by gradients. Its core action involves computing the model's derivative with respect to the input to introduce disturbances. |
| C&W attack | The Carlini & Wagner (C&W) attack algorithm (Carlini and Wagner, 2017) jointly optimizes for high attack success and low adversarial disturbance. |
| PGD attack | Projected Gradient Descent (PGD) (Madry et al., 2017) works by iteratively adjusting a given input to maximize the model's loss function, which measures how wrong the model's predictions are for that input. The “projected” part of PGD means that after each adjustment, the perturbation is clipped to ensure it stays within a predefined allowable range. |
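To make the two gradient-based attacks above concrete, the following is a minimal sketch in PyTorch, assuming a generic classifier that maps a batch of inputs to logits and cross-entropy as the loss; the function names, the step size alpha, and the iteration count are illustrative choices, not taken from the chapter:

```python
# Minimal sketch of FGSM and PGD (assumed setup: a PyTorch model
# returning logits, labels y, and cross-entropy loss).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """One-step FGSM: perturb x along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input in the direction that increases the loss.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def pgd_attack(model, x, y, epsilon, alpha=0.01, steps=40):
    """Iterative gradient-sign steps, projected back into the epsilon ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Projection: clip the accumulated perturbation to [-eps, eps].
        x_adv = x + torch.clamp(x_adv - x, -epsilon, epsilon)
    return x_adv.detach()
```

Real implementations typically also clamp the result to the valid input range (e.g., [0, 1] for images); that step is omitted here since valid ranges vary across DRL observation spaces.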
Key Terms in this Chapter
Robust Deep Reinforcement Learning: Focuses on designing DRL agents that can perform reliably and maintain their efficacy in the presence of adversarial disturbances or uncertainties in the environment. The aim is to ensure that the agent can handle both known and unforeseen challenges, thereby generalizing well across diverse and potentially adversarial settings.
Markov Decision Process: A mathematical framework for modeling sequential decision-making, defined by states, actions, transition dynamics, and rewards. Under the Markov property, the agent's decisions depend only on the current state, with the aim of maximizing long-term cumulative reward.
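For reference, the standard tuple notation and discounted objective (textbook convention, not notation specific to this chapter) are:

```latex
% Standard MDP tuple and discounted-return objective
% (textbook convention; symbols are not taken from this chapter).
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\, a_t = a),
\qquad
\max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big]
```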
Adversarial Training: A technique used to improve the robustness of machine learning models by exposing them to malicious inputs during the training phase. By learning from these intentionally perturbed examples, the model becomes better equipped to handle similar adversarial inputs during testing or real-world applications.
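A minimal sketch of one adversarial training step is given below, assuming a standard PyTorch classifier and using a single FGSM-style perturbation as the source of malicious inputs; the function name, epsilon value, and overall structure are illustrative assumptions rather than the chapter's specific method:

```python
# Minimal sketch of an adversarial training step (assumed setup:
# a PyTorch model, an optimizer, a batch (x, y), cross-entropy loss).
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.1):
    # Craft an FGSM example on the fly (one gradient-sign step).
    x_pert = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + epsilon * x_pert.grad.sign()).detach()
    # Standard supervised update, but on the perturbed batch, so the
    # model learns to classify adversarial inputs correctly.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, many variants mix clean and perturbed batches or use stronger multi-step attacks (such as PGD) to generate the training examples; the single-step version here only illustrates the core idea.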