1. Introduction
The contribution of artificial intelligence to cyber-security is paramount, given that it has the potential to raise the security level of the defended distributed system (Feltus et al., 2007) up to the state-of-the-art level generally reached by attackers. In the field of machine learning, the approaches by which a computer program learns to generate output from experience are classified into three paradigms: supervised, unsupervised and reinforcement learning (RL). In supervised learning, the model is trained using labels attached to the input data; in unsupervised learning, the model is trained using patterns discovered in the input data; and in RL, a software agent learns to react on its own to an environment that it does not yet know (Van Otterlo & Wiering, 2012).
Reinforcement learning involves agents, states (S), and actions per state (A). Agents evolve from state to state when they perform actions. In order to learn how to react, agents make decisions and take an action A_t at time t (Fig. 1) with the objective of accumulating rewards (R_t) while avoiding errors. As RL algorithms mostly use dynamic programming techniques, this reward-based environment is typically represented in the form of Markov decision processes. These processes give a straightforward description of the problem of learning to reach a desired goal. In practice, agents continually select actions while the environment in which they behave responds and presents new situations (Fig. 1).
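As a reading aid, the Markov decision process mentioned above can be written out in the usual textbook form. This is a sketch under stated assumptions rather than the paper's own formalization: the transition kernel P, the discount factor γ, and the convention that the reward following action A_t is indexed R_{t+1} are not given in the text and are introduced here for illustration only.

```latex
% Assumed notation: P (transition kernel) and \gamma (discount factor)
% are not named in the text; S, A, A_t and R_t follow the paragraph above.
\[
  \mathcal{M} = (S, A, P, R, \gamma), \qquad
  P(s' \mid s, a) = \Pr\{ S_{t+1} = s' \mid S_t = s,\ A_t = a \}
\]
% Discounted return that the agent tries to maximize in expectation.
\[
  G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma < 1
\]
```

Under this reading, "accumulating rewards" corresponds to maximizing the expected value of G_t.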
In contrast to classical dynamic programming methods, RL algorithms have no knowledge of the exact Markov decision process. Q-Learning [38] is an RL algorithm whose purpose is to learn the policy that tells agents which action to take in a given situation. Once optimized, this policy gives all the successive steps necessary to achieve a goal while maximizing the accumulated reward. Agents that learn the environment must continuously choose between exploiting the knowledge already learned and exploring new potential actions. Hence, an important parameter to consider when defining RL algorithms is the ε of the ε-greedy strategy, which sets the proportion of exploration vs. exploitation actions (e.g., Li et al., 2018).
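The interplay between Q-Learning and ε-greedy action selection can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and does not come from the paper: the five-state corridor environment (`step`), the learning rate `alpha`, the discount factor `gamma`, the value of `epsilon`, and the number of episodes.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (assumed): returns (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so an untrained agent is not biased to one side.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; the agent never sees the transition model."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Action with the highest learned value in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With a larger `epsilon` the agent spends more steps on random exploration; with `epsilon = 0` it only exploits its current estimates, which is the trade-off the paragraph above describes.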
Reinforcement learning has already proven worthwhile in many fields, such as operations research, multi-agent systems, genetic algorithms and game theory. For some years, it has also been regarded as a strong potential contributor to the security and cyber-security domains (Feltus et al., 2009). However, although reviews of the contributions of machine learning and deep learning to computer security have already been undertaken for very specific fields, such as biometrics (e.g., Sundararajan & Woodard, 2018), to our knowledge no systematic, in-depth analysis of the contributions of reinforcement learning to the different fields of cyber-security has yet been completed. This is the aim of this paper. Building on the strategic literature review method (Petersen et al., 2015), the paper successively answers three knowledge questions:
Answering this question will allow us to identify the fields of cyber-security that benefit most from RL-based contributions, as well as the type of contribution and the volume of research dedicated to each. These fields are: malware/intrusion detection, attacker/defender games, security policy elaboration, biometric authentication and software/system protection.
This question will allow us to determine which industrial areas are most impacted by RL-based security contributions, for what purpose and to what extent.