Adversarial Attacks and Defenses in Malware Detection Classifiers

Teenu S. John (Indian Institute of Information Technology and Management, India) and Tony Thomas (Indian Institute of Information Technology and Management, India)
DOI: 10.4018/978-1-5225-8407-0.ch007


Machine learning is widely applied across cybersecurity domains owing to its automated threat prediction and detection capabilities. Despite these advantages, attackers can exploit the vulnerabilities of machine learning models to degrade their performance. Such attacks, called adversarial attacks, perturb the features of the data to induce misclassification. Adversarial attacks are highly destructive in the case of malware detection classifiers, since they can allow a harmful virus or trojan to evade the threat detection system. The feature perturbations carried out by an adversary against malware detection classifiers differ from the conventional attack strategies employed against computer vision tasks. This chapter discusses various adversarial attacks launched against malware detection classifiers and the existing defensive mechanisms. The authors also discuss the challenges and the research directions that need to be addressed to develop effective defensive mechanisms against these attacks.
Chapter Preview


Machine learning (ML) has been widely used in applications such as pattern recognition, natural language processing (Akter, and, & 2018, n.d.), intrusion detection, facial recognition, and malware detection. ML comprises two phases: a training phase, in which the model is given a large amount of data from which it learns to make predictions, and a testing phase, in which the model is evaluated on a new dataset that is independent of the training set to assess its performance. A validation set is used to tune the parameters of the ML model for better accuracy before the model is finalized. Generally, machine learning can be categorized as supervised, unsupervised, semi-supervised, and reinforcement learning (RL). A supervised learning technique such as SVM trains the classifier with associated labels, while in unsupervised learning, such as clustering, the classifier works with no knowledge of the labels. In cybersecurity, supervised learning techniques such as decision trees, neural networks, and SVM can be used in malware detection and spam detection, since these applications accumulate large sets of labelled data instances. For intrusion detection, semi-supervised learning outperforms supervised and unsupervised learning because of its ability to detect unknown attacks. Unsupervised learning techniques such as self-organizing maps are used to detect anomalies in the network effectively. Reinforcement learning, on the other hand, uses agents that learn from the feedback of the environment. Reinforcement learning is found to be effective when the false positive rate is high, especially in the case of anomaly detection. Here a reinforcement agent helps to adjust the parameters of the detection model by adjusting the weights of the algorithm. An important challenge in cybersecurity is the amount of data that security analysts need to analyze to detect attacks.
For a cyber-threat management system, a security analyst has to analyze a large volume of data (Big Data), such as firewall logs and user activities, to detect all possible attacks (Bhushan & Gupta, 2017) (Bhushan & Gupta, 2018) (Hossain, Muhammad, Abdul, 2018, n.d.). Hence security vendors are equipping themselves with ML techniques to automatically detect malware and vulnerable executables in large volumes of data. This is why major cybersecurity threat management systems like Sophos's Invincea and Radware's Seculert technology are acquiring ML capabilities to detect and prevent sophisticated cyber-attacks. Among the ML classifiers, deep learning has attracted researchers due to its increased classification accuracy and its ability to learn from unlabelled data (Thomas, John & Uddin, 2017, n.d.). Deep learning gained popularity especially with Google's AlphaGo, an intelligent player modelled to play the game of Go using deep learning. The main advantage of deep learning is its ability to generate new features from limited or existing features, which makes it extremely useful in malware detection and biometrics.
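
The training, validation, and testing phases described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the two-feature synthetic data, the "benign" and "malicious" labels, the 60/20/20 split, and the choice of a k-nearest-neighbour classifier (the helper names make_dataset, knn_predict, and accuracy are hypothetical). The classifier is fitted on the training set, its parameter k is tuned on the validation set, and the final model is scored once on the independent test set.

```python
import random


def make_dataset(n, rng):
    """Synthetic samples (assumed data): label 1 ("malicious") clusters
    around (2, 2), label 0 ("benign") clusters around (0, 0)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else 0.0
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((features, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(
        train,
        key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, samples, k):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in samples)
    return correct / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_dataset(300, rng)

# Three disjoint subsets: 60% training, 20% validation, 20% testing.
train, val, test = data[:180], data[180:240], data[240:]

# Validation phase: choose the k that scores best on the validation set.
# Only odd values are tried, so a two-class vote can never tie.
best_k = max((1, 3, 5, 7), key=lambda k: accuracy(train, val, k))

# Testing phase: score the tuned model once on data it has never seen.
test_accuracy = accuracy(train, test, best_k)
print("chosen k:", best_k, "test accuracy:", round(test_accuracy, 2))
```

Keeping the test set out of both fitting and tuning is what makes the final score an estimate of performance on new data; reusing it to pick k would make that estimate optimistic.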
