Analysis of Machine Learning Techniques for Anomaly-Based Intrusion Detection

Winfred Yaokumah (University of Ghana, Ghana) and Isaac Wiafe (University of Ghana, Ghana)
Copyright: © 2020 | Pages: 19
DOI: 10.4018/IJDAI.2020010102

Abstract

Determining which machine learning (ML) technique performs best on new datasets is an important factor in the design of effective anomaly-based intrusion detection systems. This study therefore evaluated four ML algorithms (naive Bayes, k-nearest neighbors, decision tree, and random forest) for intrusion detection on the UNSW-NB15 dataset. The experimental results showed that the random forest and decision tree classifiers are effective for detecting intrusions. Random forest had the highest weighted average accuracy (89.66%) with a mean absolute error (MAE) of 0.0252, whereas decision tree recorded an accuracy of 89.20% and an MAE of 0.0242. The naive Bayes classifier performed worst on the dataset, with 56.43% accuracy and an MAE of 0.0867. However, contrary to existing knowledge, naive Bayes proved effective at classifying backdoor attacks. Notably, naive Bayes performed relatively well on classes where the tree-based classifiers performed very poorly.

Introduction

Machine learning (ML) gives computers the ability to learn without being explicitly programmed: ML models learn from existing data to make intelligent decisions on present and future tasks. ML techniques have been developed for a range of applications, including performance evaluation of graphics hardware (Girard, Legault, Bois, & Boland, 2019), pharmacology and medical imaging (Toraman, Girgin, Üstündağ, & Türkoğlu, 2019), big data analytics (Rathore, Ahmad, & Paul, 2016), and network pattern detection for anomaly-based intrusion detection systems (Kirubavathi & Anitha, 2018). One domain that has benefited greatly from ML is intrusion detection and prevention. ML-based systems have been shown to be effective at identifying anomalies in network traffic (Demir & Dalkiliç, 2018), a capability that is timely given current threats to computer networks.

Cybersecurity research is continuously expanding and becoming more interdisciplinary as the use of cyberspace continues to grow. Accordingly, the need to ensure the integrity, confidentiality, and availability of information cannot be overemphasized. Current firewalls are not sufficient for detecting highly sophisticated malicious network packets. Intrusion detection systems (IDS) therefore provide a stronger security alternative: they employ improved algorithms capable of detecting sophisticated intruders. Some researchers have argued that current ML-based intrusion detection approaches focus mainly on feature selection, because irrelevant features degrade detection accuracy (Mukherjee & Sharma, 2012). Yet the challenges posed by complex intrusions and by intruders whose behavior is close to normal still need attention. Accordingly, research that seeks to improve intrusion detection is gradually advancing. As mentioned earlier, ML algorithms now play a key role in intrusion detection. However, studies that evaluate ML methods on current intrusion detection datasets are lacking. Hence, some researchers have called for extensive evaluations of ML performance in the domain (Masarat, Sharifian, & Taheri, 2016).

This study therefore seeks to contribute to existing knowledge on the performance of ML methods for anomaly-based IDS. Specifically, the UNSW-NB15 dataset (Moustafa & Slay, 2015b) was used to assess four ML algorithms (naive Bayes, k-nearest neighbors, decision tree, and random forest) using the following performance metrics: true positive rate (TPR), false positive rate (FPR), precision, recall, F-measure, and mean absolute error (MAE). The paper is organized as follows. The next section presents related literature and briefly discusses the algorithms used in this study. Section 3 presents the experimental setup, and Section 4 presents the findings. Section 5 discusses the findings and their implications, and Section 6 draws conclusions.
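The evaluation metrics listed for this study can all be derived from per-class prediction counts. The following is a minimal Python sketch, not the authors' code, assuming one-vs-rest counting for each traffic class and support-weighted averaging across classes (a common convention for "weighted average" figures). The MAE function assumes the error is measured between each predicted class-probability distribution and the one-hot true label, averaged over all instances and classes; that definition is an assumption, not one stated in this section. The class labels and data are hypothetical toy values, not UNSW-NB15 results. Note that per-class TPR and recall are the same quantity, and that the support-weighted TPR always equals overall accuracy.

```python
"""Sketch: one-vs-rest per-class metrics, support-weighted averages and a
probability-based MAE for a multi-class intrusion-detection classifier.
All labels and values below are hypothetical toy data."""


def per_class_metrics(y_true, y_pred, labels):
    """Compute TPR, FPR, precision, recall and F-measure for each class,
    treating that class as 'positive' and every other class as 'negative'."""
    n = len(y_true)
    results = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = n - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
        fpr = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f_measure = (2 * precision * tpr / (precision + tpr)
                     if precision + tpr else 0.0)
        results[c] = {"tpr": tpr, "fpr": fpr, "precision": precision,
                      "recall": tpr, "f_measure": f_measure,
                      "support": tp + fn}  # number of true instances of c
    return results


def weighted_average(results, key):
    """Average one per-class metric, weighting each class by its support."""
    total = sum(r["support"] for r in results.values())
    return sum(r[key] * r["support"] for r in results.values()) / total


def mean_absolute_error(y_true, probas, labels):
    """Mean of |predicted probability - one-hot target| over all instances
    and all classes (assumed definition, see lead-in)."""
    total = 0.0
    for t, dist in zip(y_true, probas):
        for c in labels:
            actual = 1.0 if t == c else 0.0
            total += abs(dist.get(c, 0.0) - actual)
    return total / (len(y_true) * len(labels))


if __name__ == "__main__":
    labels = ["normal", "dos", "backdoor"]  # hypothetical class names
    y_true = ["normal", "normal", "dos", "backdoor", "dos", "normal"]
    y_pred = ["normal", "dos", "dos", "backdoor", "normal", "normal"]
    metrics = per_class_metrics(y_true, y_pred, labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    for c, m in metrics.items():
        print(c, {k: round(v, 4) for k, v in m.items()})
    print("weighted TPR:", round(weighted_average(metrics, "tpr"), 4))  # 0.6667
    print("accuracy:", round(accuracy, 4))                              # 0.6667
    mae = mean_absolute_error(
        ["normal", "dos"],
        [{"normal": 0.8, "dos": 0.2}, {"normal": 0.4, "dos": 0.6}],
        ["normal", "dos"],
    )
    print("MAE:", round(mae, 4))  # 0.3
```

Because MAE in this sketch is averaged over every class as well as every instance, its values shrink as the number of classes grows, which is one reason a low MAE can coexist with a modest accuracy.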
