Introduction
Machine Learning (ML) gives computers the ability to learn without being explicitly programmed. An ML model learns from existing data to make intelligent decisions on present and future tasks. Several ML techniques have been developed for various applications, including performance evaluation of graphical hardware (Girard, Legault, Bois, & Boland, 2019), pharmacology and medical imaging (Toraman, Girgin, Üstündağ, & Türkoğlu, 2019), big data analytics (Rathore, Ahmad, & Paul, 2016), and network pattern detection for anomaly-based intrusion systems (Kirubavathi & Anitha, 2018). One domain that has benefited immensely from ML is intrusion detection and prevention systems. ML techniques have been demonstrated to be effective for identifying anomalies in network traffic (Demir & Dalkiliç, 2018), a timely capability considering current threats in computer networks.
Cyber security research is continuously expanding and becoming more interdisciplinary because the use of cyberspace keeps growing. Accordingly, the need to ensure the integrity, confidentiality, and availability of information cannot be overemphasized. Current firewalls are not potent enough to detect highly sophisticated malicious network packets. Therefore, more sophisticated gadgets such as intrusion detection systems (IDS) provide better security alternatives. They employ improved algorithms that are capable of detecting complex intruders. Some researchers have argued that current ML-based intrusion detection approaches mainly focus on feature selection issues because irrelevant features degrade detection accuracy (Mukherjee & Sharma, 2012). Yet the challenges associated with intrusion complexities and near-normal intruder behavior also need attention. Accordingly, research that seeks to improve intrusion detection is gradually advancing. As mentioned earlier, ML algorithms now play a key role in the domain of intrusion detection. However, studies that evaluate ML methods with current datasets for intrusion detection systems are lacking. Hence, some researchers have called for extensive evaluations of ML performance in the domain (Masarat, Sharifian, & Taheri, 2016).
This study therefore seeks to contribute to existing knowledge on the performance of ML methods for anomaly-based IDS. Specifically, four ML algorithms were evaluated using the following performance metrics: True Positive Rate, False Positive Rate, Precision, Recall, F-Measure, and Mean Absolute Error (MAE). The UNSW-NB15 dataset (Moustafa & Slay, 2015b) was used to assess the potency of Naive Bayes, k-Nearest Neighbors, Decision Tree, and Random Forest against these metrics. The paper is organized as follows: the next section presents related literature and a brief discussion of the algorithms used in this study. Section three (3) presents the experimental setup, whereas Section four (4) presents the findings. Section five (5) presents a discussion of the findings and related implications before conclusions are drawn in Section six (6).
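To make the performance metrics named above concrete, the following is a minimal sketch of how they could be computed from binary predictions. It is illustrative only and is not the study's evaluation code: the function name `binary_metrics`, the label encoding (1 = attack, 0 = normal), and the toy label vectors are assumptions introduced here. In particular, the way MAE is computed in the study is not stated in this section; the sketch assumes hard 0/1 labels, in which case MAE reduces to the misclassification rate.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute TPR, FPR, precision, recall, F-measure and MAE.

    Assumption (hypothetical encoding): 1 = attack, 0 = normal traffic.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks flagged as attacks
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal flagged as attacks
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal left alone
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks that were missed

    def ratio(num: float, den: float) -> float:
        # Return 0.0 for an undefined ratio instead of dividing by zero.
        return num / den if den else 0.0

    tpr = ratio(tp, tp + fn)           # True Positive Rate (identical to recall)
    fpr = ratio(fp, fp + tn)           # False Positive Rate
    precision = ratio(tp, tp + fp)
    recall = tpr
    f_measure = ratio(2 * precision * recall, precision + recall)
    # Assumption: MAE over hard 0/1 labels, i.e. the fraction of wrong predictions.
    mae = sum(abs(t - p) for t, p in pairs) / len(pairs)

    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mae": mae,
    }


# Toy labels for illustration only (not taken from UNSW-NB15).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.2f}")
```

For an IDS, a high True Positive Rate matters only alongside a low False Positive Rate, since a detector that flags most normal traffic as hostile is of little practical use; reporting both, together with the F-Measure, guards against that failure mode.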