Anomaly-Based Intrusion Detection Using Machine Learning: An Ensemble Approach

R. Lalduhsaka, Nilutpol Bora, Ajoy Kumar Khan
Copyright: © 2022 | Pages: 15
DOI: 10.4018/IJISP.311466

Abstract

Intrusion detection systems were developed to detect any suspicious traffic in the network. Conventional intrusion detection comes with its own set of limitations. The authors aimed to improve anomaly-based intrusion detection using an ensemble approach of machine learning. In this article, the CICIDS 2017 and CICIDS 2018 datasets have been used for implementing the proposed method. A random forest regressor is used for feature selection. Three machine learning algorithms (i.e., naïve Bayes, QDA, and ID3) are selected for their low computational cost and combined (ensembled). The ensemble algorithm results are compared with the standalone algorithms. With the ensembled method, classification accuracies of 98.3% and 95.1%, with FARs of 2% and 6.9%, were achieved on the CICIDS 2017 and CICIDS 2018 datasets respectively. Naïve Bayes, QDA, and ID3 have classification accuracies of 82%, 84.7%, and 95.8% respectively on CICIDS 2017; 68.3%, 68.4%, and 94.4% respectively on CICIDS 2018; false alarm rates of 54.9%, 55.5%, and 20.6% respectively on CICIDS 2017; and 3.6%, 3.7%, and 7.1% respectively on CICIDS 2018.
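
To make the reported quantities concrete, the following is a minimal sketch of how three classifiers' predictions could be combined and scored. It rests on assumptions not stated in the abstract: the combination rule is taken to be simple majority voting, the false alarm rate (FAR) is taken to be FP / (FP + TN), labels are binary (0 = benign, 1 = attack), and the prediction lists are made-up toy values rather than results from CICIDS 2017 or CICIDS 2018. The function names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine several classifiers' label lists into one label per sample.

    Assumption: plain majority voting; the article may use a different rule.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy_and_far(y_true, y_pred, benign=0):
    """Return (accuracy, false alarm rate), with FAR assumed to be FP / (FP + TN)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == benign and pred == benign:
            tn += 1  # benign traffic correctly passed
        elif truth == benign:
            fp += 1  # benign traffic flagged as an attack (false alarm)
        elif pred == benign:
            fn += 1  # attack missed
        else:
            tp += 1  # attack correctly flagged
    accuracy = (tp + tn) / len(y_true)
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, far

# Toy labels and predictions (illustrative only, not from the datasets).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
nb_pred = [1, 0, 1, 0, 1, 1, 0, 1]   # stand-in for naive Bayes output
qda_pred = [0, 1, 1, 0, 1, 0, 1, 1]  # stand-in for QDA output
id3_pred = [0, 0, 0, 0, 1, 1, 1, 0]  # stand-in for ID3 output

ensemble_pred = majority_vote([nb_pred, qda_pred, id3_pred])
print(accuracy_and_far(y_true, nb_pred))        # standalone classifier
print(accuracy_and_far(y_true, ensemble_pred))  # ensemble
```

With an odd number of voters and binary labels, a tie cannot occur, which is one reason a three-member ensemble is convenient under this voting assumption.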

Introduction

Intrusion detection (ID) systems are well-established solutions for detecting malicious activities in a network. These ID systems have become an essential defensive component of network security infrastructure. Their importance grows with the exponential growth of network attacks in modern networking systems. In 1980, James Anderson published the first significant paper on ID, Computer Security Threat Monitoring and Surveillance, emphasizing the importance of such systems in security (Xu, Shen, Du & Zhang, 2018). An ID system usually monitors all internal and external packets of a network to detect whether a packet shows signs of a violation (Modi et al., 2013). An ID system must be able to identify various kinds of attacks and raise alarms when it detects them on the network.

ID systems are generally categorized into two types, based on the methodology used: signature-based (Maleh et al., 2015) and anomaly-based (Zhang & Chen, 2017). Signature-based ID systems use pattern matching to detect an intrusion in the network. This is done by comparing network traffic against a database of known attack signatures and triggering an alert when a match is found. This approach has an extremely low false alarm rate (Maleh et al., 2021) and is efficient in detecting known attacks. However, since this class of ID systems relies solely on previously available information, it is ineffective against new attacks that are not in the signature database. In contrast, anomaly-based ID systems are capable of detecting unknown attacks (Fayssal, Hariri, & Al-Nashif, 2007). Anomaly-based, or behavior-based, ID systems analyze network traffic on the basis of the network's behavior. They define the normal behavior of a network and raise an alert if the network deviates from it. However, these systems are not very accurate, since profiling a network is a complex process and often leads to a high false alarm rate (Haq et al., 2015). Most approaches currently used in ID systems cannot properly deal with the complex and dynamic nature of malicious threats. Therefore, various Machine Learning techniques are being sought to achieve a better detection rate (DR), a lower false alarm rate, and lower computation costs (Zamani & Movahedi, 2013). Traditional ID techniques have a few limitations in protecting a system; most notably, when facing a high volume of malicious traffic (DoS/DDoS), they can produce high numbers of false positives and false negatives (Khan & Kim, 2021).

Recently, numerous researchers have used Machine Learning techniques in ID systems to improve detection rates. Several studies have sought to enhance and apply this method to ID systems (Wagh, Pachghare, & Kolhe, 2013). Machine Learning models were found to have many issues that slow down the training process, including the size of the dataset and the search for optimal parameters for the most suitable model. These problems prompted researchers to look for the most effective methodology. However, simple Machine Learning approaches are limited (Mahesh, 2020), while intrusion methods are expanding and growing more complex. The authors therefore combine multiple Machine Learning classification algorithms to build a better prediction model for intrusions in network traffic.
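
The contrast between signature-based and anomaly-based detection described above can be illustrated with a small sketch. It is not the authors' method: the signature strings, the single "requests per second" feature, the z-score rule, and the threshold of 3 are all hypothetical choices made for illustration, and real ID systems profile many features at once.

```python
from statistics import mean, stdev

# Hypothetical signature database: substrings of known-bad payloads.
KNOWN_SIGNATURES = {"' OR 1=1 --", "/etc/passwd"}

def signature_alert(payload, signatures=KNOWN_SIGNATURES):
    """Signature-based check: alert only if a known pattern appears in the payload."""
    return any(sig in payload for sig in signatures)

def build_profile(baseline):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(baseline), stdev(baseline)

def anomaly_alert(value, profile, threshold=3.0):
    """Anomaly-based check: alert if the value deviates strongly from the profile.

    Assumption: deviation is measured as a z-score against one feature.
    """
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Requests per second observed during normal operation (made-up numbers).
profile = build_profile([100, 102, 98, 101, 99, 100])

# A known attack is caught by its signature.
print(signature_alert("GET /../../etc/passwd"))  # True

# A novel flood has no signature, but its traffic rate deviates from the profile.
print(signature_alert("GET /new-exploit"))       # False
print(anomaly_alert(450, profile))               # True

# Ordinary traffic close to the baseline raises no anomaly alert.
print(anomaly_alert(101, profile))               # False
```

The sketch also hints at why anomaly-based systems tend toward false alarms: any legitimate burst that exceeds the threshold is flagged, regardless of intent.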

Various studies have examined how ID systems can use Machine Learning to detect zero-day attacks (Patidar & Khandelwal, 2018; Smys, 2019), which are attacks not yet recognized by ID systems. These attacks are unrecognizable to signature-based ID systems; anomaly-based ID systems, however, are known to flag them as well. Early anomaly-based systems nevertheless struggled to flag anomalies effectively (Buczak & Guven, 2015). This is where Machine Learning can help improve anomaly-based ID systems, by letting the system learn what kind of traffic is classified as benign and what triggers an attack alert.
