Introduction
Intrusion detection (ID) systems are well-established solutions for detecting malicious activity in a network, and they have become an essential defensive component of network security infrastructure. Their importance grows with the exponential growth of network attacks on modern networking systems. In 1980, James P. Anderson published the first significant paper on ID, "Computer Security Threat Monitoring and Surveillance", which emphasized the importance of such systems for security (Xu, Shen, Du & Zhang, 2018). An ID system usually monitors all internal and external packets of a network to detect whether a packet shows signs of a violation (Modi et al., 2013). An ID system must be able to recognize various kinds of attacks and raise alarms when it detects them on the network.
ID systems are generally categorized into two types according to the detection methodology: signature-based (Maleh et al., 2015) and anomaly-based (Zhang & Chen, 2017). Signature-based ID systems use pattern matching to detect intrusions: network traffic is compared against a database of known attack signatures, and an alert is triggered when a match is found. This approach has an extremely low false alarm rate (Maleh et al., 2021) and is efficient at detecting known attacks. However, because it relies solely on previously available information, it is ineffective against new attacks whose signatures are not yet in the database. Anomaly-based (or behavior-based) ID systems, in contrast, are capable of detecting unknown attacks (Fayssal, Hariri, & Al-Nashif, 2007). They build a profile of the network's normal behavior and raise an alert whenever observed traffic deviates from that profile. However, these systems tend to be less accurate, because profiling a network is a complex process that often leads to a high false alarm rate (Haq et al., 2015).
Most approaches currently used in ID systems cannot properly deal with the complex and dynamic nature of malicious threats. Therefore, Machine Learning techniques are increasingly being explored to improve the detection rate (DR), the false alarm rate, and the computational cost (Zamani & Movahedi, 2013). Traditional ID techniques have several limitations in protecting a system; most notably, when a system faces a high volume of malicious traffic, as in DoS/DDoS attacks, it can produce large numbers of false positives and false negatives (Khan & Kim, 2021).
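The contrast between the two detection methodologies can be illustrated with a toy sketch. Everything in it is a hypothetical choice made for illustration: the two byte-pattern signatures, the single traffic feature (packets per second), the benign samples, and the three-standard-deviation threshold. Real systems use far richer signature languages and traffic profiles.

```python
from statistics import mean, stdev

# Hypothetical signature database: byte patterns of known attacks (illustrative only).
SIGNATURES = {
    "sql_injection": b"' OR 1=1",
    "path_traversal": b"../../",
}


def signature_detect(payload: bytes) -> list[str]:
    """Signature-based check: return the names of all known patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


class AnomalyDetector:
    """Anomaly-based check: profile one feature of benign traffic (assumed here to be
    packets per second) and flag values more than k sample standard deviations from the mean."""

    def __init__(self, benign_rates: list[float], k: float = 3.0):
        self.mu = mean(benign_rates)
        self.sigma = stdev(benign_rates)
        self.k = k

    def is_anomalous(self, rate: float) -> bool:
        return abs(rate - self.mu) > self.k * self.sigma


# Hypothetical "normal behavior" samples used to build the profile (mean 100, sd ~2.16).
detector = AnomalyDetector([98.0, 102.0, 100.0, 97.0, 103.0, 101.0, 99.0])

print(signature_detect(b"GET /login?user=admin' OR 1=1--"))  # known attack: ['sql_injection']
print(signature_detect(b"GET / HTTP/1.1"))  # an unseen flood carries no known signature: []
print(detector.is_anomalous(5000.0))  # ...but its packet rate deviates from the profile: True
print(detector.is_anomalous(100.5))   # ordinary traffic: False
print(detector.is_anomalous(120.0))   # a legitimate traffic spike is also flagged: True
```

The last check shows why anomaly-based systems tend to raise more false alarms: a legitimate surge deviates from the learned profile just as an attack does, whereas the signature check never fires on it but also misses the unseen flood.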
Recently, numerous researchers have applied Machine Learning techniques to ID systems to improve detection rates, and several studies have sought to refine and apply these methods to ID (Wagh, Pachghare, & Kolhe, 2013). However, Machine Learning models were found to suffer from several issues that slow down training, including the size of the dataset and the difficulty of selecting optimal parameters for the most suitable model. These problems prompted researchers to look for more effective methodologies. Moreover, simple Machine Learning approaches are limited (Mahesh, 2020), while intrusion methods keep expanding and growing more complex. The authors therefore use multiple Machine Learning classification algorithms to build a better prediction model for detecting intrusions in network traffic.
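Comparing several classifiers on the same labeled traffic can be sketched as follows. This is not the authors' method: the two flow features, the toy records, and the choice of a nearest-centroid and a 3-nearest-neighbour classifier are assumptions made for illustration. The scoring uses the standard confusion-matrix definitions of detection rate, DR = TP / (TP + FN), and false alarm rate, FAR = FP / (FP + TN).

```python
import math
from collections import Counter

# Hypothetical toy flows: (connections per second, fraction of failed connections),
# both pre-scaled to [0, 1]. Label 1 = attack, 0 = benign.
TRAIN = [
    ((0.10, 0.05), 0), ((0.15, 0.10), 0), ((0.20, 0.05), 0), ((0.12, 0.08), 0),
    ((0.85, 0.70), 1), ((0.90, 0.80), 1), ((0.75, 0.65), 1), ((0.80, 0.90), 1),
]
TEST = [((0.18, 0.07), 0), ((0.40, 0.20), 0), ((0.88, 0.75), 1), ((0.55, 0.60), 1)]


def nearest_centroid(train, x):
    """Predict the label whose class-mean feature vector is closest to x."""
    centroids = {}
    for label in {y for _, y in train}:
        pts = [p for p, y in train if y == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return min(centroids, key=lambda lbl: math.dist(x, centroids[lbl]))


def knn(train, x, k=3):
    """Predict the majority label among the k training flows closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def detection_metrics(y_true, y_pred):
    """Return (DR, FAR) with DR = TP/(TP+FN) and FAR = FP/(FP+TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)


# Evaluate every candidate classifier on the same held-out flows with the same metrics.
results = {}
for name, clf in [("nearest_centroid", nearest_centroid), ("knn_k3", knn)]:
    y_pred = [clf(TRAIN, x) for x, _ in TEST]
    results[name] = detection_metrics([y for _, y in TEST], y_pred)
    print(name, results[name])
```

On this deliberately easy toy data both classifiers reach DR = 1.0 and FAR = 0.0. On real traffic the scores diverge, which is why candidate algorithms are compared on a shared held-out set rather than chosen in advance.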
Several studies have examined how ID systems can use Machine Learning to detect zero-day attacks (Patidar & Khandelwal, 2018; Smys, 2019), that is, attacks not yet known to the ID system. Signature-based ID systems cannot recognize these attacks, whereas anomaly-based ID systems are able to flag them. However, early anomaly-based systems struggled to detect anomalies effectively (Buczak & Guven, 2015). This is where Machine Learning can improve anomaly-based ID systems, by letting the system learn from data which traffic is benign and which should trigger an attack alert.