Concept Drift Adaptation in Intrusion Detection Systems Using Ensemble Learning

Deepa C. Mulimani, Shashikumar G. Totad, Prakashgoud R. Patil
Copyright: © 2021 |Pages: 22
DOI: 10.4018/IJNCR.2021100101

Abstract

The primary challenge of intrusion detection systems (IDS) is to rapidly identify new attacks, learn from the adversary, and update the detection model promptly. IDS operate in dynamic environments on evolving data streams whose data may come from different distributions. This is known as the problem of concept drift. Although today's IDS are equipped with deep learning algorithms, they often fail to identify concept drift. This paper presents a technique to detect and adapt to concept drift in streaming data with a large number of features, as is often seen in IDS. The study modifies the extreme gradient boosting (XGBoost) algorithm so that it adapts to drift and is optimized for large IDS datasets. The primary objective is to reduce the number of false positives and false negatives in the predictions. The method is tested on streaming data of smaller and larger sizes and compared against non-adaptive XGBoost and logistic regression.

Introduction

Today's networked business environments call for a high level of security, in which information is exchanged among organizations safely and in a trusted manner. Intrusion Detection Systems (IDS) act as a safeguard technology that provides adaptable system security to these businesses. As the business world keeps evolving, cyber-attacks will only become more sophisticated, so the primary goal of such technologies should be to adapt along with the evolving threats (Anazida Zainal et al., 2009). An IDS is a device or software application that monitors a network for malicious activity or policy violations. Any malicious activity, anomaly, or violation in streaming data is typically reported or collected centrally using a security information and event management system (Shashikumar G Totad et al., 2020). IDS types range from antivirus software to tiered monitoring systems that follow the traffic of an entire network. They can analyze incoming network traffic and monitor important operating system files. A signature-based IDS detects possible threats by looking for specific patterns, such as byte sequences in network traffic or known malicious instruction sequences used by malware. This terminology originates from antivirus software, which refers to these detected patterns as signatures. Although a signature-based IDS can easily detect known attacks, it cannot detect new attacks for which no pattern is available. These new patterns come from evolving data distributions that the IDS has not learnt earlier.

Present-day IDS are designed using machine learning techniques so that they are smart, robust, and defensive. Although numerous techniques have been developed, they may fail to address the dynamic nature of the data, so IDS require incremental learning. Incremental learning (Haibo He et al., 2011) offers two approaches: a single classifier and an ensemble of classifiers. Ensemble learning approaches are found to be highly accurate, efficient, and robust (Rajadurai H, Gandhi U. A, 2020), (Abdulla Amin Aburomman, Mamun Bin Ibne Reaz, 2017). In the literature, only a few studies on ensemble learning deal with varying data distributions in IDS (Amin Rasoulifard et al., 2008) (Gang Yin et al., 2014). In dynamic environments, concept drift is the phenomenon in which a change in the data distribution degrades the learning algorithm's performance. Adversaries may introduce new classes of intrusions. On the other hand, the varying data might introduce new genuine classes. In either case, the IDS must train itself on such concept drifts and adapt robustly to the changes. The algorithms and models implemented in an IDS must be tuned to changing environments, with proper placement and maintenance. They are also required to perform accurately irrespective of dataset size. In this way, a high level of security can be achieved while maintaining efficient network performance by avoiding unwanted network traffic.
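To make the notion of concept drift concrete, the following is a minimal sketch of a drift monitor that compares the classifier's error rate over the most recent predictions with its error rate over the window just before. It is an illustration only and not the detector used in this paper; the class name `WindowedDriftMonitor`, the helper `first_drift_index`, and the `window` and `threshold` values are assumptions chosen for the example.

```python
from collections import deque


class WindowedDriftMonitor:
    """Flag drift when the recent error rate exceeds the older error rate.

    Hypothetical helper for illustration; window and threshold are assumed values.
    """

    def __init__(self, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)  # errors from the older window
        self.recent = deque(maxlen=window)     # errors from the newest window

    def update(self, error):
        """Record one prediction outcome (1 = wrong, 0 = correct); return True on drift."""
        if len(self.recent) == self.window:
            # The oldest "recent" outcome is about to be evicted: keep it as reference.
            self.reference.append(self.recent[0])
        self.recent.append(error)
        if len(self.reference) < self.window:
            return False  # not enough history to compare two full windows
        reference_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        return recent_rate - reference_rate > self.threshold


def first_drift_index(errors, window=50, threshold=0.2):
    """Return the index of the first outcome at which drift is flagged, or None."""
    monitor = WindowedDriftMonitor(window, threshold)
    for index, error in enumerate(errors):
        if monitor.update(error):
            return index
    return None


# A classifier that is always right for 100 samples and always wrong afterwards.
outcomes = [0] * 100 + [1] * 100
print(first_drift_index(outcomes))
```

In this toy stream the classifier's errors begin at sample 100, for instance because a new class of intrusion has appeared, and the monitor flags drift once the recent error rate has risen more than 0.2 above the reference rate. An adaptive IDS would respond to such a signal by retraining or updating its ensemble.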

This paper introduces an implementation of an ensemble technique, Adaptive eXtreme Gradient Boosting (Jacob Montiel et al., 2020), to address the concept drift problem in IDS that are subjected to streaming data. The proposed IDS classifier updates its ensemble in two ways, the Push Strategy and the Replacement Strategy, to learn the statistical characteristics of the data. It then tries to identify whether new classes are intrusions or genuine. The technique is optimized to work on both smaller and larger datasets. Testing is done using two datasets: (i) the KDD CUP 99 dataset (smaller size) and (ii) the CSE-CIC-IDS2018 dataset (larger size). Both datasets, which are in .csv format, are transformed into streaming data and fed into the classifier. The method performs accurately on streaming data of both sizes. The paper is structured into the following main parts: Related Work, Concept Drift and the Adaptive eXtreme Gradient Boosting Algorithm, Proposed Methodology and Experiments, Results, and Conclusion.
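The two ideas in this paragraph, turning a .csv file into a stream and updating a fixed-size ensemble by pushing or replacing members, can be sketched as follows. This is a simplified reading under stated assumptions, not the paper's implementation: `stream_batches`, `FixedSizeEnsemble`, and the parameters `batch_size` and `max_models` are hypothetical names, the "push" branch treats the ensemble as a queue, the "replace" branch overwrites the oldest slots in turn, and plain strings stand in for trained boosters.

```python
import csv
import io


def stream_batches(csv_file, batch_size):
    """Yield successive mini-batches (lists of row dicts) from an open CSV file."""
    batch = []
    for row in csv.DictReader(csv_file):
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


class FixedSizeEnsemble:
    """Hold at most max_models members; the update policy is an assumed simplification."""

    def __init__(self, max_models, strategy="push"):
        if strategy not in ("push", "replace"):
            raise ValueError("strategy must be 'push' or 'replace'")
        self.max_models = max_models
        self.strategy = strategy
        self.models = []
        self._next_slot = 0  # next position to overwrite (replacement strategy only)

    def add(self, model):
        if len(self.models) < self.max_models:
            self.models.append(model)
        elif self.strategy == "push":
            # Queue behaviour: drop the oldest member, append the newest.
            self.models.pop(0)
            self.models.append(model)
        else:
            # Replacement behaviour: overwrite slots in place, oldest first.
            self.models[self._next_slot] = model
            self._next_slot = (self._next_slot + 1) % self.max_models


# Toy usage: one placeholder "model" per mini-batch of an in-memory CSV.
data = io.StringIO("duration,label\n1,normal\n2,attack\n3,normal\n4,attack\n5,normal\n")
ensemble = FixedSizeEnsemble(max_models=2, strategy="push")
for number, batch in enumerate(stream_batches(data, batch_size=2)):
    ensemble.add(f"model_{number}")  # a real system would train a booster on `batch`
print(ensemble.models)  # ['model_1', 'model_2']
```

Reading the file in mini-batches keeps memory use bounded regardless of dataset size, which is what allows the same loop to serve both a smaller and a larger dataset. The column names `duration` and `label` in the toy data are placeholders, not the actual schema of either dataset.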
