Handling Minority Class Problem in Threats Detection Based on Heterogeneous Ensemble Learning Approach

Hope Eke, Andrei Petrovski, Hatem Ahriz
DOI: 10.4018/IJSSSP.2020070102


Multiclass problems, such as detecting the multi-step behaviour of advanced persistent threats (APTs), have been a major global challenge because APTs can navigate around defenses and evade detection for prolonged periods. Targeted APT attacks present an increasing concern for both cyber security and business continuity. Detecting the rare attack is a classification problem with data imbalance. This paper explores the application of data resampling techniques together with a heterogeneous ensemble approach for dealing with data imbalance caused by unevenly distributed data elements among classes, with a focus on capturing the rare attack. The suggested algorithms are shown not only to provide detection capability but also to classify malicious data traffic corresponding to rare APT attacks.

1. Introduction

The ability of an intrusion detection system to detect every possible active attack on a system is a global security challenge. There have been a number of successful breaches of critical infrastructure. Stuxnet is one example of a sophisticated APT attack purposefully launched to target critical nuclear infrastructure in Iran, as highlighted by McAfee Labs (2011) and Chen et al. (2011).

There are diverse views as to what makes a threat an APT. Some regard an APT as a nation-state sponsored attack (Ahmad, Webb, Desouza, & Boorman, 2019), and the term is frequently used in security threat discussions (Smiraus & Jasek, 2011), while Five (2011) and ISACA (2014) retain their definition: “APT is often aimed at the theft of intellectual property or espionage as opposed to achieving immediate financial gain and are prolonged, stealthy attacks”. However, Cressey (2012), Micro (2013) and Chen et al. (2018) view an APT as a highly sophisticated combination of different techniques used to achieve a specifically targeted and highly valuable goal.

This type of attack has drawn special attention to the possibility of APT attacks on Industrial Control Systems (ICS) such as Supervisory Control and Data Acquisition (SCADA) networks. It has also led to research on methods for detecting intrusions within a network and on isolated devices at any level. The dynamic and diverse nature of the techniques attackers use to implement an APT attack yields an uneven distribution of examples across the different attack classes. Learning from such imbalanced data poses notable challenges for machine learning algorithms, since they need to deal with an uneven distribution among examples of different classes in the training set (Krawczyk, 2016a; Zhou & Liu, 2005). Handling imbalanced data distributions in multiclass classification problems, whether through ensemble supervised learning or through problem decomposition with cost-sensitive learning, is still an active research area in the machine learning community, as demonstrated by Nguyen et al. (2019), Nguyen et al. (2018) and Krawczyk (2016b).

These works have produced a significant pool of solutions geared towards addressing both binary and multiclass imbalance problems (Weiss, 2004). However, the majority of these solutions were designed mainly for the binary imbalanced problem (Krawczyk, 2016a); hence, research is needed on reliable solutions for the multiclass scenario. This paper focuses on the implementation of diverse data resampling techniques in combination with a heterogeneous ensemble learning approach for handling multiclass imbalanced datasets, with special interest in capturing the minority class.
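As a rough illustration of what resampling does to a multiclass imbalanced dataset, the sketch below applies plain random oversampling, one of the simplest resampling techniques and not necessarily one of those evaluated in the paper. The function name `random_oversample`, the toy feature vectors, and the class labels (`normal`, `dos`, `u2r`) are assumptions made for illustration only.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: 8 normal records, 3 "dos" and 1 rare "u2r" attack.
labels = ["normal"] * 8 + ["dos"] * 3 + ["u2r"] * 1
samples = [[float(i)] for i in range(len(labels))]

counts = Counter(labels)
# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

res_x, res_y = random_oversample(samples, labels)
balanced_counts = Counter(res_y)
```

Because random oversampling only duplicates existing records, it balances the class counts without adding new information about the rare class. Resampling should be applied to the training split only, so that duplicated records do not leak into the evaluation data.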

The contribution of this paper can be summarised as follows:

  • Implementation of several oversampling and undersampling approaches for handling binary and multiclass imbalanced datasets, with the main focus on the minority class in the multiclass setting;

  • Analysis of the impact of these approaches on the results obtained, without proposing a new algorithm or technique for handling imbalanced data;

  • Implementation of oversampling techniques for multiclass imbalanced classification on two datasets (KDDCup99 and UNSW-NB15), with close attention to the impact and knowledge of the minority class and the imbalance distribution factors;

  • A series of experiments evaluating the impact of resampling imbalanced data and of ensemble deep neural networks on the ability to (i) accurately detect and classify an attack as abnormal and (ii) classify multiclass labels into different attack families.
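The ensemble idea behind these contributions can be sketched as a combination of per-record predictions from several different base classifiers. This excerpt does not state which combination rule the paper uses, so the example below assumes simple hard (majority) voting. The function name `majority_vote`, the three model outputs, and the class labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several models by hard voting.

    predictions: list of per-model label lists, all of equal length.
    Ties go to the label predicted by the earliest model in the list.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of records")

    combined = []
    for votes in zip(*predictions):
        # most_common keeps first-seen order among equal counts.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three different base classifiers on four records.
model_a = ["normal", "dos", "u2r", "normal"]
model_b = ["normal", "normal", "u2r", "dos"]
model_c = ["dos", "dos", "normal", "probe"]

ensemble = majority_vote([model_a, model_b, model_c])
```

For the fourth record all three models disagree, so the vote falls back to the first model's label. A weighted or probability-based (soft) vote would be a natural alternative when some base classifiers are more reliable on the minority class.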
