1. Introduction
Building an intrusion detection system that can detect every possible active attack on a system remains a global security challenge. There have been a number of successful breaches of critical infrastructure. Stuxnet is one example of a sophisticated advanced persistent threat (APT) attack purposefully launched to target critical nuclear infrastructure in Iran, as highlighted in (McAfee Labs, 2011) and (Chen et al., 2011).
There are diverse views as to what makes a threat an APT. Some believe that an APT is a nation-state-sponsored attack (Ahmad, Webb, Desouza, & Boorman, 2019), and the term is frequently used in security threat discussions (Smiraus & Jasek, 2011), while (Five, 2011) and (ISACA, 2014) retain their definition, “APT is often aimed at the theft of intellectual property or espionage as opposed to achieving immediate financial gain and are prolonged, stealthy attacks”. However, (Cressey, 2012), (Micro, 2013) and (Chen et al., 2018) view an APT as a highly sophisticated combination of different techniques used to achieve a specifically targeted and highly valuable goal.
This type of attack has drawn special attention to the possibility of APT attacks on Industrial Control Systems (ICS), such as Supervisory Control and Data Acquisition (SCADA) networks. It has also led to research into methods for detecting intrusions within a network and on isolated devices at any level. Because the techniques attackers use to implement an APT attack are dynamic and diverse, the resulting data are unevenly distributed across the different classes. Learning from imbalanced data therefore poses notable challenges for machine learning algorithms, since they need to deal with an uneven distribution of examples among the classes in the training set (Krawczyk, 2016a) and (Zhou & Liu, 2005). Handling imbalanced data distributions in multiclass classification problems, based on ensemble supervised learning and problem decomposition with cost-sensitive learning, remains an active research area in the machine learning community, as demonstrated by (Nguyen et al., 2019), (Nguyen et al., 2018) and (Krawczyk, 2016b).
These proposed works have led to a significant pool of solutions geared towards addressing both binary and multiclass imbalance problems (Weiss, 2004). However, the majority of these solutions were mainly for the binary imbalanced problem (Krawczyk, 2016a); hence, there is a need for research directed towards developing reliable solutions for the multiclass scenario. This paper focuses on the implementation of diverse data resampling techniques in combination with a heterogeneous ensemble learning approach for handling multiclass imbalanced datasets, with special interest in capturing the minority class.
The contribution of this paper can be summarised as follows:
- Implementation of several oversampling and undersampling approaches for handling binary and multiclass imbalanced datasets, with the main focus on the minority class in the multiclass labels;
- Analysis of the impact of these approaches, which could be used to improve the results obtained, without proposing a new algorithm or technique for handling imbalanced data;
- Implementation of oversampling techniques for multiclass imbalanced classification on two datasets (KDDCup99 and UNSW-NB15), with close attention to the impact and knowledge of the minority class and imbalance distribution factors;
- A series of experiments to evaluate the impact of resampling imbalanced data and of ensemble deep neural networks in order to (i) accurately detect and classify an attack as abnormal and (ii) classify multiclass labels into different attack families.
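
To make the idea of resampling a multiclass imbalanced dataset concrete, the following is a minimal sketch of random oversampling, one common oversampling technique. It is an illustration only: this section does not name the specific resampling methods used, so the choice of random oversampling, the function name `random_oversample`, and the toy labels are all assumptions rather than the authors' implementation.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest class.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the largest (majority) class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for y, items in by_class.items():
        # Keep all originals, then draw the missing copies with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([y] * target)
    return out_samples, out_labels


# Toy labels chosen for illustration; not drawn from KDDCup99 or UNSW-NB15.
labels = ["normal"] * 8 + ["dos"] * 3 + ["u2r"] * 1
samples = [[i] for i in range(len(labels))]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(labels))           # class counts before resampling
print(Counter(balanced_labels))  # class counts after resampling
```

Resampling of this kind would normally be applied to the training split only, so that duplicated minority samples do not leak into the evaluation data.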