Introduction
Computer networks are evolving significantly as a result of rapid developments in the Internet of Things (IoT), wireless handheld devices, networks of moving vehicles, 4G and 5G, and cyber-physical systems (Kasongo & Sun, 2020). These technologies enable the exchange of huge amounts of data, which makes the networks susceptible to various malicious actions, security threats, and cyber-attacks (Kasongo & Sun, 2020). Cyber-attacks usually exploit vulnerabilities in the current network ecosystem; they target sensitive information and can interrupt the availability of the system. Notable among these threats are routing attacks, Denial of Service (DoS) attacks, flooding attacks, data leakage, Distributed Denial of Service (DDoS) attacks, spoofing, wormhole attacks, and insecure gateways (Santos et al., 2019; Wazid et al., 2019; Deshmukh-Bhosale & Sonavane, 2019). These attacks and vulnerabilities enable theft of information on computer network systems (Hajisalem & Babaie, 2018), disruption of business operations, and breaches of the confidentiality, integrity, and availability of information system resources, leading to great financial loss (Faker & Dogdu, 2019). For example, cyber-attacks and system vulnerabilities cost the global economy $600 billion in 2017, and that cost rose to $1 trillion in 2018 (www.technology.Org, 2019).
As network attacks become more sophisticated, intelligent network intrusion detection systems are being developed to detect and prevent them. Intelligent intrusion detection systems (IDS) can detect unauthorized access (Gendreau & Moorman, 2016) and analyze the activities in the network to prevent malicious behavior from disrupting it (Choudhary & Kesswani, 2018). According to Choudhary and Kesswani (2018), the intelligent devices themselves are vulnerable and exposed to cyber-attacks. In response, machine learning (ML) algorithms have proven to be an efficient method for detecting network intrusions. For machine learning algorithms, the essential component is the dataset. It should be as precise as possible, because small changes to the data can lead to very different outcomes in the detection of attacks. Various ML approaches are used to detect attacks on networks. Prominent among them are deep learning techniques (Al-hawawreh et al., 2019), random neural networks (Saeed et al., 2016), binary logistic regression (Ioannou & Vassiliou, 2018), K-Nearest Neighbor (KNN) (Li et al., 2014), Naïve Bayes classification (Mehmood et al., 2018), and neighbor discovery protocol (Alsadhan et al., 2019).
This study aims to develop machine learning models that improve the detection of network intrusions using a recent network intrusion dataset, and to evaluate the performance of the developed models. Supervised machine learning techniques are used to implement the intrusion detection models. The techniques employed in this paper are the K-Nearest Neighbor (KNN) classification algorithm, the Support Vector Machine (SVM) classification algorithm, Voting Ensemble, Random Forest, and the eXtreme Gradient Boosting (XGBoost) algorithm. Three different values of K are used in the KNN model, and two different kernels are used in the SVM model. Random Forest and XGBoost are used for both binary and multiclass classification. Ten features of the UNSW_NB15 dataset are used in the implementation of the proposed KNN, SVM, Voting Ensemble, Random Forest, and XGBoost models.
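To make the KNN part of this setup concrete, the following is a minimal sketch of comparing one KNN classifier across three values of K on records with ten features. It is not the implementation used in this study. The values K = 3, 5, 7, the helper names, and the synthetic two-class data (standing in for ten UNSW_NB15 features) are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training records."""
    # Euclidean distance from the query to every training record, nearest first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def make_records(rng, n, center, label, n_features=10):
    """Hypothetical data generator: n records with Gaussian features around `center`."""
    return [
        ([rng.gauss(center, 1.0) for _ in range(n_features)], label)
        for _ in range(n)
    ]


def accuracy(train, test, k):
    """Fraction of test records whose predicted label matches the true label."""
    train_X = [x for x, _ in train]
    train_y = [y for _, y in train]
    correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in test)
    return correct / len(test)


# Synthetic stand-in for the dataset: label 0 = normal traffic, label 1 = attack.
rng = random.Random(0)
train = make_records(rng, 50, 0.0, 0) + make_records(rng, 50, 3.0, 1)
test = make_records(rng, 20, 0.0, 0) + make_records(rng, 20, 3.0, 1)

# Assumed K values; the text above states only that three values are used.
K_VALUES = (3, 5, 7)
results = {k: accuracy(train, test, k) for k in K_VALUES}

for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2f}")
```

Odd values of K are chosen here so that a binary majority vote cannot tie. On real traffic records the features would typically need scaling before distances are computed, since KNN is sensitive to feature magnitude.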