Binary Classification of Network-Generated Flow Data Using a Machine Learning Algorithm

Sikha Bagui, Keenal M. Shah, Yizhi Hu, Subhash Bagui
Copyright: © 2021 | Pages: 18
DOI: 10.4018/IJISP.2021010102

Abstract

This study proposes a model for building intrusion detection systems. The dataset used, CICIDS 2017, contains 14 different attacks with 85 features for each attack. This high dimensionality is a major challenge when building efficient intrusion detection systems, especially in today's big data environment, since many of the features are redundant. The main goal of this paper was to reduce the number of features and to present a detailed discussion of the important ones. For feature selection, information gain was applied iteratively; for classification, the J48 decision tree machine learning algorithm was used. The features important for classifying each attack were identified, and those important for classifying multiple attacks were also identified and discussed.
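
As a rough illustration of information-gain-based feature ranking, the following minimal Python sketch computes the information gain of each feature with respect to a binary attack/benign label and keeps the features whose gain exceeds a threshold. It is not the authors' procedure: the toy records, the feature names (`syn_flag_set`, `duration_bucket`, `protocol`), and the `min_gain` threshold are all hypothetical, the features are assumed to be already discretized (many CICIDS 2017 features are continuous and would need binning first), and the J48 classification step is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus label entropy conditioned on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, feature_names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = []
    for idx, name in enumerate(feature_names):
        column = [row[idx] for row in rows]
        gains.append((name, information_gain(column, labels)))
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical, made-up flow records (not taken from CICIDS 2017).
feature_names = ["syn_flag_set", "duration_bucket", "protocol"]
rows = [
    (1, "short", "tcp"),
    (1, "short", "tcp"),
    (1, "long", "tcp"),
    (0, "short", "tcp"),
    (0, "long", "tcp"),
    (0, "long", "tcp"),
]
labels = ["attack", "attack", "attack", "benign", "benign", "benign"]

# Assumed cut-off: drop features that carry (numerically) no information.
min_gain = 1e-9
ranking = rank_features(rows, labels, feature_names)
top_features = [name for name, gain in ranking if gain > min_gain]

for name, gain in ranking:
    print(f"{name}: {gain:.4f}")
```

In this toy data, `protocol` takes a single value and therefore has zero gain, so it is the kind of redundant feature such a ranking would discard. An iterative variant could repeat the rank-and-drop step while checking classifier performance on the reduced feature set.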

There is extensive research on the classification of cyberattacks and on the evaluation of machine learning algorithms on different datasets; this related works section focuses mainly on works that center on CICIDS 2017 and similar datasets.

Yavanoglu and Aydos (2017) present a comprehensive review of cyber security datasets for machine learning algorithms. Yusof et al. (2017) compare their proposed feature selection technique with traditional techniques. Using a combination of two feature selection techniques, Consistency-based Subset Evaluation and DDoS characteristic-based features, they were able to identify and select the most significant features for the NSL-KDD 2009 dataset. Almseidin et al. (2017) performed several experiments to evaluate the efficiency and performance of several machine learning algorithms on the KDD dataset. They found that no single machine learning algorithm can handle all attack types effectively. Shafiq et al. (2016) discuss network traffic classification techniques. They capture real-time internet data using a network traffic capture tool, extract features from the captured traffic, and then apply four machine learning classifiers: Support Vector Machine, C4.5 decision tree, Naive Bayes, and Bayes Net. They found that the C4.5 classifier gave the best accuracy of the four. Zhou et al. (2019) recognize the importance of data pre-processing for cyber security data and propose a new methodology to compare the benefits of Correlation Based Feature Selection (CFS) and the Bat Algorithm with an ensemble based on C4.5, Random Forest (RF), and Forest by Penalizing Attributes (Forest PA). Their analysis also included the CICIDS 2017 dataset.
