Crucial Role of Data Analytics in the Prevention and Detection of Cyber Security Attacks

Charulatha B. S., A. Neela Madheswari, Shanthi K., Chamundeswari Arumugam
Copyright: © 2021 | Pages: 14
DOI: 10.4018/978-1-7998-4900-1.ch004

Abstract

Data analytics plays a major role in retrieving relevant information, filtering out unwanted data, handling missing values, supporting visualization and interpretation, and informing decisions for business or social needs. Organizations are affected by cyber-attacks more frequently once their business is exposed to the internet. Cyber-attacks are numerous, and tracking them is difficult. An attack may enter through different events in the business process. Detecting the attack is laborious, and collecting the data remains a hard task. Detecting the source of an attack across the various events in the business process, and tracking the corresponding data, requires an investigation procedure. This chapter concentrates on applying machine learning algorithms to study user behavior in the process in order to detect network anomalies. Data from the KDD'99 data set is collected and analyzed using decision tree, isolation forest, bagging classifier, and AdaBoost classifier algorithms.

Introduction

Data analysis can be applied in many disciplines, such as data mining and cyber security, to study the behavior of data for anomaly detection. In anomaly detection, a profile of the normal data is developed, and data that does not agree with it is flagged as an anomaly (Dimitar et al., 2017). Broadly speaking, a dataset used in such analyses can be split into two subdomains, inliers and outliers, in which the outlier data behaves abnormally (Fabrizio et al., 2016). Using the outlier data, it is possible to detect anomalous behavior at a certain point in time in an application. Outlier detection methods belong to three families: supervised, semi-supervised, and unsupervised. Outlier datasets are mostly analyzed with unsupervised methods, which may be statistical, deviation based, distance based, density based, angle based, isolation based, concept based, cluster size/density based, etc.
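To make the idea of a normal-data profile concrete, the following is a minimal sketch of a statistical (z-score) detector. It is only one simple instance of the statistical family, not the method used in this chapter. The function names `fit_profile` and `is_anomaly`, the threshold of 3 standard deviations, and the sample connection durations are all assumptions made for illustration.

```python
import statistics


def fit_profile(normal_values):
    """Summarise normal behaviour as a (mean, standard deviation) pair."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomaly(value, profile, threshold=3.0):
    """Flag a value whose z-score against the profile exceeds the threshold."""
    mean, stdev = profile
    if stdev == 0:
        # Degenerate profile: anything different from the mean disagrees with it.
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical connection durations (seconds) observed during normal operation.
normal_durations = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 0.9]
profile = fit_profile(normal_durations)

# New observations are compared against the profile, not against each other.
new_observations = [1.05, 0.95, 9.5]
flags = [is_anomaly(value, profile) for value in new_observations]
print(flags)
```

Here the first two observations sit close to the profile mean, while the third lies far outside it and is the only one flagged.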

Applications in which data behave anomalously include data mining (Zhangyu Cheng, 2019), sensor technology (Yu-Hsuan Kuo et al., 2018), cyber-attacks (Simon D. Duque Anton et al., 2019) (Filipe Falcão et al., 2019), the HTTP/HTTPS protocol (Hieu Mac et al., 2018) (Ya-Lin Zhang et al., 2018), weather data (Tadesse Zemicheal et al., 2019), and network intrusion detection systems (Zouhair Chiba et al., 2019). Normally, outlier detection methods can detect the outliers accurately. In many cases, however, they label non-outliers as outliers and outliers as non-outliers. A biased method can therefore detect the anomalous behavior of the data more efficiently.
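The two kinds of mistake described above, non-outliers reported as outliers (false positives) and outliers reported as non-outliers (false negatives), can be counted directly once ground-truth labels are available. The sketch below assumes labels encoded as 1 for outlier and 0 for inlier. The helper name `confusion_counts` and the label lists are hypothetical.

```python
def confusion_counts(true_labels, predicted_labels):
    """Count the four outcomes of a detector (1 = outlier, 0 = inlier)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == 1 and truth == 1:
            counts["tp"] += 1  # outlier correctly reported
        elif predicted == 1 and truth == 0:
            counts["fp"] += 1  # non-outlier reported as outlier
        elif predicted == 0 and truth == 1:
            counts["fn"] += 1  # outlier reported as non-outlier
        else:
            counts["tn"] += 1  # non-outlier correctly ignored
    return counts


# Hypothetical ground truth and detector output for ten records.
true_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
predicted_labels = [0, 1, 0, 0, 0, 0, 1, 0, 1, 0]

counts = confusion_counts(true_labels, predicted_labels)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
print(counts, round(precision, 3), round(recall, 3))
```

Reporting precision and recall alongside accuracy is a common choice when outliers are rare, because a detector that never raises an alarm can still score a high accuracy.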

The main idea behind this work is network traffic analysis. Using this analysis, a network administrator can watch the traffic pattern to identify anomalous traffic. To ground the analysis, historical data is used to develop a mathematical model. The model is developed using a data set from Kaggle that is freely available to the public.

The objective of this paper is to detect network-based anomalies using the publicly available dataset. All the parameters are carefully considered, and the class required for this analysis is tracked. The split between training and testing data was chosen for convenience when applying the decision tree, random forest, and bagging classifier algorithms. Python is used in this work to detect the network anomalies, and the accuracy was also evaluated. The contributions emphasized in this chapter are as follows:

  • Detect network-based anomalies

  • Apply the decision tree, isolation forest, random forest, AdaBoost, and bagging classifiers to the data set

  • Study the accuracy of these methods in detecting the anomalies
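As a rough illustration of the comparison listed above, the following sketch trains the four supervised classifiers with scikit-learn and reports their test accuracy. It makes several assumptions: a synthetic, imbalanced dataset from `make_classification` stands in for KDD'99, the 70/30 split and all hyperparameters are illustrative choices rather than the chapter's settings, and isolation forest is left out because it is unsupervised and is not trained on labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labelled traffic: about 10% of records are "attacks".
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=8,
    weights=[0.9, 0.1],
    class_sep=1.5,
    random_state=0,
)

# Stratify so the rare class appears in both the training and testing parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")
```

The numbers produced on synthetic data say nothing about performance on KDD'99. The sketch only shows the shape of the experiment: one split, several models, one shared metric.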

The paper is organized as follows. The Background section details the literature survey, and the Intrusion Detection section discusses the anomaly detection used in this paper. The methods used for analysis, namely decision trees, random forest, isolation forest, bagging classifier, and AdaBoost classifier, are also discussed. The Methodology section details the applicability of the unsupervised anomaly detection algorithm used in this work and the results obtained by applying these methods to the KDD'99 dataset. The Conclusion section summarizes the work done and outlines future work.

Background

(Zhangyu Cheng et al., 2019) applied anomaly detection methods to detect local and global outliers, using isolation forest and local outlier factor to prune the data set with low complexity. They also applied an ensemble method to improve pruning accuracy and the outlier detection rate. (Yu-Hsuan Kuo et al., 2018) proposed a regression model fitted to sensor data to detect outliers using contextual outlier detection methods. (Filipe Falcão et al., 2019) used 12 detection methods belonging to a family of algorithms on datasets related to system and network intrusion detection. (Hieu Mac et al., 2018) targeted web attacks including SQL Injection, Cross-site Scripting (XSS), XPath Injection, Local File Inclusion (LFI), Server-side Template Injection, Code Injection, OS Command Injection, Server-side Request Forgery, and others. They analyzed and detected malicious patterns in HTTP/HTTPS requests using regularized deep autoencoders. (Ya-Lin Zhang et al., 2018) proposed Anomaly Detection with Partially Observed Anomalies, using three methods on different datasets: the unsupervised isolation forest method, the supervised support vector machine method, and a cost-sensitive, PU-learning-based method. The problem of malicious URL detection was also demonstrated.
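The pruning idea in the first work surveyed above can be sketched loosely with scikit-learn: an isolation forest cheaply discards records that look clearly normal, and local outlier factor is then run only on the remaining candidates. This is a simplified illustration under stated assumptions, not the cited authors' algorithm. The synthetic two-dimensional data, the 20% candidate fraction, and the parameter values are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 500 normal points near the origin plus 10 planted outliers
# placed on a circle of radius 15, far from the normal cluster.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
angles = np.linspace(0.0, 2.0 * np.pi, num=10, endpoint=False)
planted = 15.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([inliers, planted])
planted_idx = set(range(500, 510))

# Stage 1 (pruning): score every record and keep only the 20% that the
# isolation forest finds most anomalous (lower score = more anomalous).
forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = forest.score_samples(X)
n_keep = len(X) // 5
candidate_idx = np.argsort(scores)[:n_keep]

# Stage 2 (refinement): run local outlier factor on the candidates only.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X[candidate_idx])  # -1 marks an outlier
flagged_idx = set(candidate_idx[labels == -1].tolist())

print(f"candidates kept: {len(candidate_idx)} of {len(X)}")
print(f"planted outliers flagged: {len(flagged_idx & planted_idx)} of 10")
```

Running the density-based step on a pruned subset reduces the number of neighbor searches, at the cost of estimating densities from the candidates alone.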
