1. Introduction
Intrusion detection systems (IDS) address problems that firewall techniques do not solve: an IDS can recognize attacks that firewalls are unable to prevent (Pradhan et al., 2016). An IDS can detect malicious activities performed by external or internal attackers (Korba et al., 2017). These attacks attempt to disrupt legitimate users’ access to services (Nagesh et al., 2017). There is a growing need for efficient methods to detect outliers or anomalies in network traffic data. Network traffic data is massive and high-dimensional, which makes it challenging to extract the information relevant to identifying attacks. Anomaly-based IDS has been of great interest to the research community for many years and rests on the assumption that the behavior of intruders differs from that of legitimate users. An anomaly-based IDS can easily attain a very high detection rate (DR) by using a strict definition of normal activity, but at the cost of an unacceptably high false alarm rate (FAR). Improving DR beyond a certain limit while keeping FAR low is therefore a challenging task.
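The trade-off between detection rate and false alarm rate can be illustrated with a minimal sketch. It assumes the commonly used definitions DR = TP / (TP + FN) and FAR = FP / (FP + TN), which the text above does not spell out, and it uses a hypothetical anomaly score per connection; the function name, labels and scores are invented for illustration only.

```python
def dr_and_far(labels, scores, threshold):
    """Return (detection rate, false alarm rate) when score >= threshold is flagged.

    labels: 1 = attack, 0 = normal (illustrative encoding, not from the article).
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    tn = sum(1 for y, f in zip(labels, flagged) if y == 0 and not f)
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, far


# Made-up data: seven connections with a hypothetical anomaly score each.
labels = [0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.50, 0.80, 0.90]

# A lenient notion of "normal" (high threshold) misses one attack but raises no false alarms.
lenient = dr_and_far(labels, scores, threshold=0.6)
# A strict notion of "normal" (low threshold) catches every attack but flags normal traffic too.
strict = dr_and_far(labels, scores, threshold=0.3)
```

On this toy data, lowering the threshold raises DR from 2/3 to 1.0 while FAR rises from 0 to 0.5, which is the tension described above.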
Outlier detection is a data mining concept with applications in a wide variety of fields. Outliers are data points that differ notably from the rest of the data. Originally, outlier detection was used as a preprocessing step for removing noise and extreme values. Today, it has also become a field of interest for applications that involve detecting fraudulent activities, because it can be used to isolate suspicious patterns (Settanni & Filzmoser, 2018; Domingues et al., 2018; Rousseeuw et al., 2019). Outlier detection has been used for centuries to detect and remove anomalous data points (Hodge, 2014). Anomaly-based IDS is an application area in which outlier analysis plays a vital role, because intrusions are rare compared to normal events and these rare events can be treated as outliers. “The anomaly detection problem is similar to the problem of finding outliers, specifically in network intrusion detection” (Gogoi, Borah, Bhattacharyya & Kalita, 2011). A hacker inside a network with malicious intent can be clearly identified as an outlier (Ganapathy, Jaisankar, Yokesh & Kannan, 2011).
Many methods for outlier detection are employed in the literature, including statistical-based, distance-based, density-based, clustering-based and frequent-pattern-based approaches. A detailed survey of such methods as applied to detecting network intrusions is given by Beulah and Punithavathani (2015). All of these methods create a model of the normal pattern in the data and then identify outliers as deviations from the learned model.
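The "model the normal pattern, then flag deviations" idea can be sketched with a very simple statistical-based detector. This is only an illustration under stated assumptions: the per-feature mean and standard deviation stand in for the learned model, the z-score threshold of 3.0 is an arbitrary choice, and the feature values (say, connection duration and bytes sent) are invented. It is not the method of any of the cited works.

```python
from statistics import mean, pstdev


def fit_normal_profile(normal_rows):
    """Learn a (mean, std) pair per feature from traffic assumed to be normal."""
    columns = list(zip(*normal_rows))
    # Fall back to 1.0 when a feature is constant, to avoid division by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def deviation_score(profile, row):
    """Largest absolute z-score of the row across all features."""
    return max(abs(x - m) / s for x, (m, s) in zip(row, profile))


def is_outlier(profile, row, threshold=3.0):
    """Flag the row when it deviates from the learned profile by more than the threshold."""
    return deviation_score(profile, row) > threshold


# Hypothetical training records: (duration, bytes_sent) of normal connections.
normal_rows = [(10, 200), (12, 210), (11, 190), (9, 205), (13, 195)]
profile = fit_normal_profile(normal_rows)

typical = is_outlier(profile, (11, 202))     # close to the learned profile
suspicious = is_outlier(profile, (40, 200))  # duration far outside the normal range
```

Here the first record is accepted as normal and the second is flagged. Distance-based and density-based methods follow the same pattern with a different notion of deviation.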
Clustering can be regarded as a complementary problem to outlier detection (Aggarwal, 2015). Clustering aims to find data points with similar properties, whereas outlier detection looks for data points that differ from the others. In most clustering algorithms, outliers are obtained as by-products. Clustering is well suited to the problem of network intrusion detection and is one of the most effective ways to decide whether a connection is legitimate or malicious (Hassani & Seidl, 2011). Clustering algorithms can be carefully designed to detect outliers or anomalies in network traffic data.
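How outliers can fall out of a clustering step as a by-product may be easier to see in a small sketch. The example below uses a single-pass "leader" clustering chosen purely for brevity; it is an assumption made for illustration, not the algorithm proposed in this article, and the radius, minimum cluster size and data points are all hypothetical. Points that end up in clusters smaller than `min_size` are reported as outliers.

```python
import math


def leader_clustering(points, radius):
    """Single-pass clustering: a point joins the first cluster whose centroid
    lies within `radius`; otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"centroid": tuple, "members": [point indices]}
    for i, p in enumerate(points):
        for c in clusters:
            if math.dist(p, c["centroid"]) <= radius:
                c["members"].append(i)
                n = len(c["members"])
                # Incremental update of the centroid (running mean).
                c["centroid"] = tuple(m + (x - m) / n for m, x in zip(c["centroid"], p))
                break
        else:
            clusters.append({"centroid": tuple(p), "members": [i]})
    return clusters


def outliers_from_clusters(clusters, min_size=2):
    """Indices of points that belong to clusters with fewer than `min_size` members."""
    return sorted(i for c in clusters if len(c["members"]) < min_size for i in c["members"])


# Made-up two-dimensional feature vectors for six connections.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.5)]
clusters = leader_clustering(points, radius=1.0)
outliers = outliers_from_clusters(clusters, min_size=2)
```

On this toy data the first three points and the next two form two clusters, and the last point is left alone in a singleton cluster, so it is the only index reported. The result of leader clustering depends on the input order and on the radius, so a practical detector would need a more carefully designed algorithm.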