K-NN Based Outlier Detection Technique on Intrusion Dataset

Santosh Kumar Sahu, Sanjay Kumar Jena, Manish Verma
Copyright: © 2017 | Pages: 13
DOI: 10.4018/IJKDB.2017010105

Abstract

Outliers are objects in a database that deviate from the rest of the dataset by some measure. The Nearest Neighbor Outlier Factor is used to measure the degree of outlier-ness of an object in the dataset. Unlike other methods such as the Local Outlier Factor, this approach takes into account both the neighbors and the reverse neighbors of a point before the object is considered. We observed that the GBBK algorithm, which is based on K-NN, uses quicksort to find the k nearest neighbors, which takes O(N log N) time. The proposed method instead repeats the search K times, which finds the k nearest neighbors in O(KN) time (k << log N). As a result, the proposed method improves the time complexity. The NSL-KDD and Fisher iris datasets are used, and the experimental results are compared with those of the GBBK method. Both methods give the same result, but the proposed method takes less computation time.
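The complexity argument in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy data are assumptions made for illustration. One function sorts all N distances before keeping the first k, and the other scans the N distances k times, picking the closest point not yet chosen on each pass.

```python
import math
import random


def knn_by_sorting(distances, k):
    """Sort all N distances, then keep the first k indices: O(N log N)."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return order[:k]


def knn_by_repeated_search(distances, k):
    """Scan the N distances k times, each pass picking the closest
    index that has not been chosen yet: O(kN)."""
    chosen = []
    taken = [False] * len(distances)
    for _ in range(min(k, len(distances))):
        best = -1
        for i, d in enumerate(distances):
            if not taken[i] and (best == -1 or d < distances[best]):
                best = i
        taken[best] = True
        chosen.append(best)
    return chosen


# Hypothetical data: 200 random 2-D points and one query point.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
query = (0.5, 0.5)
distances = [math.dist(query, p) for p in points]

# Both strategies return the same neighbors in the same order.
assert knn_by_sorting(distances, 5) == knn_by_repeated_search(distances, 5)
```

The repeated scan only pays off when k is much smaller than log N, which is the condition stated in the abstract; for larger k the full sort does less work.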

Introduction

Data mining is the process of extracting knowledge or valid data from a dataset. Databases pose many difficulties, such as redundant data, missing data, non-specific attribute values, and outliers.

Outliers are values of a variable whose statistical properties do not match those of the other values (Panda & Jana, 2015; Panda, Neha, & Sathua, 2015; Panda & Jana, 2016). They can severely affect the result of predictive analysis (Gogoi, Bhattacharyya, Borah et al., 2014). Depending on the requirements of the application, outliers can be of particular interest. Sometimes the presence of outliers adversely affects the conclusion, so they need to be eliminated. Sometimes the outliers themselves become the center of interest because they contain important information about the abnormal behavior of a system (Hubballi, Patra, & Nandi, 2011).

Many data mining techniques minimize the influence of outliers or eliminate them. Sometimes this results in a severe loss of important hidden information, since one person’s noise could be another person’s signal (Bakar, Mohemad, Ahmad et al., 2006).

The applications of outlier detection include intrusion detection systems (Gogoi, Bhattacharyya, Borah et al., 2014), identification of new diseases, financial applications, and credit card fraud detection, where outliers may indicate fraudulent activity. By extracting the most relevant features of network traces, packet flow data, and packet header information, the outlying or anomalous behavior of an activity can be found. Outlier detection is one of the most essential approaches among intrusion detection techniques. The term intrusion can be defined as unauthorised access to a system or resource. Three major types of intrusion detection approach are in use: pattern-matching or rule-based detection, signature-based detection, and outlier detection. Signature-based detection simply uses a string matching algorithm. Each time, it compares the current signature with the stored signatures. If they match, it reports that an intrusion was found; otherwise it does not. The details of the current packet are matched against a list of stored patterns, and because the rules are predefined this approach usually does not give false alarms. However, it cannot find new attacks that occur for the first time: for example, changing the subject name from “Manish abc” to “Manish abcd” changes the signature, and the attack goes undetected. Snort is an example of a pattern-matching NIDS that can recognize old attacks.
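The limitation of signature matching described above can be shown with a minimal Python sketch. It assumes that a signature is compared by exact string equality against a stored set. The set `KNOWN_SIGNATURES` and the function `matches_signature` are hypothetical names used only for illustration, not part of Snort or of any real NIDS.

```python
# Hypothetical store of signatures of previously seen attacks.
KNOWN_SIGNATURES = {"Manish abc"}


def matches_signature(subject, signatures=KNOWN_SIGNATURES):
    """Return True when the observed subject exactly equals a stored signature.

    Assumption: matching is plain string equality, so any change to the
    subject, however small, yields a different signature.
    """
    return subject in signatures


print(matches_signature("Manish abc"))   # known signature: reported as an intrusion
print(matches_signature("Manish abcd"))  # one extra character: not reported
```

Because the rule is fixed in advance, a match is rarely a false alarm, but a variant that was never stored passes through unnoticed.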

Anomaly-based or outlier-based detection is the process of comparing system activities that are established as normal against the events that are observed, in order to identify deviations (Sahu & Jena, 2016). For example, normal activity might consist of web activity that usually occurs during daytime hours. The primary advantage of this method is that it is effective at finding new attacks that occur for the first time. First, an initial profile of the activity that is assumed to be normal is generated. A static profile keeps this initial state, whereas a dynamic profile is regularly updated with additional events. Because of the inherently dynamic behavior of networks and systems, static profiles are not suitable, as they soon become outdated. Dynamic profiles do not suffer from this deficiency. This method has the problem of false positives: benign activity is often treated as anomalous and may raise an alarm.
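The difference between a static and a dynamic profile can be sketched in Python as follows. This is an illustrative assumption rather than the method of the cited work: the profile is reduced to a single numeric feature, an event is flagged when it lies more than `threshold` standard deviations from the profile mean, and a dynamic profile absorbs every event it accepts as normal. The class name `Profile` and all values are hypothetical.

```python
import statistics


class Profile:
    """Profile of normal activity for one numeric feature (hypothetical)."""

    def __init__(self, initial_values, threshold=3.0, dynamic=False):
        self.values = list(initial_values)  # events assumed to be normal
        self.threshold = threshold          # allowed deviation, in standard deviations
        self.dynamic = dynamic              # True: update the profile over time

    def is_anomalous(self, value):
        mean = statistics.mean(self.values)
        std = statistics.pstdev(self.values)
        deviates = abs(value - mean) > self.threshold * std
        # A dynamic profile is updated with every event accepted as normal.
        if self.dynamic and not deviates:
            self.values.append(value)
        return deviates


# Hypothetical initial observations, e.g. requests per minute.
initial = [9, 10, 11, 10, 9, 11]
static_profile = Profile(initial)
dynamic_profile = Profile(initial, dynamic=True)

# Benign traffic that slowly drifts upward.
for value in (12, 13):
    print(value, static_profile.is_anomalous(value), dynamic_profile.is_anomalous(value))
```

With this data the static profile raises an alarm once the drift reaches 13, while the dynamic profile has adapted and still accepts it. This is the outdating problem described above, and the static alarm is an instance of a false positive on benign activity.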

The stateful protocol analysis method prepares, for each protocol separately, a log of the activity that is considered acceptable and compares each observed event against it. This analysis is based on vendor-developed profiles, whereas anomaly-based detection uses host-dependent or network-dependent profiles. An intrusion can be detected by using outlier detection to find the data points whose features are distinctly different from the rest of the data. Outliers are sometimes individual objects and sometimes groups of objects that represent behavior outside the range of what is considered normal. From the point of view of clustering algorithms, outliers are objects that do not belong to any cluster of the dataset and are usually called noise.
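The idea that an intrusion appears as a data point whose features differ distinctly from the rest can be sketched with a generic distance-based score in Python. This is not the Nearest Neighbor Outlier Factor from the abstract and not the GBBK algorithm. It is a simpler stand-in that scores each point by the distance to its k-th nearest neighbor, and it assumes purely numerical features. The function name and the toy points are hypothetical.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Larger scores mean the point lies farther from the rest of the data.
    Requires k <= len(points) - 1. Sorting is used here for brevity only.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical data: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_distance_scores(points, k=2)
outlier_index = max(range(len(points)), key=lambda i: scores[i])
print(points[outlier_index])
```

The isolated point receives by far the largest score, which matches the description of outliers as objects lying outside the range of what is considered normal.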

The challenge of outlier detection in intrusion detection is handling a large amount of mixed-type data, that is, categorical and numerical data (Gogoi, Bhattacharyya, Borah et al., 2014). Therefore, the outlier algorithm should be scalable so that it can be applied to a large volume of data. Normally, the outliers in a dataset can be easily visualized using a scatter plot: the outlying points lie far away from the normal data points. As a result, outlier detection is also known as anomaly or deviation analysis.
