1. Introduction
Network intrusion refers to a range of techniques that allow malicious users to penetrate computer networks and exploit computing and network resources. A Network Intrusion Detection System (NIDS) is a technology that identifies intruders by applying machine learning strategies to network intrusion datasets in order to detect malicious activities. A network intrusion dataset is a collection of network traces, i.e., traffic captured from a network over a period of time.
The quality and quantity of network datasets help machine learning strategies build heuristic systems for real-world problems. These heuristic systems help decision makers overcome risk. Early detection of intrusions helps in controlling and preventing malicious activities in a system.
Machine learning algorithms are heuristic approaches to solving complicated problems for which a human designer is unable to define appropriate rules in explicit form. It is very difficult to construct an efficient real-time NIDS, especially for high-speed network traffic.
To build and evaluate such a solution, different kinds of datasets have been made available to researchers. One such dataset is Kyoto 2006+, a real-world dataset that is close to current network problems. This dataset is provided with class labels, hence supervised learning algorithms are preferred for attack prediction. In general, attack and normal are the class labels of these intrusion datasets.
Intrusion detection techniques that make use of these labeled datasets exhibit good results with machine learning methodologies such as Support Vector Machines (SVM), k-Nearest Neighbor (kNN), Bayes Networks, and Decision Tree Induction.
Even though the kNN classifier is a lazy learning algorithm, it is used by a large number of researchers because of its good accuracy rates. Researchers are trying to minimize the classifier complexity and the classification time of the kNN algorithm while keeping the accuracy rate high.
Partial Distance Search (PDS) is one form of the kNN classification approach that makes the classifier faster than the general kNN classification algorithm (Basaveswara & Swathi, 2017; Eid et al., 2013). In this approach, the distance computation between a known sample and a test sample (new request) is terminated at a specific feature, without processing the remaining features, as soon as the accumulated distance is larger than the largest of the k nearest distances stored so far. Otherwise, the completed distance is added to the stored k nearest distances, replacing the kth distance. Most of the training samples are thus discarded quickly, which reduces the computational cost of the classifier. Especially when the sample dataset is very large, as with KDD Cup'99 and Kyoto 2006+, the PDS approach yields lower computation time.
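The early-termination idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pds_knn_predict`, the use of a heap to hold the k best distances, the squared Euclidean distance, and the majority vote are all assumptions made for illustration. The sketch also returns a count of rejected samples so the pruning effect is visible.

```python
import heapq
from collections import Counter


def pds_knn_predict(train_x, train_y, query, k):
    """Predict a label for `query` with kNN using partial distance search.

    Hypothetical helper for illustration; assumes squared Euclidean distance.
    Returns (predicted_label, number_of_rejected_samples).
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of samples")

    # Max-heap (distances are negated) holding the k best (distance, index)
    # pairs found so far; heap[0] is the current kth nearest distance.
    heap = []
    rejected = 0  # samples rejected because the running sum passed the threshold

    for idx, sample in enumerate(train_x):
        # Until k neighbours are stored there is nothing to prune against.
        threshold = -heap[0][0] if len(heap) == k else float("inf")
        partial = 0.0
        pruned = False
        for a, b in zip(query, sample):
            partial += (a - b) ** 2
            if partial > threshold:
                # The running sum already exceeds the kth nearest distance,
                # so the remaining features cannot make this sample closer.
                pruned = True
                break
        if pruned:
            rejected += 1
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-partial, idx))
        else:
            # Replace the current kth (largest) distance with the new one.
            heapq.heapreplace(heap, (-partial, idx))

    votes = Counter(train_y[idx] for _, idx in heap)
    return votes.most_common(1)[0][0], rejected


# Toy data (made up for this example, not taken from Kyoto 2006+).
train_x = [[0, 0], [1, 1], [10, 10], [0, 1], [9, 9]]
train_y = ["normal", "normal", "attack", "normal", "attack"]
label, rejected = pds_knn_predict(train_x, train_y, [0.5, 0.5], k=3)
```

Because squared differences are non-negative, the running sum can only grow, so a sample whose partial sum already exceeds the kth nearest distance can be dropped safely. The heap is a design choice of this sketch; a sorted vector of k distances would serve the same purpose.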
PDS kNN Algorithm:
- Step 1:
Compute the vector of the first k squared distances D = (d1, d2, …, dk) between the input vector x, whose class label is to be predicted, and the first k sample vectors (y1, y2, …, yk) in the sample set S, where s is the sample size and n is the total number of features of each sample in S, and di = d²(x, yi), i = 1, 2, …, k, i.e., di = Σ_{j=1}^{n} (xj − yij)², with xj and yij denoting the jth features of x and yi.
- Step 2:
Place these first k distances into vector D in ascending order, i.e., d1 ≤ d2 ≤ … ≤ dk.
- Step 3:
for t in range of (k+1, s):