Impact of PDS Based kNN Classifiers on Kyoto Dataset

Kailasam Swathi (NRI Institute of Technology, Agiripalli, India) and Bobba Basaveswara Rao (Acharya Nagarjuna University, Guntur, India)
Copyright: © 2019 |Pages: 12
DOI: 10.4018/IJRSDA.2019040105

Abstract

This article compares the performance of different Partial Distance Search (PDS)-based kNN classifiers on the benchmark Kyoto 2006+ dataset for Network Intrusion Detection Systems (NIDS). The PDS classifiers are named after the way the features are indexed: i) Simple PDS kNN (SPDS), in which the features are not indexed; ii) Variance indexing-based kNN (VIPDS), in which the features are indexed by their variance; and iii) Correlation coefficient indexing-based kNN (CIPDS), in which the features are indexed by their correlation coefficient with the class label. Computational time and accuracy are used as the performance measures for comparing these classifiers. The experiments show that CIPDS performs better in terms of computational time, whereas VIPDS shows better accuracy, although the difference from CIPDS is not significant. The study suggests adopting CIPDS when class labels are available without ambiguity, and VIPDS otherwise.

1. Introduction

Network intrusion refers to a number of techniques that allow malicious users to penetrate computer networks and exploit computing and network resources. A Network Intrusion Detection System (NIDS) is a technology that applies machine learning strategies to network intrusion datasets in order to identify intruders and detect malicious activities. A network intrusion dataset is a collection of network traces, i.e., traffic captured from a network over a period of time.

The quality and quantity of network datasets help machine learning strategies build heuristic systems for given real-world problems. These heuristic systems help decision makers mitigate risk. Early detection of intrusion helps to control and prevent malicious activities in a system.

Machine learning algorithms are heuristic approaches to solving complicated problems for which a human designer is unable to define the appropriate rules in an explicit form. It is very difficult to construct an efficient real-time NIDS, especially for high-speed network traffic.

To build and evaluate such a solution, different kinds of datasets have been made available to researchers. One such dataset is Kyoto 2006+, a real-world dataset that is close to current network problems. This dataset is provided with class labels; hence supervised learning algorithms are preferred for attack prediction. In general, attack and normal are the class labels of these intrusion datasets.

Intrusion detection techniques that use these labeled datasets exhibit good results with machine learning methodologies such as Support Vector Machines (SVM), k-Nearest Neighbor (kNN), Bayes networks, and decision tree induction.

Even though the kNN classifier is a lazy learning algorithm, it is used by a large number of researchers because of its good accuracy. Researchers are trying to reduce the classifier complexity and the classification time of the kNN algorithm while keeping the accuracy high.

Partial Distance Search (PDS) is a form of kNN classification that makes the classifier faster than the general kNN classification algorithm (Basaveswara & Swathi, 2017; Eid et al., 2013). In this approach, the distance computation between a known sample and a test sample (new request) is terminated partway through the features, without processing all feature values, as soon as the accumulated distance exceeds the largest of the k nearest distances stored so far. Otherwise, the new distance is added to the k nearest distances, replacing the kth distance. Most of the training samples are thus discarded quickly, which reduces the computational cost of the classifier. The saving in computational time is especially pronounced when the sample dataset is very large, as with KDD Cup'99 and Kyoto 2006+.

PDS kNN Algorithm:

  • Step 1:

    Compute the first k squared distances D = (d1, d2, …, dk) between the input vector x, whose class label is to be predicted, and the first k sample vectors (y1, y2, …, yk) in the sample set S, where s is the sample size and n is the total number of features of each sample in S. Here di = d2(x, yi) for i = 1, 2, …, k, i.e., $d^2(x, y_i) = \sum_{p=1}^{n} (x_p - y_{ip})^2$.

  • Step 2:

    Place these first k distances into vector D in ascending order i.e., d1≤ d2≤.. ≤ dk.

  • Step 3:

    for t in range of (k+1, s):

    • Step 3.1:

      Calculate the distance dt between yt and x as follows:

    • Step 3.2:

      set dt = 0

    • Step 3.3:

      for p in range (1, n):

      • Step 3.3.1:

        Compute dt += $(x_p - y_{tp})^2$

      • Step 3.3.2:

        If dt > dk, then discard yt and go to Step 3 for the next value of t.

    • Step 3.4:

      Update D by replacing dk with dt (i.e., $d_k = d_t$) and reorder the vector D in ascending order.
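
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name pds_knn, its parameters, and the toy data are hypothetical, features are scanned in their given order (as in SPDS), and the final majority vote over the k nearest labels is an assumption added here, since the listing stops once the vector D has been updated.

```python
from collections import Counter


def pds_knn(x, samples, labels, k):
    """Classify x with a partial distance search over `samples`.

    Returns (predicted_label, nearest), where `nearest` holds the k
    smallest (squared_distance, label) pairs in ascending order.
    """
    def sq_dist(a, b):
        # Full squared Euclidean distance over all n features.
        return sum((a_p - b_p) ** 2 for a_p, b_p in zip(a, b))

    # Steps 1-2: full distances to the first k samples, sorted ascending.
    nearest = sorted(
        ((sq_dist(x, samples[i]), labels[i]) for i in range(k)),
        key=lambda pair: pair[0],
    )

    # Step 3: scan the remaining samples with early termination.
    for t in range(k, len(samples)):
        d_t = 0.0
        pruned = False
        for x_p, y_p in zip(x, samples[t]):
            d_t += (x_p - y_p) ** 2
            # Step 3.3.2: the partial sum already exceeds the kth
            # distance, so this sample cannot be among the k nearest.
            if d_t > nearest[-1][0]:
                pruned = True
                break
        if not pruned:
            # Step 3.4: replace the kth distance and restore the order.
            nearest[-1] = (d_t, labels[t])
            nearest.sort(key=lambda pair: pair[0])

    # Assumption (not part of the listing): majority vote over the labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest


# Hypothetical toy data with two features per sample.
toy_samples = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0],
               [0.5, 0.5], [6.0, 6.0], [0.2, 0.1]]
toy_labels = ["normal", "normal", "attack", "normal", "attack", "normal"]
predicted, neighbours = pds_knn([0.1, 0.1], toy_samples, toy_labels, k=3)
```

Because squared differences are never negative, the partial sum can only grow, so stopping as soon as it exceeds the current kth distance does not change which neighbours are found. It only skips the remaining feature computations for that sample.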
