Dimension Reduction and its Effects on Clustering for Intrusion Detection

Peyman Kabiri (Iran University of Science and Technology, Iran) and Ali Ghorbani (University of New Brunswick, Canada)
DOI: 10.4018/978-1-60960-836-1.ch007


With recent advances in network-based technology and the increased dependency of everyday life on it, assuring the reliable operation of network-based systems is very important. In recent years, the number of attacks on networks has increased dramatically, and researchers' interest in network intrusion detection has grown accordingly. Over the past few years, different approaches for collecting datasets of network features, each with its own assumptions, have been proposed for detecting network intrusions. More recently, many research works have focused on better understanding the network feature space in order to devise better detection methods. The curse of dimensionality remains a major obstacle for researchers in network intrusion detection. In this chapter, the DARPA’99 dataset is used for the study. The features in that dataset are analyzed with respect to their information value, and this information value is then used to reduce the number of dimensions in the data. Finally, the effects of the dimension reduction on the dataset are studied using several clustering algorithms, and the results are reported.
Chapter Preview


Over the past two decades, rapid progress in Internet-based technology has opened new application areas for computer networks. At the same time, the widespread adoption of Local Area Network (LAN) and Wide Area Network (WAN) applications in the business, finance, industry, security and healthcare sectors has made us more dependent on computer networks. These applications have made the network an attractive target for abuse and a major vulnerability for the community. What some people treat as a bit of fun or a challenge to win has become a nightmare for others, and in many cases malicious acts have turned this nightmare into reality.

In addition to hacking, new entities such as worms, Trojans and viruses have introduced more panic into the networked society. Because the current situation is a relatively new phenomenon, network defenses are weak. However, given the popularity of computer networks, their connectivity and our ever-growing dependency on them, the realization of these threats can have devastating consequences. Securing such an important infrastructure has become a top research priority for many researchers.

One of the major concerns is to make sure that, in case of an intrusion attempt, the system is able to detect and report it. Once detection is reliable, the next step is to protect and defend the network (response). In other words, the Intrusion Detection System (IDS) will be upgraded to an Intrusion Detection and Response System (IDRS).

However, no part of the IDS is currently fully reliable, even though researchers are working concurrently on both the detection and response sides of the system. A major problem with IDSs is that detection of an intrusion cannot be guaranteed. This is why, in many cases, IDSs are used together with a human expert: the IDS assists the network security officer but is not reliable enough to be trusted on its own. The reason is the inability of the IDS to detect new or altered attack patterns. Although the latest generation of detection techniques has significantly improved the detection rate, there is still a long way to go.

There are two major approaches to detecting intrusions: signature-based and anomaly-based intrusion detection. In the first approach, attack patterns or the behavior of the intruder are modeled as attack signatures, and the system signals an intrusion once a match is detected. In the second approach, the normal behavior of the network is modeled, and the system raises an alarm once the behavior of the network no longer matches its normal behavior. A third Intrusion Detection (ID) approach is called specification-based intrusion detection. Here, the normal (expected) behavior of the host is specified and consequently modeled. As a direct price for this security, the host's freedom of operation is limited.
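The contrast between the first two approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the signature strings, the choice of connections per second as the monitored quantity, the mean and standard deviation profile, and the threshold `k` are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical attack signatures: substrings that mark a known attack.
SIGNATURES = ("/etc/passwd", "' OR 1=1")

def signature_alert(payload: str) -> bool:
    """Signature-based: alert when the payload matches a known attack pattern."""
    return any(sig in payload for sig in SIGNATURES)

def fit_normal_profile(samples: list[float]) -> tuple[float, float]:
    """Anomaly-based, step 1: model normal behavior as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def anomaly_alert(value: float, profile: tuple[float, float], k: float = 3.0) -> bool:
    """Anomaly-based, step 2: alert when a value lies more than k standard
    deviations away from the modeled normal behavior."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Connections per second during attack-free operation (made-up numbers).
normal_rates = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 10.0]
profile = fit_normal_profile(normal_rates)
```

In this sketch the signature detector can only flag payloads that contain a listed pattern, so a new or altered attack passes unnoticed, whereas the anomaly detector flags any sufficiently unusual rate, whether or not it is an attack.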

Another major problem in this research area is the speed of detection. Computer networks are dynamic in the sense that the information and data within them change continuously. Therefore, to detect an intrusion into the network accurately and promptly, the system has to operate in real time. Operating in real time means not only performing the detection in real time but also adapting to new dynamics in the network. Real-time IDS is an active research area pursued by many researchers. Most of this research aims to introduce the most time-efficient methodologies, with the goal of making the resulting methods suitable for real-time implementation.

The real-time requirement for an IDS calls for a short processing time. However, a large number of parameters makes such speed very difficult to achieve. In other words, the curse of dimensionality is one of the greatest obstacles facing IDS technology. The work presented in this paper aims to study this problem and to help break this curse. Clearly, not all of the selected parameters are equally effective and influential: some features in the feature space may have more influence on the final result than others.

The authors of this paper have briefly studied the current literature in the field of intrusion detection (Kabiri and Ghorbani, 2005) and noticed that the curse of dimensionality is one of the major problems in intrusion detection. Evaluating features with respect to their importance in the intrusion detection process is an important issue that requires further research.
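One possible way to evaluate features by importance is to score each feature by how much it tells us about the class label and keep only the highest-scoring ones. The sketch below uses entropy-based information gain as that score. This is an assumption: the preview does not say how the chapter computes the information value of a feature, and the feature names, records, and labels here are invented toy data rather than DARPA’99 records.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_top_features(records, labels, k):
    """Rank features by information gain and keep the k most informative."""
    gains = {
        name: information_gain([r[name] for r in records], labels)
        for name in records[0]
    }
    ranked = sorted(gains, key=gains.get, reverse=True)
    return ranked[:k], gains

# Toy connection records with hypothetical feature names.
records = [
    {"protocol": "tcp", "flag": "S0", "land": 0},
    {"protocol": "tcp", "flag": "SF", "land": 0},
    {"protocol": "udp", "flag": "SF", "land": 0},
    {"protocol": "tcp", "flag": "S0", "land": 0},
]
labels = ["attack", "normal", "normal", "attack"]

kept, gains = select_top_features(records, labels, k=2)
```

On this toy data the constant feature `land` carries no information about the label and is dropped, reducing the feature space from three dimensions to two. Continuous features would need to be discretized before this score could be applied.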
