Data Reduction

Yu Wang (Yale University, USA)
DOI: 10.4018/978-1-59904-708-9.ch006

One must not lose desires. They are mighty stimulants to creativeness, to love, and to long life.

- Alexander A. Bogomoletz

Introduction

This chapter discusses several data reduction techniques that are important in intrusion detection and prevention. Network traffic data contains rich information about system and user behavior, but the raw data can be difficult to analyze because of its sheer size. Reducing data size efficiently is one of the key challenges in network security, and many researchers have raised it over the past decade (Lam, Hui & Chung, 1996; Mukkamala, Tadiparthi, Tummala & Janoski, 2003; Chebrolu, Abraham & Thomas, 2005; Khan, Awad & Thuraisingham, 2007).

Recall the concept of the data cube, which was presented in Chapter 4. Using various approaches, it is possible to reduce the size of the data along all three cube dimensions (variables, observations, and occasions). More specifically, we can reduce the total number of observations by sampling network traffic, reduce the total number of variables by eliminating those that are not robust or are not associated with the outcome of interest, and reduce the number of occasions by taking a sample of the time-related events. We will discuss these approaches in the following sections, which cover data structure detection, sampling, and sample size determination. In addition to statistical approaches to data reduction, we need to select the data type of each variable carefully, so that an inappropriate type does not inflate the final size of the dataset. For example, a binary variable should be stored in a single byte rather than in a wider numeric type.
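As a rough illustration of two of these ideas, the sketch below draws a simple random sample of observations and compares the storage needed for a binary variable held in a one-byte type with the storage needed in an eight-byte type. It is only a minimal sketch under stated assumptions: the record layout, the function names sample_observations and storage_bytes, the 5% sampling fraction, and the fixed seed are all hypothetical and are not taken from the chapter, which covers sampling designs and sample size determination in its own sections.

```python
import random
from array import array


def sample_observations(records, fraction, seed=0):
    """Return a simple random sample (without replacement) of the records.

    The fraction and the fixed seed are illustrative assumptions.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


def storage_bytes(values, typecode):
    """Bytes used by the raw values when packed into a typed array."""
    packed = array(typecode, values)
    return packed.itemsize * len(packed)


# Hypothetical traffic records: (connection id, is_attack flag).
records = [(i, i % 10 == 0) for i in range(10_000)]

# Fewer observations: keep 5% of the connections.
sample = sample_observations(records, fraction=0.05)

# Smaller data type: store the binary flag as a signed byte ("b")
# instead of an eight-byte double ("d").
flags = [int(is_attack) for _, is_attack in sample]
print(len(sample))
print(storage_bytes(flags, "b"), storage_bytes(flags, "d"))
```

In this sketch the one-byte representation of the flag takes one eighth of the space of the eight-byte representation, on top of the reduction already obtained by sampling.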

Data reduction relies heavily on multivariate analysis, on which a large body of literature is available. Readers who want a more detailed and advanced treatment of multivariate analysis can refer to Thomson (1951), Bartholomew (1987), Snook & Gorsuch (1989), Everitt & Dunn (1991), Kachigan (1991), Loehlin (1992), Hatcher & Stepanski (1994), Rencher (1995), Tabachnick (2000), and Everitt (2005).
