Models Network Data for Association and Prediction

Yu Wang (Yale University, USA)
DOI: 10.4018/978-1-59904-708-9.ch007

Abstract

Exploratory data analysis discovers data structures and patterns by treating all variables as a whole, but it does not specifically seek associations between response variables and predictor variables. In this chapter, we will discuss how to identify and measure this response-predictor relationship, which is an essential element in intrusion detection and prevention. Although models for association and prediction can take many forms, in general the goals of modeling for association and prediction in network security are two-fold: (1) to identify variables that are significantly associated with the response variable and (2) to assess the robustness of these variables, if any, in predicting the response. Although the term "model" may be confusing to many people, a model is just a simplified representation of some aspect of the real world, whether an object or observation, or a situation or process. Models are of particular importance for network security because of the size of the data and the complex relationships among the variables and the desired outcomes. Statistical modeling procedures available for analyzing the response-predictor phenomenon mainly include bivariate analysis and multiple regression-based analysis. Bivariate analysis focuses on the relationship between two variables (e.g., a response and a predictor) without taking into account any impact of other predictor variables on the response variable. The multiple regression modeling approach, on the other hand, establishes a regression relationship between a response variable and a set of potential predictor variables, and assesses the predictive power of each predictor as adjusted for the others. Therefore, a variable that is significantly associated with the response in the bivariate analysis may no longer show such an association in the regression analysis after adjusting for other variables.
In the following sections, we will review and discuss these two main approaches in detail. Readers who would like to attain a more general knowledge of modeling associations should refer to Mandel (1964), Press & Wilson (1978), Cohen & Cohen (1983), Berry & Feldman (1985), Cox & Snell (1989), McCullagh & Nelder (1989), Agresti (1996), Ryan (1997), Long (1997), Burnham & Anderson (1998), Pampel (2000), Tabachnick & Fidell (2001), Agresti (2002), Myers, Montgomery & Vining (2002), Menard (2002), and O’Connell (2006). Comprehensive reviews of data mining and statistical learning can be found in Vapnik (1998, 1999), Hastie, Tibshirani & Friedman (2001), and Bozdogan (2003).
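
The contrast between bivariate and regression-adjusted association described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the chapter's own procedure: the data are synthetic, the variable roles (a confounder such as connection duration, a predictor such as bytes sent, and a continuous anomaly score as the response) are hypothetical, and ordinary least squares is used as one simple instance of a regression model. The helper name `ols_t_stats` is likewise an illustrative choice.

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit ordinary least squares and return (coefficients, t-statistics).

    X must already contain an intercept column.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    return beta, beta / np.sqrt(np.diag(cov))

# Synthetic data (assumption: all three variables are hypothetical).
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                    # confounder, e.g. connection duration
x = 0.8 * z + 0.6 * rng.normal(size=n)    # predictor, e.g. bytes sent; driven by z
y = 1.0 * z + rng.normal(size=n)          # response; depends on z only, not on x

ones = np.ones(n)

# Bivariate analysis: response against the single predictor x.
_, t_biv = ols_t_stats(np.column_stack([ones, x]), y)

# Multiple regression: x adjusted for z.
_, t_adj = ols_t_stats(np.column_stack([ones, x, z]), y)

print(f"t-statistic for x, bivariate:      {t_biv[1]:.2f}")
print(f"t-statistic for x, adjusted for z: {t_adj[1]:.2f}")
print(f"t-statistic for z, adjusted for x: {t_adj[2]:.2f}")
```

In this construction the predictor `x` looks strongly associated with the response when examined alone, because it shares the influence of `z`; once `z` enters the regression, the t-statistic for `x` shrinks toward zero while `z` remains a strong predictor.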
Chapter Preview

Whatever you are by nature, keep to it; never desert your line of talent. Be what nature intended you for and you will succeed.

- Sydney Smith

