Classification plays an important role in network security. It assigns network traffic to categories based on the characteristics of the traffic, with the aim of preventing network attacks by detecting intrusions as early as possible. If a labeled response variable is available, the classification task falls under statistical supervised learning. The term "supervised learning" comes from the field of Artificial Intelligence, where research focuses on machine learning (Nilsson, 1996). In general, a supervised learning task can be described as follows: given a training sample with known patterns, represented by predictors and a labeled response variable, select response values for new predictor values. The response may be either a binary class or one of multiple classes. As we discussed previously, these classes cannot be determined with certainty; they are based on our degree of belief, which is expressed in terms of probability (Woodworth, 2004). In this chapter, we focus mainly on the binary classification task and discuss several modeling approaches, including both parametric and nonparametric methods. Readers interested in the fundamentals of supervised learning and machine learning algorithms should refer to Lane & Brodley (1997), Vapnik (1998, 1999), Hosmer & Lemeshow (2000), Duda, Hart & Stork (2001), Hastie, Tibshirani & Friedman (2001), Müller, Mika, Rätsch, Tsuda & Schölkopf (2001), Herbrich (2002), Alpaydin (2004), Vittinghoff, Glidden, Shiboski & McCulloch (2005), Maloof (2006), Neuhaus & Bunke (2007), and Diederich (2008).
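To make the supervised learning task concrete, the following is a minimal sketch of a binary classifier that learns from a labeled training sample and then assigns a class probability to new observations. It uses logistic regression fitted by plain gradient descent, chosen here only as one familiar parametric approach and not necessarily the method developed later in the chapter. The two predictors (a scaled packet rate and a scaled failed-login count), the toy data values, and all function names are hypothetical and introduced purely for illustration.

```python
import math

# Hypothetical training sample: each row holds two predictors scaled to [0, 1]
# (packet rate, failed-login count). All values are invented for illustration.
X_train = [
    [0.10, 0.00], [0.20, 0.10], [0.15, 0.05], [0.30, 0.20],  # normal traffic
    [0.80, 0.90], [0.90, 0.70], [0.70, 0.80], [0.95, 0.95],  # anomalous traffic
]
# Labeled binary response: 0 = normal, 1 = anomalous.
y_train = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated probability that observation x belongs to class 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Turn the class probability into a binary label using a cutoff."""
    return 1 if predict_proba(w, b, x) >= threshold else 0


w, b = fit_logistic(X_train, y_train)

# Classify new, unlabeled observations (again hypothetical values).
for x_new in ([0.12, 0.03], [0.85, 0.88]):
    print(x_new, round(predict_proba(w, b, x_new), 3), predict(w, b, x_new))
```

The classifier reports a probability before it reports a label, which mirrors the point that class membership reflects a degree of belief. The 0.5 cutoff is an arbitrary choice for this sketch; in an intrusion detection setting it would normally be tuned to balance false alarms against missed attacks.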