Classification Based on Unsupervised Learning

Classification Based on Unsupervised Learning

Yu Wang (Yale University, USA)
DOI: 10.4018/978-1-59904-708-9.ch010

Abstract

The requirement for having a labeled response variable in training data from the supervised learning technique may not be satisfied in some situations: particularly, in dynamic, short-term, and ad-hoc wireless network access environments. Being able to conduct classification without a labeled response variable is an essential challenge to modern network security and intrusion detection. In this chapter we will discuss some unsupervised learning techniques including probability, similarity, and multidimensional models that can be applied in network security. These methods also provide a different angle to analyze network traffic data. For comprehensive knowledge on unsupervised learning techniques please refer to the machine learning references listed in the previous chapter; for their applications in network security see Carmines, Edward & McIver (1981), Lane & Brodley (1997), Herrero, Corchado, Gastaldo, Leoncini, Picasso & Zunino (2007), and Dhanalakshmi & Babu (2008). Unlike in supervised learning, where for each vector 1 2 ( , , , ) n X x x x = ? we have a corresponding observed response, Y, in unsupervised learning we only have X, and Y is not available either because we could not observe it or its frequency is too low to be fit ted with a supervised learning approach. Unsupervised learning has great meanings in practice because in many circumstances, available network traffic data may not include any anomalous events or known anomalous events (e.g., traffics collected from a newly constructed network system). While high-speed mobile wireless and ad-hoc network systems have become popular, the importance and need to develop new unsupervised learning methods that allow the modeling of network traffic data to use anomaly-free training data have significantly increased.
Chapter Preview

Truth and love are the only things worth living for and the only things worth dying for.

- Rebecca Ann Talcott

Top

Probability Models

In this section, we discuss three probabilistic classification modeling approaches: a density estimation-based model, a latent class model, and a hidden Markov model. All these models do not require a labeled variable in the training data and can be considered as unsupervised learning approaches.

Complete Chapter List

Search this Book:
Reset