Yu Wang (Yale University, USA)
DOI: 10.4018/978-1-59904-708-9.ch012


Increasing the accuracy of classification has been a constant challenge in network security. With network traffic volumes growing rapidly and network bandwidth advancing, many classification algorithms used for intrusion detection and prevention face high false positive and false negative rates. A stream of network traffic data with many positive predictors might not necessarily represent a true attack, and a seemingly anomaly-free stream could represent a novel attack. Depending on the infrastructure of a network system, traffic data can become very large. With such large volumes of data, even a very low misclassification rate can yield a large number of alarms; for example, a system with 22 million connections per hour and a 1% misclassification rate would generate roughly 61 alarms per second (excluding repeated connections). Validating every such case is not practical. To address this challenge we can improve the data collection process and develop more robust algorithms. Unlike other research areas, such as the life sciences, healthcare, or economics, where an analysis can often be achieved with a single statistical approach, a robust intrusion detection scheme needs to be constructed hierarchically with multiple algorithms, for example by profiling and classifying user behavior hierarchically using hybrid algorithms (e.g., combining statistics and AI). We can also improve the precision of classification by carefully evaluating the results. Several key elements are important for statistical evaluation in classification and prediction: reliability, sensitivity, specificity, misclassification, and goodness-of-fit. We also need to evaluate the goodness of the data (consistency and repeatability), the goodness of the classification, and the goodness of the model. We discuss these topics in this chapter.
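The alarm-volume arithmetic and the evaluation metrics named above can be sketched in a few lines. This is a minimal illustration, not code from the chapter; the confusion-matrix counts in the usage example are made up for demonstration.

```python
def alarms_per_second(connections_per_hour: int, misclass_rate: float) -> float:
    """Expected alarms per second given hourly traffic volume and a
    misclassification rate (assumes alarms are spread evenly over the hour)."""
    return connections_per_hour * misclass_rate / 3600.0

def evaluation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, and misclassification rate from
    confusion-matrix counts (true/false positives and negatives)."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),        # true-positive rate
        "specificity": tn / (tn + fp),        # true-negative rate
        "misclassification": (fp + fn) / total,
    }

# The chapter's example: 22 million hourly connections, 1% misclassified.
print(round(alarms_per_second(22_000_000, 0.01), 1))   # ~61.1 alarms/second

# Hypothetical counts for one hour of labeled traffic.
print(evaluation_metrics(tp=90, fp=10, tn=880, fn=20))
```

Even a seemingly strong 3% misclassification rate in the hypothetical counts above translates into a steady stream of alarms at realistic traffic volumes, which is the scaling problem the abstract describes.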
Chapter Preview

I have always been delighted at the prospect of a new day, a fresh try, one more start, with perhaps a bit of magic waiting somewhere behind the morning.

- J.B. Priestley


Data Reliability, Validity, and Quality

Data reliability and validity play essential roles in network security. Reliability refers to whether data can be replicated, either from one observer to another (inter-observer reliability) or by the same observer on more than one occasion (intra-observer reliability). Reliability also means that, for any data elements used, the results are reasonably complete and accurate, meet our intended purposes, and are not subject to inappropriate alteration (e.g., agreement of variables drawn from two sources). Data reliability analysis addresses the uncertainty introduced by the data and can be considered a special type of risk analysis (Aven, 2005). Reliable data has basic requirements, including completeness, accuracy, and consistency. Completeness means that the data contains all of the variables and observations required for a task; accuracy means that the data is collected from the correct sources and is recorded correctly; consistency refers to the need to obtain and use data that is clear and well defined enough to yield similar results in similar analyses. For example, if certain data is collected across multiple network systems, inconsistent interpretation of the data rules can make the data unreliable when aggregated as a whole. Assessments of reliability should be made in the broader context of the particular characteristics of the engagement and the risk associated with the possibility of using data of insufficient reliability.
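Inter-observer reliability is commonly quantified with a chance-corrected agreement statistic. As a hedged sketch (not a method prescribed by the chapter), Cohen's kappa can measure agreement between two observers, such as two sensors or analysts labeling the same connections; the label sequences below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two observers on the same items,
    corrected for the agreement expected by chance alone.
    Returns 1.0 for perfect agreement, 0.0 for chance-level agreement."""
    assert len(labels_a) == len(labels_b) and labels_a, "paired, non-empty labels"
    n = len(labels_a)
    # Observed proportion of items on which the observers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two observers labeled independently,
    # each according to their own marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two observers for six connections.
a = ["attack", "normal", "normal", "attack", "normal", "normal"]
b = ["attack", "normal", "attack", "attack", "normal", "normal"]
print(round(cohens_kappa(a, b), 3))   # 0.667: substantial but imperfect agreement
```

A kappa well below 1.0 on replicated labels is a warning that the data fails the inter-observer reliability requirement discussed above, before any classification model is ever fit.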
