Exploring Network Data

Yu Wang (Yale University, USA)
DOI: 10.4018/978-1-59904-708-9.ch005

Abstract

In this chapter, we review the basic concepts and procedures of exploratory data analysis, which is the first step toward understanding and evaluating data. Data exploration is extremely important in network security because the volume of network traffic data is very large. We discuss descriptive analysis, visual analysis, and data transformation techniques. The general idea behind exploratory analysis is to examine data without preconceived beliefs or notions and to let the data tell us about the phenomena of the subject(s) being studied. It not only focuses on extracting and displaying any "signal" from the data in the presence of noise and on discovering the type of information that the data holds (Everitt, 2005), but also provides an essential direction for converting data from a high-dimensional space to a low-dimensional space. We may not know what the data looks like, and we may not have specific questions in mind, but data exploration seeks patterns and relationships among variables and provides paths for further examination of the data. For example, outliers in traffic streams could represent important information about attacks (Petrovskiy, 2003; Angiulli & Fassetti, 2007; Kundur, Luh, Okorafor, & Zourntos, 2008; Nayyar & Ghorbani, 2008), but they could also represent data errors. Exploratory data analysis provides a quick and simple way to uncover such ambiguous cases. Readers who would like a comprehensive introduction to exploratory data analysis should refer to Blaikie (2003). Recently, with advances in computer hardware and software, visualizing large datasets has become possible, and more exploratory data analyses have been conducted with graphical methods that convey information visually. Although graphical methods alone do not provide enough evidence for drawing robust conclusions, they do provide a road map for future analyses. They are also an important tool for illustrating data to those who have little or no statistical knowledge.
Chapter Preview

You can’t choose the ways in which you will be tested.

- Robert J. Sawyer

Descriptive Analysis

Descriptive analysis focuses on individual variables and presents their attributes by describing each variable's frequency distribution and measuring its central tendency, spread, and range of values. We expect descriptive analysis to give a general picture of the data that supports any further analysis. For example, by checking individual variables, we learn how good the sample data is and whether we need to create new variables from the initial variables, check data reliability and validity (e.g., values outside of the possible range), or assess the need for data transformation or normality checks. The main measurements for descriptive analysis are data structure, variable means, frequency, dispersion, and outliers, and we will discuss each of these individually in the following sections. Keep in mind that descriptive analysis alone is not robust enough to support scientific conclusions because it cannot handle multivariate situations.
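
The measurements listed above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the chapter's own procedure: the variable names (packet_sizes, protocols), the sample values, and the 1.5 x IQR rule used to flag outliers are all hypothetical choices made for this example, and it uses only the Python standard library.

```python
from collections import Counter
import statistics


def describe(values):
    """Summarize one numeric variable: central tendency, spread, range,
    and candidate outliers (1.5 * IQR rule, an assumption of this sketch)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }


def frequency(values):
    """Frequency distribution of a categorical variable, most common first."""
    return Counter(values).most_common()


# Hypothetical sample: packet sizes in bytes and the protocol of each record.
packet_sizes = [60, 64, 64, 72, 80, 1500, 64, 60, 72, 9000]
protocols = ["tcp", "tcp", "udp", "tcp", "icmp",
             "tcp", "udp", "tcp", "tcp", "tcp"]

summary = describe(packet_sizes)
proto_freq = frequency(protocols)

print(summary)
print(proto_freq)
```

In this made-up sample the two very large packet sizes are flagged as candidate outliers. Whether they reflect an attack or a data error cannot be decided from the summary alone, which is the limitation of descriptive analysis noted above.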
