Measuring User Behavior

Yu Wang (Yale University, USA)
DOI: 10.4018/978-1-59904-708-9.ch008

Chapter Preview

You are alive. Do something. The directive in life, the moral imperative was so uncomplicated. It could be expressed in single words, not complete sentences. It sounded like this: Look. Listen. Choose. Act.

- Barbara Hall

Introduction

Measurement plays a fundamental role in the modern world, and measurement theory uses statistical tools to measure and analyze data. In this chapter, we examine several statistical techniques for measuring user behavior. We first discuss the fundamental characteristics of user behavior, and then describe the scoring and profiling approaches to measuring it. The fundamental idea of measurement theory is that measurements are not the same as the outcome being measured. Hence, if we want to draw conclusions about the outcome, we must take into account the nature of the correspondence between the outcome and the measurements. Our goal in measuring user behavior is to understand behavior patterns so that we can profile users or groups correctly. Readers interested in basic measurement theory should refer to Krantz, Luce, Suppes & Tversky (1971), Suppes, Krantz, Luce & Tversky (1989), Luce, Krantz, Suppes & Tversky (1991), Hand (2004), and Shultz & Whitney (2005).

Any measurement can involve two types of error: systematic error and random error. A systematic error keeps the same direction throughout a set of measurements, so its values are consistently all positive or all negative. Generally, a systematic error is difficult to identify and account for. Systematic errors generally originate in one of two ways: 1) error of calibration, and 2) error of use. An error of calibration occurs, for example, if network data is collected incorrectly. More specifically, if the allowable values of a variable should range from 1 to 1000 but we incorrectly limit the range to a maximum of 100, then all the collected traffic data for this variable will be affected in the same way, giving rise to a systematic error. An error of use occurs, for example, if the data is collected correctly but is then transferred incorrectly. If we store a variable whose values can exceed 255 in a “byte” data type, which holds only 256 distinct values (0 to 255 when unsigned), we should expect incorrect results for every observation with a value greater than 255.
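The two examples in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the simulated traffic values, and the sample size are hypothetical, and the byte example assumes that an out-of-range value wraps around (keeps only its low 8 bits), whereas a real system might instead saturate or raise an error. The point is that both mistakes push every affected observation in the same direction.

```python
import random

def collect_with_cap(values, maximum):
    """Calibration error: the collector wrongly limits every value to `maximum`."""
    return [min(v, maximum) for v in values]

def store_as_unsigned_byte(values):
    """Error of use: keep only the low 8 bits (assumed wraparound behavior)."""
    return [v % 256 for v in values]

# Hypothetical "true" values of a traffic variable that should range from 1 to 1000.
rng = random.Random(0)
true_values = [rng.randint(1, 1000) for _ in range(10_000)]

# Error = measured value minus true value, for each kind of mistake.
cap_errors = [m - t for m, t in zip(collect_with_cap(true_values, 100), true_values)]
byte_errors = [m - t for m, t in zip(store_as_unsigned_byte(true_values), true_values)]

for name, errors in (("range cap", cap_errors), ("byte storage", byte_errors)):
    positive = sum(e > 0 for e in errors)
    negative = sum(e < 0 for e in errors)
    print(f"{name}: {positive} positive errors, {negative} negative errors")
```

Neither mistake ever produces a positive error, which is what distinguishes these systematic errors from random errors that scatter on both sides of the true value.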

A random error varies from one measurement to the next and is equally likely to be positive or negative. Random errors arise from either uncontrolled variables or specimen variations. In either case, the idea is to control all variables that can influence the result of the measurement, and to control them closely enough that the resulting random errors are no longer objectionable. Random errors can be addressed with statistical methods. In most measurements, only random errors contribute to estimates of probable error. A common source of random error in measuring user behavior is variance. A robust profiling measurement has to take into account the variances in profiling patterns on 1) the network system side, such as variances in network groups or domains, traffic volume, and operating systems, and 2) the user side, such as variances in job responsibilities, working schedules, department categorization, security privileges, and computer skills. The profiling measurement must be able to separate the variances that come from the system side from those that come from the user side, so that overhauling the network infrastructure or changes in employment have less impact on the overall profiling system. Recently, the hierarchical generalized linear model has been increasingly used to address such variances; we discuss this technique further later in this chapter.
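As a rough illustration of what separating variances means, the sketch below splits the total variance of a measurement into a between-group part and a within-group part. It is a plain variance decomposition (the law of total variance), not the hierarchical generalized linear model mentioned above, and everything in it is assumed for the example: the department names, the baseline activity levels, the noise level, and the sample sizes are hypothetical.

```python
import random
import statistics

def variance_components(groups):
    """Split the total (population) variance into between-group and within-group parts.

    `groups` maps a group name to the list of measurements taken in that group.
    """
    all_values = [v for values in groups.values() for v in values]
    n = len(all_values)
    grand_mean = statistics.fmean(all_values)
    # Between-group part: how far each group mean sits from the grand mean.
    between = sum(
        len(values) * (statistics.fmean(values) - grand_mean) ** 2
        for values in groups.values()
    ) / n
    # Within-group part: scatter of observations around their own group mean.
    within = sum(
        sum((v - statistics.fmean(values)) ** 2 for v in values)
        for values in groups.values()
    ) / n
    return between, within

# Hypothetical departments with different baseline activity (e.g. sessions per week),
# each observed with zero-mean random noise.
rng = random.Random(42)
baselines = {"engineering": 120.0, "sales": 60.0, "finance": 30.0}
groups = {
    dept: [rng.gauss(base, 10.0) for _ in range(200)]
    for dept, base in baselines.items()
}

between, within = variance_components(groups)
total = statistics.pvariance([v for values in groups.values() for v in values])
print(f"between-group: {between:.1f}, within-group: {within:.1f}, total: {total:.1f}")
```

In this simulated data most of the variance sits between departments, so a profile that ignored department membership would mistake an ordinary difference in job function for unusual individual behavior.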
