Comparison of Feature Vectors in Keystroke Dynamics: A Novelty Detection Approach

Paulo H. Pisani (Universidade Federal do ABC (UFABC), Santo André, Brazil) and Ana C. Lorena (Universidade Federal de São Paulo (UNIFESP), São José dos Campos, Brazil)
Copyright: © 2012 | Pages: 18
DOI: 10.4018/jncr.2012100104

Abstract

A number of current applications, such as spam identification and intrusion detection, require algorithms able to extract a model from one-class data and classify unseen data as self or non-self in a novelty detection scenario. In this paper, the authors focus on keystroke dynamics, which analyses the user's typing rhythm to improve the reliability of the user authentication process. However, several different features may be extracted from the typing data, making it difficult to define the feature vector. This problem is even more critical in a novelty detection scenario, where data from the negative class is not available. Based on a review of keystroke dynamics, this work evaluated the most commonly used features to determine which ones are most significant for differentiating one user from another. To perform this evaluation, the authors tested the impact of these features on two benchmark databases, applying bio-inspired algorithms based on neural networks and artificial immune systems.

1. Introduction

Novelty detectors are a class of classification algorithms that are able to recognize deviations from normal data. During the training phase, these classifiers usually have data from only one class, the positive class. During the detection phase, after training, they are able to differentiate between positive and negative samples. Anomaly detectors and one-class classifiers are other common names for this class of algorithms (Yu & Cho, 2003). There are a number of problems to which novelty detectors can be applied, such as identification of spam e-mail (Blanzieri & Bryl, 2008) and detection of anomalous behavior in computer systems (Zanero, 2004). What these applications have in common is that usually only data from the positive class is available.

It is known that the features extracted from data play a key role in the classification process. The choice of features is even more critical in novelty detection, as one-class classification is usually a harder task than standard binary classification (Khan & Madden, 2009), which uses both positive and negative examples for training. In this paper, we focus on keystroke dynamics to illustrate the problem of defining the features to be extracted from data.

Keystroke dynamics analyses the rhythm with which a user types on the keyboard. The recognition of users by keystroke dynamics has been seen as a method to enhance user authentication and avoid identity theft (Hosseinzadeh & Krishnan, 2008). It has been estimated that worldwide losses due to identity theft reached US$ 221 billion in 2003 (Jain, 2006). In the USA alone, Javelin Strategy & Research reported that the total amount of identity theft fraud rose from US$ 18 billion in 2011 to US$ 20.9 billion in 2012.
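To make the notion of typing rhythm concrete, the sketch below turns a sequence of key press and release timestamps into a feature vector. The event format, the function name `extract_features`, and the three timing features chosen here (hold time, down-down latency, up-down latency) are assumptions made for illustration; they are features commonly discussed for keystroke dynamics, not necessarily the exact set evaluated in this paper.

```python
from typing import List, Tuple

# Hypothetical event format: (key, press_time_ms, release_time_ms).
KeyEvent = Tuple[str, float, float]


def extract_features(events: List[KeyEvent]) -> List[float]:
    """Build a feature vector from one typing sample.

    The vector concatenates, in this order:
      - hold times: release minus press of each key
      - down-down latencies: press of next key minus press of current key
      - up-down latencies: press of next key minus release of current key
    """
    hold = [release - press for _, press, release in events]
    down_down = [
        events[i + 1][1] - events[i][1] for i in range(len(events) - 1)
    ]
    up_down = [
        events[i + 1][1] - events[i][2] for i in range(len(events) - 1)
    ]
    return hold + down_down + up_down


# Illustrative timestamps (in milliseconds) for a user typing "pas".
sample = [("p", 0.0, 95.0), ("a", 150.0, 230.0), ("s", 310.0, 400.0)]
features = extract_features(sample)
print(features)
```

Which of these components to keep, and in what combination, is exactly the kind of feature vector decision the paper sets out to compare.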

It is important to note that keystroke dynamics can be treated either as a novelty detection problem or as a binary classification problem. In the first approach, only data from the legitimate user (positive class) is used to train the classifier. Afterwards, in the detection or matching phase, the classifier has to correctly classify unseen data as legitimate (positive) or intruder (negative). In binary classification, by contrast, samples from both the legitimate user and intruders are used during training, which could potentially enhance classifier accuracy. Nonetheless, in real-world scenarios, intruder samples are not always available. Hence, treating keystroke dynamics as a novelty detection problem is closer to practical solutions.
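The novelty detection setting described above can be sketched as follows: a model is fitted using only feature vectors from the legitimate user, and unseen vectors are then accepted or rejected. This is a minimal sketch using a simple scaled Manhattan distance as a stand-in; it is not one of the bio-inspired algorithms (neural networks, artificial immune systems) used by the authors. The class name, the `margin` parameter, and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


class ScaledManhattanDetector:
    """One-class detector trained only on legitimate-user samples."""

    def __init__(self, margin: float = 1.2) -> None:
        # margin > 1 loosens the threshold beyond the worst training score
        # (an assumption made here, not a value taken from the paper).
        self.margin = margin

    def fit(self, samples: Sequence[Sequence[float]]) -> "ScaledManhattanDetector":
        columns = list(zip(*samples))
        self.mu = [mean(col) for col in columns]
        # Guard against zero spread so the division in score() is safe.
        self.sigma = [max(pstdev(col), 1e-6) for col in columns]
        self.threshold = self.margin * max(self.score(s) for s in samples)
        return self

    def score(self, x: Sequence[float]) -> float:
        """Distance from the user's profile; larger means more anomalous."""
        return sum(abs(v - m) / s for v, m, s in zip(x, self.mu, self.sigma))

    def is_legitimate(self, x: Sequence[float]) -> bool:
        return self.score(x) <= self.threshold


# Training uses positive-class data only (illustrative timing values in ms).
legit_samples: List[List[float]] = [
    [95.0, 150.0, 55.0],
    [100.0, 160.0, 60.0],
    [90.0, 145.0, 50.0],
    [98.0, 155.0, 58.0],
]
detector = ScaledManhattanDetector().fit(legit_samples)

genuine_attempt = [96.0, 152.0, 56.0]
intruder_attempt = [140.0, 260.0, 120.0]
print(detector.is_legitimate(genuine_attempt))
print(detector.is_legitimate(intruder_attempt))
```

No intruder sample is seen during `fit`; the threshold is derived solely from how far the legitimate user's own samples lie from their mean profile. A binary classifier would instead need labeled intruder vectors at training time.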

Although keystroke dynamics has been studied for more than 30 years, there seems to be no agreement on which features should be extracted from typing data. The goal of this work is to answer the following question: what is the best feature vector for keystroke dynamics in a novelty detection scenario?
