1. Introduction
Novelty detectors are a class of classification algorithms that can recognize deviations from normal data. During the training phase, these classifiers usually have data from only one class, the positive class. During the detection phase, after training, they are able to differentiate between positive and negative samples. Anomaly detectors and one-class classifiers are other common names for this class of algorithms (Yu & Cho, 2003). Novelty detectors can be applied to a number of problems, such as identification of spam e-mail (Blanzieri & Bryl, 2008) and detection of anomalous behavior in computer systems (Zanero, 2004). What all these applications have in common is that usually only data from the positive class is available.
It is known that the features extracted from data play a key role in the classification process. The choice of features is even more critical in novelty detection, as one-class classification is usually a harder task than standard binary classification (Khan & Madden, 2009), which uses both positive and negative examples for training. In this paper, we focus on keystroke dynamics to illustrate the problem of defining the features to be extracted from data.
Keystroke dynamics analyses the rhythm with which a user types on the keyboard. The recognition of users by keystroke dynamics has been seen as a method to enhance user authentication and avoid identity theft (Hosseinzadeh & Krishnan, 2008). It has been estimated that worldwide losses due to identity theft reached US$ 221 billion in 2003 (Jain, 2006). In the USA alone, Javelin Strategy & Research reported that the total identity theft fraud amount rose from US$ 18 billion in 2011 to US$ 20.9 billion in 2012.
It is important to note that keystroke dynamics can be treated either as a novelty detection problem or as a binary classification problem. In the first approach, only data from the legitimate user (positive class) is used to train the classifier. Afterwards, in the detection or matching phase, the classifier has to correctly classify unseen data as legitimate (positive) or intruder (negative). In binary classification, the main difference is that samples from both the legitimate user and intruders are used during training, which could potentially enhance classifier accuracy. Nonetheless, in real-world scenarios, intruder samples are not always available. Hence, treating keystroke dynamics as a novelty detection problem is closer to practical solutions.
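To make the novelty detection setting concrete, the following is a minimal sketch of a one-class detector that is trained only on samples from the legitimate user. It is not the method evaluated in this paper: the class name `ZScoreNoveltyDetector`, the mean absolute z-score rule, the threshold of 3.0 and the timing values (assumed to be in milliseconds) are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


class ZScoreNoveltyDetector:
    """Toy one-class detector (illustrative assumption, not the paper's method).

    It learns the per-feature mean and standard deviation from positive
    (legitimate-user) samples only, then flags samples that lie too far away.
    """

    def __init__(self, threshold=3.0):
        # Hypothetical cutoff on the mean absolute z-score.
        self.threshold = threshold

    def fit(self, positive_samples):
        # Transpose the samples so that each column holds one feature.
        columns = list(zip(*positive_samples))
        self.means = [mean(c) for c in columns]
        # Guard against zero variance to avoid division by zero.
        self.stds = [stdev(c) or 1e-9 for c in columns]
        return self

    def score(self, sample):
        # Mean absolute z-score of the sample across all features.
        return mean(
            abs(x - m) / s for x, m, s in zip(sample, self.means, self.stds)
        )

    def is_legitimate(self, sample):
        # Positive (legitimate) if the sample stays close to the training data.
        return self.score(sample) <= self.threshold


# Made-up timing vectors from the legitimate user only (no intruder data).
legitimate_training = [
    [100, 150, 120],
    [105, 145, 118],
    [98, 155, 125],
    [102, 148, 121],
]

detector = ZScoreNoveltyDetector().fit(legitimate_training)

# Detection phase: unseen samples are labelled positive or negative.
genuine_attempt = [101, 149, 122]
intruder_attempt = [160, 90, 200]
print(detector.is_legitimate(genuine_attempt))
print(detector.is_legitimate(intruder_attempt))
```

A binary classifier would instead require intruder vectors alongside `legitimate_training` at training time, which is exactly the data that is often unavailable in practice.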
Although keystroke dynamics has been studied for more than 30 years, there seems to be no agreement on which features should be extracted from typing data. The goal of this work is to answer the following question: what is the best feature vector for keystroke dynamics in a novelty detection scenario?