User Profiling Using Keystroke Dynamics and Rotation Forest

Ioannis Tsimperidis, Avi Arampatzis
DOI: 10.4018/978-1-7998-9430-8.ch001


The anonymity that users can maintain on the internet has positive effects, such as allowing them to express their views and ideas freely without fear of retaliation, but it also carries risks, because it gives malicious users a significant advantage. To reduce the complete anonymity of internet users and thereby protect unsuspecting users, this work attempts to identify some of their characteristics, namely gender, age, and handedness, from typing data. For this purpose, a rotation forest is used as the classifier, and keystroke dynamics features are selected with the chi-square feature selection procedure. The final results show that user profiling can be achieved with an accuracy of 88.9% in gender prediction, 86.3% in age prediction, and 94.3% in handedness prediction.
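To make the feature selection step more concrete, the following is a minimal sketch of chi-square scoring, not the chapter's actual implementation. It assumes that each keystroke feature has already been discretized into bins (the labels "short" and "long" are an assumption for illustration), and the feature names, sample values, and the helper names `chi_square_score` and `select_top_k` are all hypothetical.

```python
from collections import Counter


def chi_square_score(feature_bins, labels):
    """Pearson chi-square statistic for the bins-by-labels contingency table."""
    n = len(labels)
    observed = Counter(zip(feature_bins, labels))  # joint counts per (bin, label)
    bin_totals = Counter(feature_bins)             # row totals
    label_totals = Counter(labels)                 # column totals
    score = 0.0
    for b, b_count in bin_totals.items():
        for c, c_count in label_totals.items():
            # Count expected if the feature and the class were independent
            expected = b_count * c_count / n
            score += (observed[(b, c)] - expected) ** 2 / expected
    return score


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(bins, labels)
              for name, bins in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: four typing samples labeled by gender.
labels = ["F", "F", "M", "M"]
features = {
    "duration_e": ["short", "short", "long", "long"],  # follows the label
    "duration_a": ["short", "long", "short", "long"],  # unrelated to the label
}
selected = select_top_k(features, labels, k=1)
```

In this toy data the first feature separates the two classes perfectly and the second carries no information about them, so only the first is retained. A real pipeline would score many more features and would need a principled way to bin continuous timing values.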
Chapter Preview


People today work, communicate, trade goods and services, entertain themselves, and learn in very different ways than they did a few years ago. Telecommunication and teleconferencing applications, eShops, online games, and courses of every kind have appeared to serve the needs of individuals, companies, and organizations. The cause of these rapid changes is the evolution and spread of the Internet and the services it offers. Today, a user can connect with other users anywhere in the world through video calling, instant messaging applications, or social networks. Every user can also purchase products or services from the global market as easily as in his/her own neighborhood, or even more easily. It is also possible to work for or with companies and individuals located thousands of kilometers away.

These developments offer many opportunities for personal, national, and global growth, but they also bring many risks, such as financial fraud, seduction of minors, hacking, and anonymous threats (Degtereva et al., 2020). One of the most important reasons these risks exist is the partial or complete anonymity that a user can maintain when connecting to the Internet. On the one hand, this anonymity often proves useful, as it helps users to express themselves and be creative freely; on the other hand, it may alter a user's behavior, turning him/her into a rude, aggressive, and disrespectful person (Krysowski & Tremewan, 2020). In addition, anonymity, or concealment of true identity, is one of the major advantages malicious users have when they set out to deceive unsuspecting users and/or carry out cyber-attacks.

It is also noteworthy that, although a variety of communication methods are offered, such as voice calls, video calls, and file sharing, text is still the dominant form of communication among users (Nitzburg & Farber, 2019). A variety of instant messaging applications are available, and many companies invest significant amounts of money in their development. If we additionally consider email, the comments users post on social media, and the queries submitted to search engines, all of which are primarily text, a picture emerges in which text, or rather text typing, plays a prominent role on the World Wide Web, in user communication, and in computer use in general.

Keystroke dynamics are a biometric trait: information about a user can be extracted from the way he/she types on a physical or virtual keyboard. Studies in keystroke dynamics have been conducted for about fifty years, and their main object is user authentication (Raul et al., 2020), with the aim of replacing or strengthening password-based authentication. Keystroke dynamics have also been used to classify users according to an inherent or acquired characteristic, such as gender or age, as well as to assess users' physical and mental condition, such as whether they are exhausted (Ulinskasa et al., 2018), suffer from depression (Mastoras et al., 2019), or suffer from a neurological disease (Lam et al., 2020). Many years of experimentation with keystroke dynamics have resulted in systems with very good performance in user authentication and user classification.

Key Terms in this Chapter

Keystroke Dynamics: The way a user types on a keyboard, physical or virtual.

Chi-Square Test: A statistical procedure used to examine whether categorical variables are associated, by comparing observed frequencies with those expected if the variables were independent.

User Profiling: The process of identifying some characteristics of a user.

Keystroke Duration: The time elapsed between the pressing and the releasing of a key. In the literature it is also called dwell time, hold time, press hold, or key press time.

Digital Forensics: The process of uncovering and interpreting electronic data.

Feature Selection: The process of reducing the number of input variables when developing a predictive model.

Digram Latency: The time elapsed between the pressing or releasing of a key and the pressing or releasing of the next key.
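
The two timing features defined above, keystroke duration and digram latency, can be sketched as simple timestamp differences. The following is an illustrative example only: the `KeyEvent` record, its field names, the millisecond unit, and the sample timestamps are assumptions, and this preview does not state which of the four digram latency variants the chapter actually uses.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical record of one keystroke, with timestamps in milliseconds."""
    key: str
    press_ms: float    # moment the key was pressed
    release_ms: float  # moment the key was released


def keystroke_duration(event):
    """Time elapsed between pressing and releasing a single key."""
    return event.release_ms - event.press_ms


def digram_latencies(first, second):
    """The four press/release combinations between two consecutive keys."""
    return {
        "press_press": second.press_ms - first.press_ms,
        "press_release": second.release_ms - first.press_ms,
        "release_press": second.press_ms - first.release_ms,
        "release_release": second.release_ms - first.release_ms,
    }


# Made-up timings for a user typing "t" followed by "h".
events = [KeyEvent("t", 0.0, 95.0), KeyEvent("h", 140.0, 220.0)]
durations = [keystroke_duration(e) for e in events]
latencies = digram_latencies(events[0], events[1])
```

Note that the release-to-press latency becomes negative when the second key is pressed before the first is released, which is common in fast typing, so code consuming these values should not assume they are positive.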
