A Comparative Study of Machine Learning Techniques for Gesture Recognition Using Kinect

Rodrigo Ibañez (ISISTAN (UNICEN-CONICET) Research Institute, Argentina), Alvaro Soria (ISISTAN (UNICEN-CONICET) Research Institute, Argentina), Alfredo Raul Teyseyre (ISISTAN (UNICEN-CONICET) Research Institute, Argentina), Luis Berdun (ISISTAN (UNICEN-CONICET) Research Institute, Argentina) and Marcelo Ricardo Campo (ISISTAN (UNICEN-CONICET) Research Institute, Argentina)
DOI: 10.4018/978-1-5225-0435-1.ch001


Progress and technological innovation achieved in recent years, particularly in the area of entertainment and games, have promoted the creation of more natural and intuitive human-computer interfaces. For example, natural interaction devices such as Microsoft Kinect allow users to explore a more expressive way of human-computer communication by recognizing body gestures. In this context, several Supervised Machine Learning techniques have been proposed to recognize gestures. However, few studies have compared the behavior of these techniques. Therefore, this chapter presents an evaluation of 4 Machine Learning techniques using the Microsoft Research Cambridge (MSRC-12) Kinect gesture dataset, which involves 30 people performing 12 different gestures. Accuracy was evaluated for each technique, with correct-recognition rates close to 100% in some results. In brief, the experiments performed in this chapter are likely to provide new insights into the application of Machine Learning techniques to facilitate the task of gesture recognition.
Chapter Preview


In the literature, there are numerous approaches to gesture recognition from human body movements captured by video cameras. As mentioned in (Gavrila, 1999), the ability to recognize humans and their activities by vision is crucial for a machine to interact intelligently and effortlessly with a human-inhabited environment. Over the years, there has been strong interest in human movement from a wide variety of disciplines. In psychology, there have been the classic studies on human perception by Johansson (1973) and, in the hand gesture area, studies of how humans use and interpret gestures (McNeill, 1992). In kinesiology, the goal has been to develop models of the human body that explain how it works mechanically and how one might increase its movement efficiency (Calvert & Chapman, 1994). Computer graphics has dealt with the synthesis of human movement, and some of the issues have been how to specify spatial interactions and high-level tasks for the human models; see (Badler, Phillips, & Webber, 1993; Badler & Smoliar, 1979; Magnenat-Thalmann & Thalmann, 1990). For a review of the state of the art in human movement recognition in general, see (Gavrila, 1999; Aggarwal & Cai, 1999; Yu, Cheng, Cheng, & Zhou, 2004; Weinland, Ronfard, & Boyer, 2011; Turaga, Chellappa, Subrahmanian, & Udrea, 2008); for facial expressions, see (Mitra & Acharya, 2007); for hand gestures, see (Pavlovic, Sharma, & Huang, 1997; Wachs, Kölsch, Stern, & Edan, 2011).

Key Terms in this Chapter

Kinect: A line of motion sensing input devices by Microsoft for Xbox 360 and Xbox One video game consoles and Windows PCs (codenamed in development as Project Natal).

Gesture Recognition: A topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms.

Machine Learning: A subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.

Skeletal Tracking: A model used to represent the human body as a number of joints representing body parts such as head, neck, shoulders, and arms.

Motion Detection: The process of detecting a change in the position of an object relative to its surroundings or a change in the surroundings relative to an object. Motion detection can be achieved by either mechanical or electronic methods.

Stick Model: The model used by the Kinect Sensor to represent the positions (X, Y, Z) of the 20 body joints at a given time.

Pattern Recognition: A branch of machine learning that focuses on the recognition of patterns and regularities in data; in some cases, considered to be nearly synonymous with machine learning.

Supervised Learning: The machine learning task of inferring a function from labeled training data that consist of a set of training examples.
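
To make the Stick Model and Supervised Learning definitions concrete, the following is a minimal sketch rather than the chapter's method. It assumes one frame is given as 20 (X, Y, Z) joint positions, flattens it into a 60-value feature vector, and labels a new frame with a plain nearest-neighbor rule. The nearest-neighbor rule is chosen here only for brevity and is not claimed to be one of the 4 techniques the chapter evaluates. The helper names (`flatten_frame`, `nearest_neighbor_label`, `make_frame`) and the toy data are hypothetical and are not taken from MSRC-12.

```python
NUM_JOINTS = 20  # joints per frame, as in the Stick Model definition


def flatten_frame(joints):
    """Turn 20 (x, y, z) joint positions into one 60-value feature vector."""
    if len(joints) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joints)}")
    return tuple(coord for joint in joints for coord in joint)


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_neighbor_label(training_set, query):
    """Return the label of the training vector closest to `query`.

    `training_set` is a list of (feature_vector, label) pairs, that is,
    labeled training examples in the supervised-learning sense.
    """
    _, label = min(training_set, key=lambda pair: squared_distance(pair[0], query))
    return label


def make_frame(offset):
    """Build a synthetic frame (illustrative only, not real Kinect data)."""
    return [(offset + i, offset, 0.0) for i in range(NUM_JOINTS)]


# Two labeled toy examples; the labels are placeholders, not MSRC-12 gestures.
training = [
    (flatten_frame(make_frame(0.0)), "gesture_a"),
    (flatten_frame(make_frame(5.0)), "gesture_b"),
]

query = flatten_frame(make_frame(4.5))
predicted = nearest_neighbor_label(training, query)
print(predicted)
```

A real gesture spans many frames, so a practical system would build its features from a sequence of frames rather than from a single one; the single-frame version above is kept only to show the data shape and the role of labeled examples.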
