A Novel Approach to Kinect-Based Gesture Recognition for HCI Applications

Sriparna Saha (Maulana Abul Kalam Azad University of Technology, West Bengal, India), Rimita Lahiri (Jadavpur University, India) and Amit Konar (Jadavpur University, India)
DOI: 10.4018/978-1-5225-9643-1.ch004


With the advent of Kinect and other modern-day sensors, gesture recognition has emerged as one of the most promising research disciplines for developing innovative and efficient human-computer interaction platforms. In the present work, the authors aim to build an interactive system by combining the principles of pattern recognition with the Kinect sensor. The Kinect sensor serves to collect skeletal data; after preprocessing, the extracted features are fed to principal component analysis for dimensionality reduction. Finally, instead of a single classifier, an ensemble of k-nearest neighbor classifiers is used for detection, since an ensemble algorithm is generally likely to provide better results than a single classifier. To demonstrate the efficacy of the designed framework, it is applied to the interpretation of 20 distinct gestures, and in each case it has outperformed the other existing techniques.
Chapter Preview


Gesture is an effective non-verbal interactive modality that users employ to communicate messages. To illustrate further, gestures are non-verbal body actions, associated with movements of the head, hands, or face, intended to control specific devices or to convey significant information to the surrounding environment. Gestures can originate from any part of the human body and are expressed through the corresponding body-part movements (Kendon, 2004).

With the extensive influx of modern-day computing and communication interfaces into human society, technology is nowadays so deeply embedded in everyday life that it is practically impossible to imagine performing many tasks without its help. It is quite obvious that, with the rapid advancement of technology, the existing interfaces are likely to suffer from limitations in terms of speed and naturalness while exploiting the huge amount of available information. Moreover, the drawbacks of these traditional interactive platforms are becoming more and more pronounced with the advent of novel technologies such as virtual reality (Burdea & Coiffet, 2003). To address these issues, researchers are investing their valuable resources in formulating novel algorithms and techniques that facilitate engaging and effective Human-Computer Interaction (HCI) (Dix, 2009). HCI is now one of the most active research disciplines, and it has generated tremendous motivation (Jaimes & Sebe, 2007) for study in the areas of pattern recognition and computer vision (Moeslund & Granum, 2001).

A long-term goal of researchers has been to migrate human-to-human interactive modalities into HCI; gestures are one such non-verbal interactive means, capable of expressing a wide range of actions, from simple body movements to more complex ones. It is important to note that while executing simple body movements the body parts indicate only general emotional states, whereas gestures can add specific linguistic content (Iverson & Goldin-Meadow, 2005) while enacting the same emotional state (Castellano, Kessous, & Caridakis, 2008). In this context, gestures are no longer an ornament of spoken language; rather, they are a key element of the language generation process itself. Owing to its precision and execution speed, gesture recognition is widely used in developing HCI interfaces and sign language recognition (Liang & Ouhyoung, 1998).

Over the last few decades, researchers have proposed numerous gesture recognition techniques, but none of them has entirely succeeded in building an efficient HCI interface, which motivated us to explore further and come up with novel ideas that address the existing limitations (Alzubi, Nayyar, & Kumar, 2018). Oszust and Wysocki (Oszust & Wysocki, 2013) introduced a novel scheme for signed-expression detection based on video streams obtained using the Microsoft Kinect sensor (Biao, Wensheng, & Songlin, 2013; Lai, Konrad, & Ishwar, 2012; Wang, Yang, Wu, Xu, & Li, 2012); the authors adopted two variants of time series, the first focusing on the skeletal image of the body and the second primarily concerned with extracting the shape and location of the hands as skin-colored areas. Malima et al. (Malima, Özgür, & Çetin, 2006) formulated another algorithm for recognizing a predetermined set of static postures that can be utilized for robot navigation; after implementing the system, they reported that the algorithm executes command recognition at quite high speed and thus has relatively low runtime complexity. Liu et al. (Liu, Nakashima, Sako, & Fujisawa, 2003) attempted to develop a new text-based input framework for camera-equipped handheld devices such as video mobile phones; after evaluating the performance of the Hidden Markov Model (HMM) (Rabiner & Juang, 1986) based training algorithms, it was inferred that the developers attained a fairly good success rate. Lee and Kim came up with a threshold model following the principles of HMM to address automatic gesture recognition, which is recognized to be an ill-posed problem due to the presence of unpredictable and ambiguous non-gesture patterns (Lee & Kim, 1999).
