Lightweight ConvNet Model for American Sign Language Hand Gesture Recognition

Shamik Tiwari
Copyright: © 2022 | Pages: 19
DOI: 10.4018/978-1-7998-9434-6.ch009

Abstract

Deaf and hard-of-hearing persons use sign language to converse with one another and with others in their community. Although innovative and accessible technology is evolving to assist persons with hearing impairments, considerable scope for further work remains. Computer vision applications combined with machine learning methods could benefit such persons even more by allowing them to communicate more effectively, and that is precisely what this chapter attempts to do. The authors propose a MobileConvNet model that can recognise hand gestures in American Sign Language. MobileConvNet is a streamlined architecture that constructs lightweight deep convolutional neural networks using depthwise separable convolutions and provides an efficient model for mobile and embedded vision applications. The difficulties and limitations of sign language recognition are also discussed. Overall, the chapter is intended to give readers a thorough overview of sign language recognition and to aid future research in this area.
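To make the architectural idea concrete, the following is a minimal sketch, not the chapter's exact MobileConvNet configuration, of the depthwise separable convolution block from which such lightweight networks are built, written with TensorFlow/Keras; the layer widths, input size, and class count are illustrative assumptions.

    # Sketch of one depthwise separable convolution block (depthwise 3x3
    # followed by pointwise 1x1), the building unit of MobileNet-style models.
    import tensorflow as tf
    from tensorflow.keras import layers

    def separable_block(x, filters, stride=1):
        # Depthwise convolution filters each input channel separately.
        x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        # Pointwise 1x1 convolution mixes channels and sets the output width.
        x = layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        return layers.ReLU()(x)

    # Illustrative tiny classifier for 26 ASL letter classes on 64x64 grayscale crops.
    inputs = tf.keras.Input(shape=(64, 64, 1))
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = separable_block(x, 64)
    x = separable_block(x, 128, stride=2)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(26, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

Replacing a standard 3x3 convolution with this depthwise-plus-pointwise pair reduces the computation roughly eight to nine times, which is what makes such models suitable for mobile and embedded devices.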
Chapter Preview

Introduction

Day by day, computing gadgets are becoming an increasingly important aspect of our lives. As the need for such computing devices has grown, so has the need for simple and effective computer interfaces. Systems that use vision-based interaction and control are therefore becoming more widespread, and gesture recognition is becoming increasingly popular in the research community for a variety of reasons (Ghanem et al., 2017). Hand gestures are a type of body language that can be communicated by the position of the fingers, the centre of the palm, and the shape formed by the hand. Static and dynamic hand movements can be distinguished: a static gesture refers to a fixed hand shape, whereas a dynamic gesture is made up of a sequence of hand movements such as waving. Gesture interaction is a well-known technology that can be utilized in a variety of applications, including sign language translation, sports, human-robot interaction, and human-machine interaction in general. Hand-gesture recognition systems are also used in medical applications, where bioelectrical signals are used instead of vision to identify gestures. Gestures can be categorized broadly into the following groups (Banjarey et al., 2021; Tiwari, 2018a).

  • Head and Face Gestures: Nodding or shaking the head, direction of eye movement, raising the eyebrows, blinking, curling the nostrils, opening the mouth to speak, and expressions such as smiling, delight, contempt, panic, hate, grief, and disdain are examples of head and face gestures.

  • Hand and Arm Gestures: Recognition of hand positions, sign languages, and entertainment and gaming applications are all possible using hand and arm gestures.

  • Body Gestures: Full-body motion is involved in body gestures, such as following the motions of two individuals engaging together, evaluating a dancer's movements to generate corresponding music and graphics, and detecting human postures for medical rehabilitation and physical education (Challa et al., 2021).

Two sorts of sensors are applied to recognise hand gestures: contact sensors and non-contact sensors. Contact approaches analyse the signal obtained from contact sensors bound to the wrist or arm to identify gestures (Bantupalli & Xie, 2018; Dua et al., 2021). Contact-type devices take longer to measure, since they must touch and then traverse the item being sensed. They offer better recognition than non-contact systems because they are not restricted by range or sensor line of sight, and they can obtain relatively precise information owing to direct touch. According to numerous studies, non-contact methods are mostly built with machine vision equipment such as leap motion controllers, camera sensors, and Kinect. These sensors are not attached to the human body, so non-contact sensing is much less susceptible to sensor wear and does not restrict a target's motion. The rest of the chapter is divided into literature review, material and methods, experiment and results, and conclusion, respectively, in sections 2, 3, 4 and 5.

Literature Review

This section analyses the available literature on gesture recognition systems for HCI by classifying it according to certain essential features. It also examines the advancements that are required to improve current hand gesture detection systems in order for them to be extensively employed for optimal HCI in the future.

To assist conversation between signers and non-signers, Bantupalli and Xie (2018) designed a machine vision system that translates sign language to text. The suggested application extracts temporal and spatial characteristics from video sequences: spatial features are identified with Inception, a CNN model, and an RNN is then trained on the temporal characteristics. The American Sign Language Dataset was used in this study. Garcia and Viesca (2016) demonstrated the creation and deployment of a CNN-based American Sign Language fingerspelling translator. To apply transfer learning, they used a pre-trained GoogLeNet structure trained on the ILSVRC2012 dataset as well as the Surrey University and Massey University ASL datasets. They developed a robust model that correctly recognises the letters a-e in the majority of instances and accurately identifies the letters a-k in the majority of instances with first-time users.
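As a hedged illustration of the transfer-learning recipe described above, the sketch below freezes an ImageNet-pretrained Inception-family backbone (InceptionV3 is used here as a stand-in for the GoogLeNet of the cited work) and attaches a new softmax head for static ASL fingerspelling letters; the class count and input size are assumptions, not the cited authors' exact setup.

    # Transfer learning sketch: frozen pretrained backbone + new ASL letter head.
    import tensorflow as tf

    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, input_shape=(299, 299, 3))
    base.trainable = False  # keep the pretrained feature extractor fixed

    inputs = tf.keras.Input(shape=(299, 299, 3))
    x = tf.keras.applications.inception_v3.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # 24 static fingerspelling letters (J and Z involve motion) -- an assumed label set.
    outputs = tf.keras.layers.Dense(24, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

Only the small dense head is trained on the ASL images, which is why this approach can work with relatively little labelled data.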

Key Terms in this Chapter

Deep Learning: Deep learning is a form of machine learning and artificial intelligence that mimics how humans acquire knowledge. It is a key component of data science, which covers statistics and predictive modelling.

MobileNet: MobileNet is a refined deep neural network architecture that constructs lightweight CNN models using depthwise separable convolutions and provides an effective model for mobile and embedded application areas.

Gesture: A gesture is a visual representation of physical action or emotional expression. It consists of both body and hand gestures.

ConvNet: CNN or ConvNet is a type of deep neural network that is frequently used to evaluate visual imagery.

Transfer Learning: Transfer learning is the concept of breaking free from the isolated learning paradigm and applying knowledge gained from one task to solve related problems.

Sign Language: When spoken communication is not possible, sign language is a method of communicating through body motions, particularly those of the hands and arms.
