Human Action Recognition Based on YOLOv7


DOI: 10.4018/979-8-3693-1738-9.ch006

Abstract

Human action recognition is a fundamental research problem in computer vision, and its accuracy matters for many applications. In this book chapter, the authors use a YOLOv7-based model for human action recognition. To evaluate the model's performance, the action recognition results of YOLOv7 are compared with those of CNN+LSTM, YOLOv5, and YOLOv4. Furthermore, a small human action dataset suitable for YOLO model training is designed; it is composed of images extracted from the KTH, Weizmann, and MSR datasets. The authors use this dataset to verify the experimental results. The final results show that, compared with previous YOLO models, the YOLOv7 model is convenient and effective for human action recognition.

Introduction

Surveillance videos usually contain a series of actions (Yan, 2019). Recognizing the actions in these videos can provide substantial benefits, such as detecting a person who has fallen in time to assist them and avoid follow-up complications from the fall. Therefore, it is necessary to evaluate or analyse human actions in videos. Human action recognition generally refers to judging or analysing the classes of human actions in videos (Soomro et al., 2014); concisely, it is the task of correctly classifying human actions into known action classes.

Recognizing these actions manually, however, brings a huge workload, so a rapid and efficient action recognition method becomes very important. Relevant methods in deep learning can meet these requirements and solve this problem. As a machine learning approach, deep learning has been widely employed since it was proposed (Yan, 2021). Its purpose is to train computers to analyse and identify specific data (Gao et al., 2021).

Human action recognition has long been a topic of interest in the research community. In the past, a substantial body of work on human action recognition utilized traditional machine-learning techniques, such as the extraction of visual features or motion trajectories. Now, deep learning methods are more widely used. Deep learning is prevalent not only in computer vision but also in other fields, including Natural Language Processing (NLP) (Wiriyathammabhum et al., 2016). As more researchers apply deep learning methods to action recognition, recognition efficiency improves over time. To date, researchers have proposed several recognition algorithms, including CNN (Khan et al., 2020), Two-Stream networks (Simonyan et al., 2014), C3D (Convolution 3 Dimension) (Tran et al., 2015), and RNN (Du et al., 2017).

Similar to a Convolutional Neural Network (CNN), the You Only Look Once (YOLO) model has an input layer, convolutional layers, pooling layers, and fully connected layers. The study by Redmon et al. (2016) establishes this as a complete CNN architecture. However, YOLO differs clearly from a conventional CNN classifier: it uses a single CNN to achieve end-to-end object detection, predicting bounding boxes and class scores in one pass rather than through a multi-stage pipeline. This design improves the computational efficiency of the YOLO model and is one of the reasons why this study selects the YOLOv7 model for human action recognition.
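To make the single-pass idea concrete, the following is a minimal sketch of YOLO-style post-processing in plain Python: each raw prediction carries box coordinates, an objectness score, and per-class scores, and a detection is kept when objectness times the best class score clears a confidence threshold. The action names, threshold value, and prediction layout here are illustrative assumptions, not details taken from the chapter or the YOLOv7 implementation.

```python
# Illustrative sketch of YOLO-style post-processing (not the chapter's code).
# Each raw prediction: (x, y, w, h, objectness, class_score_0, class_score_1, ...).

ACTIONS = ["walking", "running", "waving"]  # hypothetical action classes

def decode_predictions(raw_preds, conf_thresh=0.25):
    """Keep predictions whose objectness * best class score clears the threshold."""
    detections = []
    for x, y, w, h, obj, *scores in raw_preds:
        best_idx = max(range(len(scores)), key=lambda i: scores[i])
        confidence = obj * scores[best_idx]
        if confidence >= conf_thresh:
            detections.append((ACTIONS[best_idx], confidence, (x, y, w, h)))
    return detections

# Two example predictions: one confident "running" box, one low-objectness box.
preds = [
    (0.5, 0.5, 0.2, 0.6, 0.9, 0.1, 0.8, 0.1),  # kept: 0.9 * 0.8 = 0.72
    (0.3, 0.4, 0.1, 0.3, 0.1, 0.5, 0.3, 0.2),  # dropped: 0.1 * 0.5 = 0.05
]
print(decode_predictions(preds))
```

A real YOLOv7 head produces such predictions for every grid cell and anchor in one forward pass, followed by non-maximum suppression; the point of the sketch is only that detection and classification are decoded together from a single network output.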

This study employs the YOLOv7 framework to construct a network for human action recognition. The YOLO algorithm, which stands for “You Only Look Once,” is a convolutional-neural-network-based object detection method first introduced by Redmon et al. in 2016. One of its key benefits lies in its simplicity and efficiency, which allow for fast execution. According to Cao et al. (2023), the YOLOv7 model exhibits notable improvements in both running speed and structure. This research focuses on fundamental human actions and contributes to the existing body of knowledge by evaluating whether the YOLOv7 model is effective for human action recognition.
