Video Classification Using 3D Convolutional Neural Network

K. Jairam Naik (National Institute of Technology, Raipur, India) and Annukriti Soni (National Institute of Technology, Raipur, India)
DOI: 10.4018/978-1-7998-2795-5.ch001


Since video contains both spatial and temporal features, video classification is a fascinating problem. Each frame in a video holds important spatial information, and the context of that frame relative to the frames before it in time holds temporal information. Several methods have been devised for video classification, but each suffers from its own drawbacks. One such method is the convolutional neural network (CNN), a category of deep learning model that can operate directly on raw inputs. However, such models have typically been limited to handling two-dimensional inputs only. This chapter implements a three-dimensional convolutional neural network (3D CNN) model for video classification and analyses the classification accuracy gained using the 3D CNN model. 3D convolutional networks are well suited to video classification since they inherently apply convolutions in 3D space.
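The idea that a 3D convolution covers time as well as space can be made concrete with a minimal sketch. The following toy implementation (names and shapes are illustrative, not the chapter's actual model) convolves a stack of grayscale frames with a filter that extends across the temporal axis, so the output shrinks along time just as it does along height and width:

```python
import numpy as np

def conv3d_single(clip, kernel):
    """Naive single-channel 3D convolution (valid padding, stride 1).

    clip:   (T, H, W) array - a stack of T grayscale frames.
    kernel: (t, h, w) array - the filter spans time as well as space.
    """
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Each output value mixes information from t consecutive
                # frames, capturing motion as well as appearance.
                out[i, j, k] = np.sum(clip[i:i + t, j:j + h, k:k + w] * kernel)
    return out

# A toy "video": 8 frames of 16x16 pixels, and a 3x3x3 filter.
clip = np.random.rand(8, 16, 16)
kernel = np.random.rand(3, 3, 3)
features = conv3d_single(clip, kernel)
print(features.shape)  # (6, 14, 14): the temporal axis shrinks too
```

In a real 3D CNN, such filters are learned, stacked in layers, and applied over multiple channels; the sketch only shows why a 3D kernel inherently mixes spatial and temporal information.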
Literature Review

For the classification of large-scale videos using convolutional neural networks, about one million sports videos from YouTube were considered by [A. Karpathy et al., June 2014]. That data was processed with a 2D convolutional neural network. The focus was on using this large sports dataset for classification, but a 2D CNN architecture captures only spatial information, so the work introduced three fusion models that incorporate temporal information into the network. The results reported in [A. Karpathy et al., June 2014; J. Huang et al., June 2015] show that CNNs provide robust and performant features even for weakly labeled data. The 3D CNN architecture presented by [S. Ji et al., 2013] extends the dimensions of the filters in every layer. The trained model was additionally tested on other, similar datasets such as UCF-101, and this transfer-learning experiment showed promising results even though that dataset includes other activities: accuracy reached 65% when 50 samples per class were used to train the model.
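One of the fusion strategies discussed in that line of work combines per-frame predictions from a 2D model after the fact. The sketch below illustrates a simple late-fusion scheme under stated assumptions: `frame_logits` stands in for the per-frame class scores a hypothetical 2D CNN would produce, and the clip-level prediction is just their temporal average (the published fusion variants differ in exactly where temporal information is merged):

```python
import numpy as np

def late_fusion_predict(frame_logits):
    """Late fusion: score each frame independently with a 2D model,
    then average the per-frame class scores over time.

    frame_logits: (T, num_classes) array of per-frame scores.
    Returns the index of the predicted class for the whole clip.
    """
    clip_scores = frame_logits.mean(axis=0)  # average over the T frames
    return int(np.argmax(clip_scores))

# Toy example: 10 frames scored over 5 classes by a hypothetical 2D CNN.
rng = np.random.default_rng(0)
frame_logits = rng.random((10, 5))
pred = late_fusion_predict(frame_logits)
print(pred)
```

Because the 2D model here never sees two frames at once, motion cues are only captured indirectly through the averaged scores, which is the limitation 3D convolutions address.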
