A convolutional neural network (CNN) preserves spatial information: nodes in one layer are not fully connected to the nodes in the next layer, and weights are shared. The main goal of a CNN is to process large image pixel matrices, reducing their high dimensionality without losing essential information, and to simplify the network architecture through weight sharing. Sharing weights reduces the number of trainable parameters, which helps the model avoid overfitting, improves generalization, and still delivers high performance with the desired accuracy. Consequently, CNNs have become dominant in various computer vision tasks and are attracting interest across a variety of domains involving image processing. This chapter covers the foundations of CNNs, followed by the CNN architecture, activation functions, applications, and recent trends in CNNs.
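The parameter savings from weight sharing can be made concrete with a small back-of-the-envelope calculation. The following sketch (illustrative only, not from the chapter) compares the trainable-parameter count of a fully connected layer against a convolutional layer applied to the same image:

```python
# Illustrative sketch: counting trainable parameters for a fully connected
# layer versus a convolutional layer, to show how weight sharing shrinks
# the model. Function names and sizes here are hypothetical examples.

def dense_params(in_pixels: int, out_units: int) -> int:
    # Every input pixel connects to every output unit, plus one bias per unit.
    return in_pixels * out_units + out_units

def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    # One shared kernel per (input channel, output channel) pair, plus one
    # bias per output channel; the count is independent of the image size.
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# A 224x224 grayscale image flattened into a dense layer with 64 units:
print(dense_params(224 * 224, 64))   # 3,211,328 parameters

# The same image through a conv layer with 64 filters of size 3x3:
print(conv_params(3, 1, 64))         # 640 parameters
```

The convolutional layer needs roughly 5,000 times fewer parameters here, and its count does not grow with the image resolution, which is exactly why weight sharing helps against overfitting.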
2. Foundations Of Convolutional Neural Networks
In 1959, neurophysiologists David Hubel and Torsten Wiesel studied the cat's visual cortex intensively, performed numerous experiments, and published their findings in a paper entitled "Receptive fields of single neurons in the cat's striate cortex" (Hubel & Wiesel, 1968), which described how the neurons inside a cat's brain are organized in layers. These layers learn to recognize visual patterns by first extracting local features and then combining the extracted features into higher-level representations. This concept later became one of the core principles of deep learning.
In 1980, Kunihiko Fukushima proposed the Neocognitron (Fukushima, 1980), a self-organizing neural network with multiple layers that recognizes visual patterns hierarchically through learning. This architecture became the first theoretical model of a CNN, as shown in Figure 1.
A further major improvement over the Neocognitron architecture came from LeCun et al., who, beginning in 1989, developed a modern CNN framework that culminated in LeNet-5, a digit recognizer for the MNIST handwritten-digit dataset. LeNet-5 recognized visual patterns directly from raw input images, without a separate feature-extraction step, and was trained using the error back-propagation algorithm.
Despite LeNet-5, CNNs did not perform well on various complex problems because of several limitations: a lack of large training datasets, limited algorithmic innovation, and inadequate computing power. Nowadays, in the era of big data, we have large labeled datasets, more innovative algorithms, and, especially, powerful GPU machines. With these advances, in 2012, Krizhevsky et al. designed AlexNet, which achieved outstanding accuracy in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (Russakovsky et al., 2015). The success of AlexNet paved the way for the invention of several CNN models (Alzubaidi et al., 2021) and for their application in different fields of computer vision and natural language processing.
Figure 1. Schematic diagram illustrating the interconnections between layers in the Neocognitron (Fukushima, 1980)
3. Fundamentals Of Convolutional Neural Networks
A Convolutional Neural Network (CNN), also called a ConvNet, is a type of Artificial Neural Network (ANN) with a deep feed-forward architecture. It is trained to learn highly abstracted features of objects, especially from spatial data.
A deep CNN model has a finite set of hidden processing layers that learn various features of the input data (e.g., an image). The initial layers learn and extract low-level features (with lower abstraction, such as edges), and the deeper layers learn and extract high-level features (with higher abstraction). The basic CNN model is shown in Figure 2.
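The core operation inside each of these hidden layers is a 2D convolution: the same small kernel (the shared weights) slides over every location of the input, so each output value depends only on a local patch. A minimal sketch, assuming NumPy and using a hypothetical `conv2d` helper, is:

```python
import numpy as np

# Minimal illustrative sketch (not the chapter's code): a single "valid"
# 2D convolution. The same 3x3 kernel is reused at every position of the
# input image, which is what "weight sharing" means in practice.

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Local receptive field: only a kh x kw patch feeds each output.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector, a typical low-level feature in early layers:
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)
print(conv2d(image, kernel))  # strong responses along the vertical edge
```

Stacking such layers is what yields the feature hierarchy described above: the first convolution responds to edges, and later convolutions combine those edge responses into progressively more abstract patterns.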