Understanding Convolutional Neural Network With TensorFlow: CNN

DOI: 10.4018/978-1-6684-8531-6.ch003


In academia and industry, deep-learning-based models have exhibited extraordinary performance over the last decade. The learning potential of Convolutional Neural Networks (CNNs) derives from a combination of several feature-extraction levels that fully exploit large quantities of input data. The CNN is an important technique for tackling computer vision problems, although the theory behind its processing efficacy is not yet completely understood. CNNs have achieved state-of-the-art performance on a variety of datasets in computer vision applications such as remote sensing, medical image categorization, face detection, and object identification, owing to the efficiency with which they process visual features. This chapter presents the most significant advancements in CNNs for efficient processing in computer vision, including convolutional layer configurations, pooling layer approaches, activation functions, loss functions, normalization approaches, and CNN optimization techniques.
Chapter Preview


Machine learning is the core of intelligence for computers and other electronic devices. It uses predictive models built on previous data to forecast future behaviors, outcomes, and patterns. Deep learning is a subfield of machine learning in which brain-inspired models are represented mathematically. The parameters of Deep Neural Networks (DNNs), which may range from a few hundred to over 1.2 billion, are learned automatically from the data. DNNs can describe complex nonlinear relationships between inputs and outputs. Their designs yield compositional models that represent an object as a layered composition of primitives. There are several variations of a few core approaches with deep structures. A DNN uses hierarchical structures to learn high-level representations from data. It is a relatively recent technique that is now common in traditional applications of artificial intelligence, such as text categorization, natural-language processing, and machine vision. There are three key reasons for the rise in popularity of deep learning: vastly increased chip processing capabilities, decreased computing costs, and substantial breakthroughs in machine learning techniques. As a consequence, DNNs have garnered a great deal of interest in recent years, and multiple models for diverse applications have been proposed.

Artificial Neural Networks (ANNs) (Guo et al., 2016) are computational processing systems substantially inspired by the way biological nervous systems operate. ANNs are composed primarily of many interconnected computational nodes (called neurons) that work in a distributed fashion, learning from inputs to optimize their output. The input, often in the form of a multidimensional vector, is loaded into the input layer and distributed among the hidden units; adjusting the network from such examples constitutes the learning process. Each hidden layer makes decisions based on the previous layer's output and evaluates how a change within itself affects or improves the final outcome. Stacking hidden layers atop one another is what is referred to as deep learning. Supervised and unsupervised learning are the primary learning approaches in image analysis tasks. Supervised learning uses inputs that have been pre-labeled with targets: every training sample has a set of input variables (a feature vector) and one or more associated outputs. This kind of training aims to reduce a model's overall classification error by comparing the computed output for each training image against its label.

In unsupervised learning, the training data carries no labels. Typically, a network's success is assessed by whether it can minimize or maximize an associated cost function. However, it is essential to highlight that most image-based pattern recognition tasks rely on classification using supervised learning. A Convolutional Neural Network is similar to a conventional ANN in that it consists of neurons that optimize themselves through learning. Each neuron receives inputs and performs an operation (a scalar product followed by a nonlinear activation) - the fundamental building block of innumerable ANNs. The entire network still expresses a single differentiable score function, parameterized by its weights, mapping raw image pixel vectors at the input to class scores at the output. The last layer contains loss functions associated with the classes, and all of the standard techniques developed for conventional ANNs remain applicable.
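The scalar product followed by a nonlinear activation described above can be sketched for a single neuron. The weights, bias, and the choice of ReLU as the nonlinearity below are illustrative assumptions, not values from the chapter:

```python
import numpy as np

def relu(z):
    """Nonlinear activation applied after the weighted sum."""
    return np.maximum(0.0, z)

def neuron(x, w, b):
    """A single artificial neuron: the scalar (dot) product of the
    inputs and weights, plus a bias, passed through a nonlinearity."""
    return relu(np.dot(w, x) + b)

x = np.array([1.0, -2.0, 3.0])   # input vector
w = np.array([0.5, 0.25, -1.0])  # weights (illustrative, normally learned)
b = 0.1                          # bias
print(neuron(x, w, b))           # 0.5 - 0.5 - 3.0 + 0.1 = -2.9 -> ReLU -> 0.0
```

A full network is simply many such neurons arranged in layers, with each layer's outputs serving as the next layer's inputs.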

A CNN is one of the most essential and useful kinds of neural networks (Albawi et al., 2017) and is commonly used for classification and object segmentation. The three major layer types of a CNN are convolutional, pooling, and fully connected, each responsible for certain spatial tasks. A CNN uses a variety of kernels in its convolutional layers to convolve the input image and create feature maps. A pooling layer often follows a convolutional layer; it reduces both the spatial size of the feature maps and the number of parameters. After the final pooling layer, a flatten layer transforms the 2D feature maps of the preceding layer into a 1D vector suitable for the subsequent fully connected layers, and this flattened vector can then be used to classify the images. Pooling is a crucial step in convolutional networks since it reduces the size of the feature maps. By merging a collection of values into a smaller number of values, it decreases the complexity of a feature map. It turns composite visual features into usable information by conserving essential data and discarding extraneous data.
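The layer ordering described above (convolution, pooling, flatten, fully connected) can be sketched in a few lines of `tf.keras`. The input shape (28x28 grayscale), filter counts, and the ten output classes are illustrative assumptions, not values from the chapter:

```python
import numpy as np
import tensorflow as tf

# Minimal CNN in the order described in the text:
# convolution -> pooling -> flatten -> fully connected.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),  # feature maps
    tf.keras.layers.MaxPooling2D(pool_size=2),                     # down-sampling
    tf.keras.layers.Flatten(),                                     # 2D maps -> 1D vector
    tf.keras.layers.Dense(10, activation="softmax"),               # class scores
])

# Forward pass on one random "image"; softmax yields per-class scores.
scores = model(np.random.rand(1, 28, 28, 1).astype("float32"))
print(scores.shape)  # (1, 10)
```

A 3x3 convolution without padding shrinks the 28x28 input to 26x26; the 2x2 pooling halves it to 13x13 before flattening, which is how the spatial feature maps become a single vector for classification.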

Key Terms in this Chapter

TensorFlow: A free, open-source software library for artificial intelligence and machine learning. It may be used for a wide range of applications, but training and inference of deep neural networks are its primary emphasis.
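TensorFlow's core data structure is the tensor, a multidimensional array that its operations consume and produce. A minimal sketch (the particular matrices are arbitrary illustrations):

```python
import tensorflow as tf

# Tensors are multidimensional arrays; operations on them are the
# building blocks of TensorFlow models.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # identity matrix

c = tf.matmul(a, b)   # matrix product
print(c.numpy())      # identical to a, since b is the identity
```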

TensorBoard: TensorBoard is a platform for visualizing the computation graph and other capabilities required to understand, troubleshoot, and improve models. It is a tool that provides metrics and visualizations for a machine-learning workflow. In addition, it assists in tracking quantities such as loss and accuracy, viewing the model graph, projecting embeddings into lower-dimensional spaces, etc.
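A common way to feed TensorBoard is the Keras `TensorBoard` callback, which logs loss and metrics during training. The toy model, random data, and the `logs/demo` directory below are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Toy data and model, purely for illustration.
x = np.random.rand(32, 4).astype("float32")
y = np.random.randint(0, 2, size=(32,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# The callback writes loss/accuracy logs that TensorBoard can display.
tb = tf.keras.callbacks.TensorBoard(log_dir="logs/demo")
history = model.fit(x, y, epochs=2, verbose=0, callbacks=[tb])
# Inspect afterwards with: tensorboard --logdir logs/demo
```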

Overfitting: An undesirable machine learning behavior that occurs when a model makes accurate predictions on its training examples but not on new data. When data analysts build predictive models with machine learning algorithms, they first train the model on a collection of available data; the algorithm then attempts to predict outcomes for additional data. An overfit model can produce erroneous forecasts and fails to generalize to new data sources.
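One standard way to guard against overfitting is to hold out validation data and stop training once validation loss stops improving. The following is a minimal sketch using Keras's `EarlyStopping` callback; the model, random data, and patience value are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Toy data, purely for illustration.
x = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 2, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop when validation loss has not improved for 3 epochs, and keep
# the weights from the best epoch instead of the last one.
stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                        restore_best_weights=True)
history = model.fit(x, y, validation_split=0.25, epochs=50,
                    verbose=0, callbacks=[stop])
```

A widening gap between training loss and validation loss in `history.history` is the usual symptom of overfitting.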

Underfitting: Underfitting occurs when a mathematical model or machine learning algorithm cannot capture the underlying patterns in the data, so it performs poorly even on the training examples, let alone on test data. Its occurrence simply indicates that the model or method does not fit the data adequately. It often happens when there is too little data to develop an accurate model, or when attempting to fit a linear regression model to data that is nonlinear. In such situations the model's rules are too simple and inflexible, so it is likely to produce many incorrect predictions. Underfitting can be mitigated by gathering more data and by increasing model capacity or providing more informative features.

Pooling: Pooling refers to down-sampling an image; pooling layers reduce the dimensionality of feature maps. Consequently, pooling decreases both the number of parameters that must be learned and the amount of computation performed by the network. The pooling layer summarizes the features present in a region of the feature map generated by a convolutional layer.
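The down-sampling can be made concrete with max pooling, the most common variant: each non-overlapping 2x2 window of the feature map is replaced by its maximum, halving each spatial dimension. A minimal NumPy sketch (the 4x4 feature map values are arbitrary):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Down-sample a 2D feature map by taking the maximum of each
    non-overlapping 2x2 window (stride 2)."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 5]])
print(max_pool_2x2(fmap))
# [[6 4]
#  [7 9]]
```

The 4x4 map becomes 2x2: the strongest activation in each window survives, while weaker (presumably less informative) values are discarded.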

Profiling: The objective of profiling a software program is to learn more about its behavior. By understanding a program's behavior, engineers can make modifications that improve performance. Profiling also lets the programmer identify a system's bottlenecks.
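As one concrete illustration (the chapter does not prescribe a specific tool), Python's standard-library `cProfile` reports per-function call counts and timings, which is how bottlenecks are typically located. The `slow_sum` function below is a hypothetical stand-in for real workload code:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately naive function to serve as a profiling target."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Print the five most expensive entries by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

TensorFlow models can be profiled similarly through TensorBoard's profiling support, tying this term back to the tooling discussed above.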

Convolutional Neural Network: A Convolutional Neural Network is a deep learning architecture that can take an image as input, assign importance (learnable weights) to different attributes in the image, and distinguish between them. A CNN requires far less pre-processing than other classification techniques: whereas the filters in basic approaches are handcrafted, a CNN can learn these filters with sufficient training.

Deep Learning: Deep learning is a branch of artificial intelligence concerned with learning algorithms for artificial neural networks inspired by the structure and function of the brain. These neural networks seek to imitate the activity of the human brain, though imperfectly, allowing them to “learn” from massive amounts of data.
