Preventing Model Overfitting and Underfitting in Convolutional Neural Networks

Andrei Dmitri Gavrilov (University of British Columbia, Calgary, Canada), Alex Jordache (University of British Columbia, Calgary, Canada), Maya Vasdani (University of British Columbia, Calgary, Canada) and Jack Deng (University of British Columbia, Calgary, Canada)
DOI: 10.4018/IJSSCI.2018100102

Abstract

There is broad agreement in the machine learning community that machine learning methods have emerged as some of the most prominent learning and classification approaches of the past decade. The convolutional neural network (CNN) has become one of the most actively researched and broadly applied deep machine learning methods. However, the training set has a large influence on the accuracy of a network, and it is paramount to create an architecture that supports its maximum training and recognition performance. The problem considered in this article is how to prevent overfitting and underfitting. These deficiencies are addressed by comparing the statistics of CNN image recognition algorithms to the Ising model. Using a two-dimensional square-lattice array, the impact that the learning rate and regularization rate parameters have on the adaptability of CNNs for image classification is evaluated. The obtained results contribute to a better theoretical understanding of CNNs and provide concrete guidance on preventing model overfitting and underfitting when a CNN is applied to image recognition tasks.

2. Background Research

Cognitive computing and cognitive architectures have recently emerged as powerful tools for tackling complex, large-scale, real-life problems in the presence of uncertainty and variable data quality (Tian et al., 2012), (Wang et al., 2013), (Wang et al., 2016). Popular approaches that assist in building cognitive models, which can simulate the human thought process, include deep machine learning methods, artificial neural networks (ANN), convolutional neural networks (CNN), neuro-linguistic programming (NLP) and sentiment analysis. They have been successfully applied to various intelligent systems in the fields of computer graphics, robotics, knowledge representation, virtual reality, situation awareness, decision-support systems, medicine and many other areas (Wang et al., 2017), (Gavrilova et al., 2017), (Montero-Obasso et al., 2012). One of the fastest growing domains where notable progress has been made using cognitive, fuzzy and multi-modal architectures is biometric security and image processing (Browne & Ghidary, 2003), (Han & Bhanu, 2006), (Monwar et al., 2011), (Yuan et al., 2008). Over the past couple of years, there has been a significant surge in adapting machine learning methods for image recognition. The introduction of the CNN created excitement in the image processing research community, offering new opportunities to significantly increase the image identification rate with a fraction of the computational resources, thus making the recognition process more accurate and less resource-demanding.
