This chapter examines the history of artificial neural networks research through the present day. The components of artificial neural network architectures and both unsupervised and supervised learning methods are discussed. Although a step-by-step tutorial of how to develop artificial neural networks is not included, additional reading suggestions covering artificial neural network development are provided. The advantages and disadvantages of artificial neural networks for research and real-world applications are presented as well as potential solutions to many of the disadvantages. Future research directions for the field of artificial neural networks are presented.
Introduction
Artificial neural networks (ANNs) form a subfield of machine learning within the research domain of artificial intelligence (see Artificial Intelligence, this volume). Research on ANNs began after McCulloch and Pitts (1943) proposed a mathematical model of neuronal activity in the brain and Hebb (1949) described a reinforcement-based learning mechanism to explain learning in the human brain. Rosenblatt (1958) then created a computational model of brain processing elements called perceptrons, and ANN research began in earnest. The goal of ANN research is to develop machine learning systems based on a biological model of the brain, specifically the bioelectrical activity of its neurons.
ANNs are a popular solution method in numerous domains, including business (Tkáč & Verner, 2016; Wong, Lai, & Lam, 2000), engineering (Ali et al., 2015; Bansal, 2006), and medicine (Reggia, 1993; Yardimci, 2009). Research and development with ANNs remains highly productive, with the quantity of articles published in this subfield increasing annually. A search on a university article database for the query "artificial neural network" produced 27,736 articles from 1985 to 2000 and 203,328 articles from 2001 to 2016, with over 51 percent of the publications appearing from 2011 to 2016. This represents more than a sevenfold increase in ANN articles published over equal 16-year periods, and the trend continues to accelerate.
It is important to understand the terminology used to discuss ANN architectures. A sample ANN architecture for a supervised learning multi-layer perceptron is shown in Figure 1. Modern ANNs are composed of:
- A layer of input elements, also called the input vector, representing the independent variables,
- Optionally, one or more hidden processing layers,
- Weighted connections between nodes in adjacent layers, and
- An output layer of one or more elements, representing the dependent variable(s).
Figure 1. Supervised learning ANN architecture (only a few connection weights shown)
Every processing element, or neurode, in a layer is connected to all processing elements in the next layer: input neurodes connect to hidden-layer neurodes, and so on, until the neurodes in the last hidden layer connect to the output-layer neurodes. Each connection carries a value, commonly called a weight, that is adjusted to permit learning. A neurode need not be fully connected to the subsequent layer; it may instead connect selectively to one or more neurodes in the following layer. Some ANN architectures also include weighted connections from a layer not only to the next layer, but also to one or more subsequent layers of neurodes.
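The flow of values through such a fully connected architecture can be sketched in a few lines of code. The following is a minimal illustration, not an implementation from this chapter: the layer sizes, the sigmoid activation, and the random initial weights are all assumptions chosen for demonstration, and no learning (weight adjustment) is shown.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, a classic choice for multi-layer perceptron neurodes
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Forward pass through a fully connected multi-layer perceptron.

    Each layer's output feeds the next layer: every neurode in one layer
    is joined by a weighted connection to every neurode in the next.
    """
    activation = x
    for W, b in zip(weights, biases):
        activation = sigmoid(W @ activation + b)
    return activation

# Hypothetical dimensions: 3 inputs, one hidden layer of 4 neurodes, 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [np.zeros(4), np.zeros(1)]

output = mlp_forward(np.array([0.5, -1.0, 2.0]), weights, biases)
```

Each weight matrix here is dense, mirroring the fully connected case described above; selective connectivity would correspond to fixing some entries of the weight matrices at zero.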
What types of research problems are amenable to an ANN approach? Essentially, ANNs are intelligent pattern recognition machines. Thus, any problem that can be defined as a pattern recognition problem is suitable for an ANN solution. This includes all types of classification problems as well as most prediction problems, such as time-series forecasting and medical diagnosis. Additional research has shown that ANNs may also be used as a tool for evaluating medical or business decision-making heuristics (Walczak, 2008).