Feed-Forward Artificial Neural Network Basics

Lluís A. Belanche Muñoz (Universitat Politècnica de Catalunya, Spain)
Copyright: © 2009 | Pages: 8
DOI: 10.4018/978-1-59904-849-9.ch097

Abstract

The class of adaptive systems known as Artificial Neural Networks (ANN) was motivated by the amazing parallel processing capabilities of biological brains (especially the human brain). The main driving force was to re-create these abilities by constructing artificial models of the biological neuron. The power of biological neural structures stems from the enormous number of highly interconnected simple units. The simplicity comes from the fact that, once the complex electro-chemical processes are abstracted, the resulting computation turns out to be conceptually very simple. In the ANN paradigm, these artificial neurons nowadays have little in common with their biological counterpart. Rather, they are primarily used as computational devices, clearly intended for problem solving: optimization, function approximation, classification, time-series prediction and others. In practice, few elements are connected and their connectivity is low. This chapter focuses on supervised feed-forward networks. The field has become so vast that a complete and clear-cut description of all the approaches is an enormous undertaking; we refer the reader to (Fiesler & Beale, 1997) for a comprehensive exposition.

Introduction

The answer to the theoretical question: “Can a machine be built capable of doing what the brain does?” is yes, provided you specify in a finite and unambiguous way what the brain does. -- Warren S. McCulloch

The class of adaptive systems known as Artificial Neural Networks (ANN) was motivated by the amazing parallel processing capabilities of biological brains (especially the human brain). The main driving force was to re-create these abilities by constructing artificial models of the biological neuron. The power of biological neural structures stems from the enormous number of highly interconnected simple units. The simplicity comes from the fact that, once the complex electro-chemical processes are abstracted, the resulting computation turns out to be conceptually very simple.

In the ANN paradigm, these artificial neurons nowadays have little in common with their biological counterpart. Rather, they are primarily used as computational devices, clearly intended for problem solving: optimization, function approximation, classification, time-series prediction and others. In practice, few elements are connected and their connectivity is low. This chapter focuses on supervised feed-forward networks. The field has become so vast that a complete and clear-cut description of all the approaches is an enormous undertaking; we refer the reader to (Fiesler & Beale, 1997) for a comprehensive exposition.

Background

Artificial Neural Networks (Bishop, 1995; Haykin, 1994; Hertz, Krogh & Palmer, 1991; Hecht-Nielsen, 1990) are information processing structures without global or shared memory, in which each computing element operates only when all its incoming information is available, a kind of data-flow architecture. Each element is a simple processor with internal, adjustable parameters. The interest in ANN stems primarily from their ability to find satisfactory solutions to problems cast as function approximation tasks for which there is scarce or no knowledge about the process itself, but (limited) access to examples of its response. They have been widely and fruitfully used in a variety of applications (see Fiesler & Beale, 1997, for a comprehensive review), especially after the seminal works of Hopfield (1982), Rumelhart, Hinton and Williams (1986), and Fukushima (1980).

The most general form for an ANN is a labelled directed graph, where each of the nodes (called units or neurons) has a certain computing ability and is connected to and from other nodes in the network via labelled edges. The edge label is a real number expressing the strength with which the two involved units are connected. These labels are called weights. The architecture of a network refers to the number of units, their arrangement and connectivity.
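As a concrete illustration of this graph view, a tiny architecture can be stored as a mapping from labelled edges to their weights; the unit names and weight values below are invented for the example:

```python
# A toy feed-forward network as a labelled directed graph.
# weights[(i, j)] is the real-valued label on the edge from unit i to unit j.
# Unit names (x1, x2, h1, h2, y) and weight values are illustrative only.
weights = {
    ("x1", "h1"): 0.7, ("x2", "h1"): -0.4,
    ("x1", "h2"): 0.1, ("x2", "h2"): 0.9,
    ("h1", "y"): 1.2, ("h2", "y"): -0.8,
}

# The architecture: the set of units and their connectivity.
units = sorted({u for edge in weights for u in edge})
incoming_to_y = sorted(src for (src, dst) in weights if dst == "y")
```

Here the architecture is fully determined by the keys of the dictionary (which units exist and how they are connected), while the values carry the connection strengths.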

In its basic form, the computation of a unit i is expressed as a function Fi of its input (the transfer function), parameterized by its weight vector and other local information. The whole system is thus a collection of interconnected elements, and the transfer function performed by a single one (i.e., the neuron model) is the most important fixed characteristic of the system.

There are two basic types of neuron models used in practice in the literature. Both express the overall computation of the unit as the composition of two functions, as has classically been done since the early model proposal of McCulloch & Pitts (1943):

Fᵢ(x) = g(h(x, wᵢ)), wᵢ ∈ ℝⁿ, x ∈ ℝⁿ   (1)

where wᵢ is the weight vector of neuron i, h: ℝⁿ × ℝⁿ → ℝ is called the net input or aggregation function, and g: ℝ → ℝ is called the activation function. All neuron parameters are included in its weight vector.
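A minimal sketch of equation (1), assuming one common choice of the two functions: a scalar-product aggregation h and a logistic (sigmoid) activation g. These particular choices are illustrative, not the only neuron model the chapter refers to:

```python
import math

def h(x, w):
    # Aggregation (net input) function: here the scalar product of x and w.
    return sum(xj * wj for xj, wj in zip(x, w))

def g(net):
    # Activation function: here the logistic sigmoid, g(t) = 1 / (1 + e^-t).
    return 1.0 / (1.0 + math.exp(-net))

def neuron(x, w):
    # F(x) = g(h(x, w)): the composition of aggregation and activation.
    return g(h(x, w))
```

For instance, `neuron([1.0, 0.0], [0.5, -0.3])` aggregates to a net input of 0.5 and then squashes it through the sigmoid, yielding a value strictly between 0 and 1.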

Key Terms in this Chapter

Learning Algorithm: Method or algorithm by which an Artificial Neural Network develops a representation of the information present in the learning examples, through modification of its weights.

Neuron Model: The computation of an artificial neuron, expressed as a function of its input and its weight vector and other local information.

Artificial Neural Network: Information processing structure without global or shared memory that takes the form of a directed graph, where each of the computing elements ("neurons") is a simple processor with internal, adjustable parameters that operates only when all its incoming information is available.

Weight: A free parameter of an Artificial Neural Network that can be modified through the action of a Learning Algorithm to obtain desired responses to input stimuli.

Feed-Forward Artificial Neural Network: Artificial Neural Network whose graph has no cycles.

Bias-Variance Tradeoff: The mean square error (to be minimized) decomposes into the sum of two non-negative terms: the squared bias and the variance. When an estimator is modified so that one term decreases, the other term will typically increase.
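The decomposition can be checked numerically. The sketch below uses a deliberately biased shrinkage estimator of a Gaussian mean (all constants are invented for the example) and estimates each term by Monte Carlo:

```python
import random
import statistics

random.seed(0)
TRUE_MEAN = 2.0          # quantity being estimated (illustrative value)
N, TRIALS = 10, 5000     # sample size and number of Monte Carlo repetitions
SHRINK = 0.5             # shrinking toward 0 introduces bias but cuts variance

estimates = []
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, 1.0) for _ in range(N)]
    estimates.append(SHRINK * statistics.fmean(sample))

bias = statistics.fmean(estimates) - TRUE_MEAN
variance = statistics.pvariance(estimates)
mse = statistics.fmean((e - TRUE_MEAN) ** 2 for e in estimates)
# The decomposition: mse == bias**2 + variance (up to floating-point error).
```

With SHRINK = 1 the estimator is unbiased but has the full sampling variance; shrinking trades bias for variance, which is exactly the tension the term describes.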

Architecture: The number of artificial neurons, their arrangement and connectivity.
