Sequence processing involves several tasks, such as clustering, classification, prediction, and transduction of sequential data, which can be symbolic, non-symbolic or mixed. Symbolic data patterns occur, for example, in modelling natural (human) language, while predicting the water level of the River Thames is an example of processing non-symbolic data. If the content of a sequence varies across time steps, the sequence is called temporal or time-series. In general, a temporal sequence consists of nominal symbols from a particular alphabet, while a time-series sequence deals with continuous, real-valued elements (Antunes & Oliveira, 2001). Processing either kind of sequence mainly consists of applying the currently known patterns to produce or predict future ones; a major difficulty is that the range of data dependencies is usually unknown. Therefore, an intelligent system with memorising capability is crucial for effective sequence processing and modelling.
A recurrent neural network (RNN) is an artificial neural network in which self-loop and backward connections between nodes are allowed (Lin & Lee, 1996; Schalkoff, 1997). Compared with feedforward neural networks, RNNs are well known for their power to memorise time dependencies and to model nonlinear systems. RNNs can be trained from examples to map input sequences to output sequences, and in principle they can implement any kind of sequential behaviour. They are biologically more plausible and computationally more powerful than other modelling approaches, such as Hidden Markov Models (HMMs), which have discrete internal states, and feedforward neural networks and Support Vector Machines (SVMs), which have no internal states at all.
In this article, we review RNN architectures and discuss the challenges involved in training RNNs for sequence processing. We also review learning algorithms for RNNs and discuss future trends in this area.
Architectures of Recurrent Networks
In the literature, several classification schemes have been proposed to organise RNN architectures according to different principles: some consider the loops of nodes in the hidden layers, while others take the types of output into account. For example, RNNs can be organised into canonical RNNs and dynamic MLPs (Tsoi, 1998a); autonomous converging and non-autonomous non-converging networks (Bengio et al., 1993); locally recurrent (receiving feedback from the same or a directly connected layer), output feedback, and fully connected (i.e. all nodes are capable of receiving feedback signals from, and transferring them to, the other nodes, even in different layers) RNNs (dos Santos & Zuben, 2000); and binary and analog RNNs (Orponen, 2000).
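The role of a feedback connection can be illustrated with a minimal sketch of a single recurrent unit that has a self-loop. This is not one of the architectures cited above; the function name `rnn_forward`, the weights `w_in`, `w_rec` and `w_out`, and the choice of a tanh activation are all assumptions made for illustration.

```python
import math


def rnn_forward(inputs, w_in, w_rec, w_out, h0=0.0):
    """Run a one-unit recurrent network over a sequence of real-valued inputs.

    All names and the tanh activation are illustrative assumptions.
    """
    h = h0
    outputs = []
    for x in inputs:
        # Self-loop: the new state depends on the current input
        # and on the state from the previous time step.
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(w_out * h)
    return outputs


# A single impulse followed by zeros.
seq = [1.0, 0.0, 0.0]

# With w_rec = 0 the unit has no feedback and responds only to the current input.
memoryless = rnn_forward(seq, w_in=1.0, w_rec=0.0, w_out=1.0)

# With a non-zero self-loop the impulse still influences later outputs.
recurrent = rnn_forward(seq, w_in=1.0, w_rec=0.9, w_out=1.0)
```

Comparing the two output lists shows the memorising effect in its simplest form: once the impulse has passed, the unit without feedback returns to zero, while the unit with the self-loop keeps a decaying trace of it.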
Key Terms in this Chapter
Real-Time Recurrent Learning: A general approach to training an arbitrary recurrent network by adjusting weights along the error gradient. This algorithm usually requires very low learning rates because of the inherent correlations between successive node outputs.
Backpropagation through Time: A gradient descent algorithm for training recurrent neural networks. It unfolds the recurrent network into a multilayer feedforward network that grows by one layer for each time step, a process also called unfolding in time.
Artificial Neural Network: A network of many simple processors, called “units” or “neurons”, which provides a simplified model of a biological neural network. The neurons are connected by links that carry numeric values corresponding to weightings and are usually organised in layers. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.
Extended Kalman Filter: An online learning algorithm for determining the weights in a recurrent network given target outputs as it runs. It is based on the idea of Kalman filtering, which is a well-known linear recursive technique for estimating the state vector of a linear system from a set of noisy measurements.
Sequence Processing: A sequence is an ordered list of objects, events or data items. Processing of a sequence may involve one or a number of operations, such as classification of the whole sequence into a category; transformation of a sequence into another one; prediction or continuation of a sequence; generation of an output sequence from a single input.
Gradient Descent: A popular training algorithm that minimises the total squared error of the output computed by a neural network. To find a local minimum of the error function using gradient descent, one takes steps proportional to the negative of the gradient (or the approximate gradient) of the function at the current point.
Recurrent Neural Network: An artificial neural network with feedback connections. This is in contrast to a feedforward neural network, where the signal simply passes from the input neurons, through the hidden neurons, to the output nodes.
Neural Architecture: The particular organisation of artificial neurons and the connections between them in an artificial neural network.
Training Algorithm: A step-by-step procedure for adjusting the connection weights of an artificial neural network. In supervised training, the desired (correct) output for each input vector of a training set is presented to the network, and many iterations through the training data may be required to adjust the weights. In unsupervised training, the weights are adjusted without specifying the correct output for any of the input vectors.
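The Backpropagation through Time and Gradient Descent entries above can be illustrated together with a small sketch. It assumes a one-unit network with a tanh activation and a squared error summed over all time steps; the function names (`forward`, `loss`, `bptt_gradient`, `train`), the weight layout, the learning rate and the toy data are all hypothetical choices made for this example, not taken from the chapter.

```python
import math


def forward(xs, w):
    """Unfold the one-unit network in time; returns all states and outputs."""
    w_in, w_rec, w_out = w
    hs = [0.0]  # hs[0] is the initial state, hs[t] the state after step t
    for x in xs:
        hs.append(math.tanh(w_in * x + w_rec * hs[-1]))
    ys = [w_out * h for h in hs[1:]]
    return hs, ys


def loss(xs, ds, w):
    """Half the total squared error between outputs and desired outputs."""
    _, ys = forward(xs, w)
    return 0.5 * sum((y - d) ** 2 for y, d in zip(ys, ds))


def bptt_gradient(xs, ds, w):
    """Backpropagate the error through the unfolded network, last step first."""
    w_in, w_rec, w_out = w
    hs, ys = forward(xs, w)
    g_in = g_rec = g_out = 0.0
    carry = 0.0  # error arriving at the state from the following time step
    for t in range(len(xs), 0, -1):
        err = ys[t - 1] - ds[t - 1]
        g_out += err * hs[t]
        # Error at the pre-activation of step t (derivative of tanh is 1 - h^2).
        delta = (err * w_out + carry) * (1.0 - hs[t] ** 2)
        g_in += delta * xs[t - 1]
        g_rec += delta * hs[t - 1]
        carry = delta * w_rec
    return [g_in, g_rec, g_out]


def train(xs, ds, w, rate=0.1, epochs=200):
    """Plain gradient descent: step against the gradient of the error."""
    for _ in range(epochs):
        grad = bptt_gradient(xs, ds, w)
        w = [wi - rate * gi for wi, gi in zip(w, grad)]
    return w


# Toy data (assumed): an input sequence and the desired output at each step.
xs = [1.0, 0.5, -0.5, 0.0]
ds = [0.5, 0.6, 0.1, 0.05]
w0 = [0.5, 0.5, 0.5]  # initial [w_in, w_rec, w_out]
w_trained = train(xs, ds, w0)
```

Because the same three weights are shared by every layer of the unfolded network, their gradients are accumulated over all time steps before a single gradient descent update is applied. Calling `loss(xs, ds, w0)` and `loss(xs, ds, w_trained)` gives the error before and after training.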