Feedforward neural networks (FFNs) are often regarded as universal tools and are applied in areas such as function approximation, pattern recognition, and signal and image processing. One of their main advantages is that the learning process usually does not require exact mathematical knowledge of the input-output dependencies. In other words, they may be regarded as model-free approximators (Hornik, 1989). They learn by minimizing an error function so as to fit the training data as closely as possible. Such a learning scheme does not take the quality of the training data into account, so its performance depends strongly on whether the assumption that the data are reliable and trustworthy holds. This is why, when the data are corrupted by large noise, or when outliers and gross errors appear, the network may build a very inaccurate model. In most real-world cases the assumption that the errors are normally distributed and iid simply does not hold. Data obtained from the environment are very often affected by noise of unknown form or by outliers suspected to be gross errors. The proportion of outliers in routine data ranges from 1 to 10% (Hampel, 1986). They usually enter data sets during data acquisition and pre-processing, for instance through measurement errors, long-tailed noise, or human mistakes. Intuitively, an outlier can be defined as an observation that deviates significantly from the bulk of the data. This definition, however, does not help in deciding whether an outlier is a gross error or a meaningful and important observation. To deal with the problem of outliers, a separate branch of statistics called robust statistics (Hampel, 1986; Huber, 1981) was developed. Robust statistical methods are designed to perform well when the true underlying model deviates from the assumed parametric model.
Ideally, robust methods should be efficient and reliable both for observations that are very close to the assumed model and for observations containing larger deviations and outliers. An alternative approach is to detect and remove outliers before the model-building process begins. Such methods are more universal, but they do not take into account the specific modeling approach (e.g., modeling with FFNs). In this article we propose a new robust FFN learning algorithm based on the least trimmed squares estimator.
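The least trimmed squares criterion is commonly defined as the sum of the h smallest squared residuals, so the largest residuals are left out of the objective. The following Python sketch illustrates only that criterion, not the learning algorithm proposed in this article; the function name `lts_criterion`, the trimming parameter `h`, and the sample residuals are assumptions made for illustration.

```python
import numpy as np

def lts_criterion(residuals, h):
    """Sum of the h smallest squared residuals (hypothetical helper)."""
    squared = np.sort(np.asarray(residuals, dtype=float) ** 2)
    return float(squared[:h].sum())

# Illustrative residuals: four small ones and one gross error (8.0).
residuals = [0.1, -0.2, 0.05, 8.0, -0.15]

trimmed = lts_criterion(residuals, h=4)   # the gross error is trimmed away
untrimmed = lts_criterion(residuals, h=5) # ordinary sum of squared errors
print(trimmed, untrimmed)
```

With h equal to the number of observations the criterion reduces to the ordinary sum of squared errors, so the choice of h controls how many suspicious observations are ignored.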
The most popular FFN learning scheme uses the backpropagation (BP) strategy and minimization of the mean squared error (mse). To date, several robust BP learning algorithms have been proposed. Generally, they build on the idea of robust estimators. This approach was adapted to neural network learning algorithms by replacing the mse with an error function shaped so that the impact of outliers may, under certain conditions, be reduced or even removed.
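The effect of replacing the mse with a differently shaped error function can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not any of the published algorithms: it fits a single slope parameter by gradient descent instead of a full FFN, and it uses a clipped (Huber-type) error derivative as a stand-in for a robust criterion. The names `mse_psi`, `huber_psi`, `fit_slope`, the clipping constant `c`, and the data are all hypothetical.

```python
import numpy as np

def mse_psi(r):
    """Derivative of r**2 / 2: grows without bound with the residual."""
    return r

def huber_psi(r, c=1.0):
    """Derivative of a Huber-type loss: residuals beyond +/-c are clipped."""
    return np.clip(r, -c, c)

def fit_slope(x, y, psi, lr=0.05, steps=2000):
    """Fit y ~ w * x by gradient descent using the error derivative psi."""
    w = 0.0
    for _ in range(steps):
        r = y - w * x                 # residuals of the current model
        grad = -np.mean(psi(r) * x)   # gradient of the mean loss w.r.t. w
        w -= lr * grad
    return w

# Clean relation y = 2x, with the last target replaced by a gross error.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x
y[-1] = 100.0

w_mse = fit_slope(x, y, mse_psi)       # pulled far from 2 by the outlier
w_robust = fit_slope(x, y, huber_psi)  # stays close to the clean slope
print(w_mse, w_robust)
```

Only the derivative of the error function changes between the two fits; the update rule itself is identical, which mirrors how a robust criterion can be substituted into a BP scheme.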
Chen and Jain (1994) proposed Hampel's hyperbolic tangent as a new error criterion, with a scale estimator β that defines the interval supposed to contain only clean data; β depends on the assumed proportion of outliers or on the current error values. This idea was combined with the annealing concept by Chuang and Su (2000), who applied an annealing scheme to decrease the value of β. Liano (1996) introduced the logistic error function, derived from the assumption that the errors are generated by the Cauchy distribution. More recently, Pernia-Espinoza et al. (2005) presented an error function based on tau-estimates. An approach based on an adaptive learning rate was also proposed (Rusiecki, 2006). Such modifications may significantly improve network performance on corrupted training sets. However, even these approaches suffer from several difficulties and cannot be considered universal, partly because of the properties of the estimators they apply. Moreover, very few of them have been proposed to date and they all exploit the same basic idea, so there is still a need for new solutions.
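To make the shape of such an error function concrete, the sketch below evaluates one commonly cited form of the logistic error function attributed to Liano (1996), ρ(r) = log(1 + r²/2), together with its derivative. The exact formula is an assumption here, since this section does not state it, and the function names `lmls_loss` and `lmls_influence` are hypothetical.

```python
import numpy as np

def lmls_loss(r):
    """Assumed logistic error function: log(1 + r**2 / 2)."""
    r = np.asarray(r, dtype=float)
    return np.log1p(0.5 * r ** 2)

def lmls_influence(r):
    """Derivative of lmls_loss with respect to the residual r."""
    r = np.asarray(r, dtype=float)
    return r / (1.0 + 0.5 * r ** 2)

residuals = np.array([0.1, 1.0, 10.0, 100.0])
print(lmls_loss(residuals))
print(lmls_influence(residuals))  # shrinks again for very large residuals
```

Unlike the squared error, whose derivative grows linearly with the residual, this derivative decreases toward zero for large residuals, so a gross error contributes little to the weight update.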
Key Terms in this Chapter
Gross Errors: Errors of large magnitude, often caused by human mistakes, measurement errors, etc.
Outlier: Observation that is significantly different from the majority of the data.
Leverage Points: Grossly aberrant values of measured or assumed system inputs.
Feedforward Neural Networks: Artificial NN consisting of units arranged in layers with only forward connections to units in subsequent layers.
Robust Statistics: Branch of statistics that develops methods intended to give useful results when certain assumptions (for example, iid light-tailed errors) are relaxed.
Robust Estimator: Estimator able to classify data into outliers and clean observations, and to find a reasonable fit to the bulk of data.
Robust Learning Algorithm: NN learning algorithm that performs well even if outliers or leverage points are present in the training sets.