Learning Algorithms for RBF Functions and Subspace Based Functions

Learning Algorithms for RBF Functions and Subspace Based Functions

Lei Xu (Chinese University of Hong Kong and Beijing University, PR China)
DOI: 10.4018/978-1-60566-766-9.ch003
OnDemand PDF Download:
List Price: $37.50


Among extensive studies on radial basis function (RBF), one stream consists of those on normalized RBF (NRBF) and extensions. Within a probability theoretic framework, NRBF networks relates to nonparametric studies for decades in the statistics literature, and then proceeds in the machine learning studies with further advances not only to mixture-of-experts and alternatives but also to subspace based functions (SBF) and temporal extensions. These studies are linked to theoretical results adopted from studies of nonparametric statistics, and further to a general statistical learning framework called Bayesian Ying Yang harmony learning, with a unified perspective that summarizes maximum likelihood (ML) learning with the EM algorithm, RPCL learning, and BYY learning with automatic model selection, as well as their extensions for temporal modeling. This chapter outlines these advances, with a unified elaboration of their corresponding algorithms, and a discussion on possible trends.
Chapter Preview


The renaissance of neural network and then machine learning since the 1980’s is featured by two streams of extensive studies, one on multilayer perceptron and the other on radial basis function networks or shortly RBF net. While a multilayer perceptron partitions data space by hyperplanes and then making subsequent processing via nonlinear transform, a RBF net partitions data space into modular regions via local structures, called radial or non-radial basis functions. After extensive studies on multilayer perceptron, studies have been turned to RBF net since the late 1980’s and the early 1990’s, with a wide range applications (Kardirkamanathan, Niranjan, & Fallside, 1991; Chang & Yang, 1997; Lee, 1999; Mai-Duy & Tran-Cong, 2001; Er, 2002; Reddy & Ganguli, 2003; Lin & Chen 2004; Isaksson, Wisell, & Ronnow, 2005; Sarimveis, Doganis, & Alexandridis, 2006; Guerra & Coelho, 2008; Karami & Mohammadi, 2008).

In the literature of machine learning, advances on RBF net can be roughly divided into two streams. One stems from the literature of mathematics on multivariate function interpolation and spline approximation, as well as Tikhonov type regularization for ill-posed problems, which were brought to neural networks learning by (Powell, 1987; Broomhead & Lowe, 1998; Poggio & Girosi, 1990; Yuille & Grzywacz, 1989) and others. Actually, it is shown that RBF net can be naturally derived from Tikhonov regularization theory, i.e. least square fitting subjected to a rotational and translational constraint term by a different stabilizing operator (Poggio & Girosi, 1990; Yuille & Grzywacz, 1989). Moreover, RBF net has also been shown to have not only the universal approximation ability possessed by multilayer perceptron (Hartman, 1989; Park & Sandberg,1993), but also the best approximation ability (Poggio & Girosi, 1990) and a good generalization property (Botros & Atkeson, 1991).

The other stream can be traced back to Parzen Window estimator proposed in the early 1960’s. Studies of this stream progress via active interactions between the literature of statistics and the literature of machine learning. During 1970’s-80’s, Parzen window has been widely used for estimating probability density, especially for estimating the class densities that are used for Bayesian decision based classification in the literature of pattern recognition. Also, it has been introduced into the literature of neural networks under the name of probabilistic neural net (Specht, 1990). By that time, it has not yet been related to RBF net since it does not directly relates to the above discussed function approximation purpose.

In the literature of neural networks and machine learning, Moody & Darken (1989) made a popular work via Gaussian bases as follows:


This normalized type of RBF net is featured by, (2) and thus usually called Normalized RBF (RBF). Moody & Darken (1989) actually considered a special case . Unknowns are learned from a set of samples via a two stage implementation. First, a clustering algorithm (e.g., k-means) is used to estimate ci as each cluster’s center. Second, is calculated for every sample and unknowns wj are estimated by a least square linear fitting. Nowlan (1990) proceeded along this line with Σj considered in its general form such that not only the receptive field of each base function can be elliptic instead of radial symmetrical, but also the well known EM algorithms (Redner & Walker, 1984) are adopted for estimating ci and Σj with better performances than that by a clustering algorithm.

Key Terms in this Chapter

Rival Penalized Competitive Learning (RPCL): It is a further development of competitive learning in help of an appropriate balance between participating and leaving mechanisms, such that an appropriate number k of individual substructures will be allocated to learn multiple structures underlying observations. With k initially at a value larger enough, the participating mechanism is featured by that a coming sample is allocated to one of the k substructures via competition, and the winner adapts this sample by a little bit, while the leaving mechanism is featured by that the rival is de-learned a little bit to reduce a duplicated allocation, which will discard extra substructures, with model selection made automatically during learning.

Model Selection and Two Stage Implementation: It refers to select an appropriate one among a family of infinite many candidate structures with each in a same configuration but in different scales, each of which is labeled by a scale parameter k in term of one integer or a set of integers. Selecting an appropriate k means getting a structure consisting of an appropriate number of free parameters. Usually, a maximum likelihood (ML) learning is not good for model selection. Classically, model selection is made in a two stage implementation. First, enumerate a candidate set K of k and estimate the unknown set of parameters by ML learning for a solution at each . Second, use a model selection criterion to select a best . Several criteria are available for the purpose, such as AIC, CAIC, BIC, cross validation, etc.

Bayesian Ying-Yang System: A set of samples and its inner representation R in an intelligent system are jointly considered by their joint distribution in two types of Bayesian decomposition. In a compliment to the famous ancient Ying-Yang philosophy, one decomposition coincides the Yang concept with a visible domain p(X) as a Yang space and a forward pathway by as a Yang pathway. Thus, p(X, R) is called Yang machine. Also, is called Ying machine with an invisible domain q(R) as a Ying space and a backward pathway by as a Ying pathway. Such a Ying-Yang pair is called Bayesian Ying-Yang system. The input to the Ying Yang system is through directly from a training sample set , while the inner representation describes both a long term memory T that is a collection of all unknown parameters in the system and a short term memory Y with each corresponding to one element . To build up an entire system, we need to design appropriate structures for each component. Specifically, the structure of is designed subject to the nature of learning tasks and a principle of least representation redundancy, the structure of is designed to suit the mapping under a principle of divide and conquer so that a complicated mapping is realized by a number of simple ones, while the structure of is designed for an inverse map under a principle of uncertainty conversation between Ying-Yang, i.e., Yang machine preserves a room or varying range that is appropriate to accommodate uncertainty or information contained in the Ying machine.

Automatic Model Selection: Being different from a usual incremental or decremental model selection that bases on evaluating the change as a subset ?new of parameters is added or removed, automatic model selection is associated with not only a learning algorithm or a learning principle but also an indicator on a subset If ?r consists of parameters of a redundant structural part, learning via either implementing this learning algorithm or optimizing this learning principle will drive and ?r towards a specific value, such that the corresponding redundant structural part is effectively removed. One example of such a learning algorithm is Rival Penalized Competitive Learning, while one example of such a learning principle is Bayesian Ying-Yang Harmony Learning.

Normalized Radial Basis Function (NRBF) and Extended NRBF: A radial basis function (RBF) is a linear combination of a series of simple function , with each called a base that is located at . Each base is classically radial symmetrical from its center , while it becomes unnecessary presently. A RBF is called normalized RBF (NRBF) if we have , and is further called Extended NRBF when each constant is extended into a function . Typically, we consider the cases that = is a linear function.

Mixtures-of-Experts (ME) and Alternative ME (AME): Initially, a mixture-of-experts is a weighted combination of a number of functions with each called expert that is weighted by that is called a gating net with each implemented by a three layer forward net. Generally, this weighted combination is actually the regression equation of a mixture of conditional distributions, i.e., All the unknowns are estimated via the maximum likelihood (ML) learning on by a generalized Expectation and Maximization (EM) algorithm. Moreover, alternative ME is a particular ME family with supported on a finite mixture and with its corresponding posteriori as the gating net . All the unknowns are estimated via the ML learning on by the EM algorithm.

Subspace Based Functions and Cascaded Extensions: Considering as generated from a subspace with independent factors distributed along each coordinate of an inner m dimensional representation we get basis functions or the gating net by based on different subspaces, resulting in subspace based extensions of ME and RBF networks. That is, we get as a combination of functions supported on subspace bases, which is thus called subspace based functions (SBF). Moreover, a direct mapping by can be replaced by a mapping as feature extraction or dimension reduction and then followed by another mapping , from which we get a cascade mapping. This cascaded extension will discard redundant parts with further improvements on performance.

Bayesian Ying Yang Learning: Named in a compliment to the famous ancient Chinese Ying-Yang philosophy, it refers to a general statistical learning framework that formularizes learning tasks in a two pathway featured intelligent system via two complementary Bayesian representations of the joint distribution on the external observation and its inner representation, with all unknowns in the system determined by a principle that two Bayesian representations become best harmony. This system is called Bayesian Ying Yang system, mathematically described by and . This best harmony is mathematically implemented by maximizing called Bayesian Ying Yang harmony learning. It follows from that this best Ying Yang harmony principle includes not only a best Ying Yang matching by minimizing the Ying Yang divergence but also minimizing the entropy of the Yang machine. In other words, a best Ying Yang harmony seeks a Yang machine as an inverse of Ying machine such that it best matches the Ying machine and also keeps a least complexity. Moreover, this best Ying Yang matching provides a general perspective that unifies a number of typical statistical learning approaches.

Complete Chapter List

Search this Book: