Multilogistic Regression by Product Units

P. A. Gutiérrez, C. Hervás, F. J. Martínez-Estudillo, M. Carbonero
Copyright: © 2009 | Pages: 9
DOI: 10.4018/978-1-59904-849-9.ch166

Abstract

Multi-class pattern recognition has a wide range of applications, including handwritten digit recognition (Chiang, 1998), speech tagging and recognition (Athanaselis, Bakamidis, Dologlou, Cowie, Douglas-Cowie & Cox, 2005), bioinformatics (Mahony, Benos, Smith & Golden, 2006) and text categorization (Massey, 2003). This chapter presents a comprehensive and competitive study in multi-class neural learning which combines different elements, such as multilogistic regression, neural networks and evolutionary algorithms. The Logistic Regression (LR) model has been widely used in statistics for many years and has recently been the object of extensive study in the machine learning community. Although logistic regression is a simple and useful procedure, it poses problems when applied to real classification problems, where we frequently cannot make the stringent assumption of additive and purely linear effects of the covariates. A technique for overcoming these difficulties is to augment or replace the input vector with new variables, basis functions, which are transformations of the input variables, and then to use linear models in this new space of derived input features. Methods like sigmoidal feed-forward neural networks (Bishop, 1995), generalized additive models (Hastie & Tibshirani, 1990), and PolyMARS (Kooperberg, Bose & Stone, 1997), a hybrid of Multivariate Adaptive Regression Splines (MARS) (Friedman, 1991) specifically designed to handle classification problems, can all be seen as different nonlinear basis function models. The major drawback of these approaches is determining the type and the optimal number of the corresponding basis functions. Logistic regression models are usually fit by maximum likelihood, with the Newton-Raphson algorithm as the traditional way to estimate the maximum likelihood parameters. The algorithm typically converges, since the log-likelihood is concave, but its computational cost becomes prohibitive when the number of variables is large. Product Unit Neural Networks (PUNNs), introduced by Durbin and Rumelhart (1989), are an alternative to standard sigmoidal neural networks and are based on multiplicative nodes instead of additive ones.
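
As a rough illustration of the model family discussed above (not the authors' implementation; all function names, shapes and parameter values below are assumptions), the following sketch builds product-unit basis functions and feeds them to a softmax (multilogistic) output layer:

```python
import numpy as np

def product_unit_basis(X, W):
    """X: (N, k) strictly positive inputs; W: (M, k) real exponents.
    Returns B with B[n, m] = prod_i X[n, i] ** W[m, i]."""
    # Compute in log space for stability: prod_i x_i^w_i = exp(sum_i w_i * log x_i)
    return np.exp(np.log(X) @ W.T)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # guard against overflow
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def predict_proba(X, W, beta):
    """beta: (M + 1, J) coefficients, intercepts in the first row."""
    B = product_unit_basis(X, W)
    B1 = np.hstack([np.ones((B.shape[0], 1)), B])  # prepend bias column
    return softmax(B1 @ beta)

# Toy usage: 5 samples, 3 inputs, 2 product units, 3 classes.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(5, 3))   # product units require positive inputs
W = rng.normal(size=(2, 3))              # real exponents of the product units
beta = rng.normal(size=(3, 3))           # (M + 1, J) = (3, 3)
print(predict_proba(X, W, beta).round(3))  # each row sums to 1
```

In the chapter's hybrid approach these exponents and coefficients would be fitted (e.g., by an evolutionary algorithm combined with maximum likelihood); here they are simply drawn at random to show the structure of the model.
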
Chapter Preview

Background

In the classification problem, measurements xi, i = 1, 2,..., k, are taken on a single individual (or object), and the individuals are to be classified into one of J classes on the basis of these measurements. It is assumed that J is finite, and that the measurements xi are random observations from these classes. A training sample D = {(xn, yn); n = 1, 2,..., N} is available, where xn = (x1n,..., xkn) is the vector of measurements taking values in Ω, and yn is the class level of the nth individual. In this chapter, we adopt the common technique of representing the class levels using a “1-of-J” encoding vector y = (y(1), y(2),..., y(J)), such that y(l) = 1 if x corresponds to an example belonging to class l and y(l) = 0 otherwise. Based on the training sample, we wish to find a decision function C: Ω → {1, 2,..., J} for classifying the individuals. In other words, C provides a partition, say D1, D2,..., DJ, of Ω, where Dl corresponds to the lth class, l = 1, 2,..., J, and measurements belonging to Dl are classified as coming from the lth class. A misclassification occurs when the decision rule C assigns an individual (on the basis of its measurement vector) to a class j when it actually comes from a class l ≠ j.
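
To make the notation concrete, the short sketch below (helper names are hypothetical, and classes are indexed from 0 rather than 1 for convenience) builds the “1-of-J” encoding and recovers the decision rule C as the arg-max over class scores:

```python
import numpy as np

def one_of_j(labels, J):
    """labels: (N,) class indices in {0, ..., J-1} -> (N, J) 0/1 indicator matrix."""
    Y = np.zeros((len(labels), J), dtype=int)
    Y[np.arange(len(labels)), labels] = 1
    return Y

def decision_rule(scores):
    """scores: (N, J) class scores or probabilities -> predicted class per row."""
    return scores.argmax(axis=1)

y = np.array([2, 0, 1, 2])
print(one_of_j(y, J=3))
# [[0 0 1]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]]
```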

Key Terms in this Chapter

Evolutionary Computation: Computation based on iterative progress, such as growth or development in a population. This population is selected in a guided random search using parallel processing to achieve the desired solution. Such processes are often inspired by biological mechanisms of evolution.

Iteratively Reweighted Least Squares (IRLS): Numerical algorithm that minimizes a specified objective function by solving a sequence of weighted least squares problems, each of which can be handled with a standard linear solver such as Gaussian elimination. It is widely applied in Logistic Regression.
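
A minimal sketch of IRLS (Newton-Raphson) for binary logistic regression, matching the definition above; it assumes the design matrix X already includes a bias column, and the function name is hypothetical:

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """X: (N, k) design matrix, y: (N,) labels in {0, 1}. Returns beta (k,)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # current probabilities
        W = p * (1.0 - p)                              # diagonal of the weight matrix
        z = X @ beta + (y - p) / np.maximum(W, 1e-12)  # working response
        # Weighted least squares step: beta = (X^T W X)^{-1} X^T W z
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Toy usage on simulated data: the estimate should be close to true_beta.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
true_beta = np.array([-0.5, 2.0, -1.0])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
print(irls_logistic(X, y).round(2))
```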

Artificial Neural Networks: A network of many simple processors (“units” or “neurons”) that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.

Logistic Regression: Statistical regression model for Bernoulli-distributed dependent variables. It is a generalized linear model that uses the logit as its link function. Logistic regression applies maximum likelihood estimation after transforming the dependent variable into a logit (the natural log of the odds of the dependent occurring or not).
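
For reference, a minimal statement of the logit link described in this definition, in standard notation (the symbols are not quoted from the chapter):

```latex
\operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad p = \Pr(y = 1 \mid x_1, \dots, x_k).
```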

Evolutionary Programming: One of the four major evolutionary algorithm paradigms, with no fixed structure or representation, in contrast with some of the other evolutionary paradigms. Its main variation operator is mutation.

Product Unit Neural Networks: Alternative to standard sigmoidal neural networks, based on multiplicative nodes instead of additive ones. Concretely, the output of each hidden node is the product of its inputs, each raised to a real exponent.
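
In symbols (notation assumed here for illustration, not quoted from the chapter), the output of the m-th product unit with real exponents w_{m1}, ..., w_{mk} is:

```latex
B_m(\mathbf{x}) = \prod_{i=1}^{k} x_i^{\,w_{mi}}
               = \exp\!\Bigl(\sum_{i=1}^{k} w_{mi} \log x_i\Bigr), \qquad x_i > 0,
```

so, for positive inputs, a product unit behaves like an additive (linear) unit acting on log-transformed inputs.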

Remote Sensing: Small or large-scale acquisition of information about an object or phenomenon using recording or real-time sensing devices that are not in physical or intimate contact with the object (such as aircraft, spacecraft, satellites, or ships).

Precision Farming: Use of new technologies, such as global positioning (GPS), sensors, satellite or aerial images, and information management tools (GIS), to assess and understand in-field variability in agriculture. The collected information may be used to more precisely evaluate optimum sowing density, estimate fertilizer and other input needs, and more accurately predict crop yields.
