An Introduction to Kernel Methods

Gustavo Camps-Valls (Universitat de València, Spain), Manel Martínez-Ramón (Universidad Carlos III de Madrid, Spain) and José Luis Rojo-Álvarez (Universidad Rey Juan Carlos, Spain)
Copyright © 2009 | Pages: 5
DOI: 10.4018/978-1-60566-010-3.ch170

Abstract

Machine learning advanced considerably in the 1980s and 1990s, driven by active research on artificial neural networks and adaptive systems. These tools have produced good results in many real applications because they do not require a priori assumptions about the distribution of the available data or about the relationships among the independent variables. Overfitting caused by small training sets is controlled by minimizing a regularized functional that penalizes the complexity of the machine. Kernel methods, in turn, make it practical to work with high-dimensional input spaces, and they also offer new ways to interpret classification or estimation results. These methods first map the data from the original input space to a kernel feature space of higher dimensionality and then solve a linear problem in that space. Learning algorithms can therefore be designed (and interpreted) geometrically in the kernel space, which is nonlinearly related to the input space, effectively combining statistics and geometry. This theoretical elegance is matched by strong practical performance.
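The "map, then solve a linear problem" idea works because many algorithms only need inner products between mapped samples. A kernel function can supply these directly from the input vectors, without ever building the high-dimensional feature vectors.

The following is a minimal Python sketch under stated assumptions. It uses the degree-2 homogeneous polynomial kernel k(x, z) = (x·z)^2 on two-dimensional inputs, which is chosen here purely for illustration; the chapter does not fix a particular kernel. The sketch checks that the inner product computed through an explicit feature map equals the kernel value computed in the input space. The names `phi` and `poly2_kernel` are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2).

    Maps R^2 -> R^3 so that dot(phi(x), phi(z)) == (x . z) ** 2.
    """
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2, evaluated in the input space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)   # same quantity, never constructing phi(x) or phi(z)
print(explicit, implicit)
```

Both values equal (x·z)^2 = 121 up to floating-point rounding. For kernels such as the Gaussian, the corresponding feature space is infinite-dimensional, so only the implicit route is available.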

Introduction

Although kernel methods were studied in pattern recognition from a theoretical point of view long ago (see, e.g., Capon, 1965), a number of powerful kernel-based learning methods have emerged in the last decade. Significant examples are:

- Support Vector Machines (SVMs) (Vapnik, 1998)
- Kernel Fisher Discriminant Analysis (KFD) (Mika, Rätsch, Weston, Schölkopf, & Müller, 1999)
- Kernel Principal Component Analysis (Kernel PCA) (Schölkopf, Smola, & Müller, 1996)
- Kernel Independent Component Analysis (Kernel ICA) (Bach & Jordan, 2002)
- Mutual Information (Gretton, Herbrich, Smola, Bousquet, & Schölkopf, 2005)
- Kernel ARMA (Martínez-Ramón, Rojo-Álvarez, Camps-Valls, Muñoz-Marí, Navia-Vázquez, Soria-Olivas, & Figueiras-Vidal, 2006)
- Partial Least Squares (PLS) (Momma & Bennett, 2003)
- Ridge Regression (RR) (Saunders, Gammerman, & Vovk, 1998)
- Kernel K-means (KK-means) (Camastra & Verri, 2005)
- Spectral Clustering (SC) (Szymkowiak-Have, Girolami, & Larsen, 2006)
- Canonical Correlation Analysis (CCA) (Lai & Fyfe, 2000)
- Novelty Detection (ND) (Schölkopf, Williamson, Smola, & Shawe-Taylor, 1999)
- A particular form of regularized AdaBoost (Reg-AB), also known as Arc-GV (Rätsch, 2001)

Successful applications of kernel-based algorithms have been reported in fields such as medicine, bioengineering, communications, data mining, audio and image processing, and computational biology and bioinformatics.
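Ridge Regression in dual variables, one of the methods listed above (Saunders, Gammerman, & Vovk, 1998), shows two ideas at once. The model is linear in the kernel feature space, and a regularized functional controls its complexity.

The following is a minimal pure-Python sketch, not the formulation of the cited work. It assumes a Gaussian (RBF) kernel and hand-picked values for the regularization weight `lam` and the kernel width `gamma`. All function names are hypothetical. The sketch uses the standard dual form: it solves (K + λI)α = y for the dual coefficients α, then predicts with f(x) = Σ_i α_i k(x_i, x).

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_kernel_ridge(X, y, lam=0.1, gamma=1.0):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam  # ridge term: penalizes the norm of f in feature space
    return solve(K, y)

def predict(X_train, alpha, x, gamma=1.0):
    """f(x) = sum_i alpha_i * k(x_i, x), a linear function in feature space."""
    return sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X_train))

# Toy 1-D regression: noiseless samples of sin(x) at 0, 0.5, ..., 3.
X = [[0.5 * i] for i in range(7)]
y = [math.sin(p[0]) for p in X]
alpha = fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0)
print(predict(X, alpha, [1.25]), math.sin(1.25))
```

Only the n-by-n kernel matrix enters the computation, so the cost of the dense solve grows as O(n^3) with the number of training samples, regardless of the dimension of the feature space. Practical code would use a library routine such as NumPy's `numpy.linalg.solve` instead of hand-written elimination.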

In many cases, kernel methods have outperformed competing approaches and have also shown additional theoretical and practical advantages. For instance, kernel methods (i) handle high-dimensional input spaces efficiently, (ii) deal robustly with noisy samples, and (iii) make it easy to embed user knowledge about the problem into the formulation of the method. The interest in these methods is twofold. On the one hand, the machine-learning community has found in the kernel concept a powerful framework for developing nonlinear learning methods that solve complex problems efficiently (e.g., pattern recognition, function approximation, clustering, source independence, and density estimation). On the other hand, these methods are easy to use and tune in many research areas (e.g., biology, signal and image processing, and communications), which has also attracted many researchers and practitioners in safety-related areas.
