Kernel-Based Feature Selection with the Hilbert-Schmidt Independence Criterion

Kernel-Based Feature Selection with the Hilbert-Schmidt Independence Criterion

DOI: 10.4018/978-1-60960-557-5.ch010
OnDemand PDF Download:
No Current Special Offers

Chapter Preview


Kernel Methods And Feature Selection

You already met with kernels in chapter on support vector machines. However, because of my claim that each chapter in this book is self-sufficient and self-contained, I would like to briefly reproduce the definition of kernel methods once again. Here, Wikipedia comes to our help (

Kernel methods are a class of pattern analysis (regression, clustering, correlation, classification) algorithms that approach the problem by projecting the data into a high dimensional feature space1978-1-60960-557-5.ch010.m01, where relations hidden in the data under study can be better revealed than when doing pattern analysis in the original space 978-1-60960-557-5.ch010.m02 of features. A good thing is that one does not need to know the explicit form of this projection, since it can be done via the inner products2 between the images3 of all pairs of data in the high dimensional feature space 978-1-60960-557-5.ch010.m03. The inner products are embedded into the definition of a kernel function (hence, the name ‘kernel methods’). The complete set of all inner products forms a kernel matrix. Another important thing about kernel methods is that computing the kernel function is cheaper than the explicit computation of the projection 978-1-60960-557-5.ch010.m04.

Readers who by some reason missed the chapter on support vector machines may wonder why the mapping to the space which dimension can be much higher than that of the original space is beneficial in our microarray setting. To address this worries and healthy doubts, I can say (after all the pioneers of kernel methods) that kernel matrix size is 978-1-60960-557-5.ch010.m05, where 978-1-60960-557-5.ch010.m06 is the number of instances in the training set. Since for gene expression data, it is common that the number of instances is much smaller than the number of genes whose expression levels were measured, we obtain a compact structure needed for pattern analysis.

In this chapter, by pattern analysis, we mean looking for dependence between the features and the class labels in the kernel-induced space. The key pre-assumption is that good (relevant for classification) features will maximize such dependence. The dependence itself is formulated through a mutual-information (mutual dependence of two variables) like quantity, which is introduced in the next section.

Complete Chapter List

Search this Book: