Unsupervised Feature Filtering
So far we have considered only supervised feature selection methods, because unsupervised feature selection methods are scarce. Varshavsky et al. (Varshavsky, Gottlieb, Linial, & Horn, 2006) proposed several variants of a feature selection algorithm based on singular value decomposition (SVD), where features are selected according to their contribution to the SVD-entropy, i.e., the entropy of the distribution of the normalized eigenvalues of the square matrix formed from the data (equivalently, of the normalized squared singular values of the data matrix). Because SVD, like principal component analysis, rests on the eigendecomposition of a matrix, the two techniques are closely related.
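To make the criterion concrete, the following MATLAB sketch computes the SVD-entropy of a data matrix from its singular values; the function name svd_entropy and the normalization by the logarithm of the number of singular values are choices made here for illustration, not code from the original article.

function E = svd_entropy(A)
% SVD_ENTROPY  Entropy of the normalized squared singular values of A
% (a minimal sketch of the SVD-entropy criterion; assumed implementation).
s = svd(A);                    % singular values of the data matrix
V = s.^2 / sum(s.^2);          % normalized eigenvalues (relative weights)
V = V(V > 0);                  % drop zeros to avoid log(0)
E = -sum(V .* log(V)) / log(length(s));
end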
Because wrapper models for feature selection utilize a classifier (and hence class labels) to judge feature usefulness or relevance, wrappers cannot be used for unsupervised feature selection. In contrast, filter models, which do not require a classifier, are better suited to the unsupervised setting. In bioinformatics, a typical example of unsupervised feature filtering is gene shaving (Hastie et al., 2000).
Varshavsky et al. (2006) introduced three variants of unsupervised feature selection: simple feature ranking, forward feature selection, and backward feature elimination. Because the backward feature elimination variant is computationally expensive, I do not describe it here; readers are advised to consult the original article for details. Nevertheless, after studying the MATLAB® code for forward feature selection in this chapter, interested readers will be able to implement the backward elimination algorithm themselves.
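As a preview of the ranking idea, the sketch below scores each feature by how much the SVD-entropy drops when that feature is left out (a leave-one-out contribution). It assumes the svd_entropy helper shown earlier; the exact scoring used by Varshavsky et al. should be checked against the original article.

% Simple feature ranking by leave-one-out contribution to the SVD-entropy
% (assumed illustration; rows of A are features, columns are samples).
D = size(A, 1);
CE = zeros(D, 1);
E_all = svd_entropy(A);
for i = 1:D
    Ai = A([1:i-1, i+1:D], :);          % data with feature i removed
    CE(i) = E_all - svd_entropy(Ai);    % contribution of feature i
end
[~, ranked] = sort(CE, 'descend');      % highest-contributing features first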
Singular Value Decomposition
Let A be a D×N data matrix, containing as its rows the expression levels of D genes measured in N samples. The matrix A can be written as

$$A = U \Sigma V^{T},$$

where U is a D×D orthogonal matrix, V is an N×N orthogonal matrix, and $\Sigma$ is a D×N diagonal matrix with nonnegative values. Such a decomposition of the matrix A into the product of three other matrices is called the singular value decomposition (Lay, 2003).
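In MATLAB the decomposition is available directly through the built-in svd function; the lines below are a small illustration (not code from the chapter) showing that the three factors reconstruct A.

A = rand(5, 3);           % a random 5-by-3 "data" matrix (D = 5, N = 3)
[U, S, V] = svd(A);       % U is 5x5, S is 5x3 diagonal, V is 3x3
err = norm(A - U*S*V');   % reconstruction error, near machine precision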
An algorithm for finding the singular value decomposition is given in (Steeb, 2006) as follows (a MATLAB sketch of these steps appears after the list):
1. Find the eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_N \geq 0$ of the N×N matrix $A^T A$ and arrange them in descending order.
2. Find the number of nonzero eigenvalues of the matrix $A^T A$ and denote this number by r.
3. Find the orthogonal eigenvectors $v_i$ of the matrix $A^T A$ corresponding to the obtained eigenvalues and arrange them, in the same order, as the column vectors of the N×N matrix V.
4. Form a D×N matrix $\Sigma$ by placing on its leading diagonal the quantities $\sigma_i = \sqrt{\lambda_i}$, $i = 1, \ldots, \min(N, D)$. These quantities are called singular values.
5. Find the first r column vectors of the D×D matrix U as $u_i = \sigma_i^{-1} A v_i$, $i = 1, \ldots, r$; the $u_i$ and $v_i$ are the left-singular and right-singular vectors, respectively.
6. Add to the matrix U the remaining D−r column vectors using the Gram-Schmidt orthogonalization process.
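The steps above map almost directly onto MATLAB. The sketch below is an assumed illustration of the algorithm built on eig applied to $A^T A$; step 6 is handled with the null function, which returns an orthonormal complement in place of an explicit Gram-Schmidt pass, and the result can be checked against the built-in svd.

% Eigenvalue-based construction of the SVD, following the steps above
% (assumed illustration; in practice MATLAB's built-in svd is preferable).
A = rand(4, 3);                            % D = 4 genes, N = 3 samples
[V, L] = eig(A' * A);                      % eigenpairs of A'A
[lambda, idx] = sort(diag(L), 'descend');  % step 1: descending order
V = V(:, idx);
lambda = max(lambda, 0);                   % guard against round-off negatives
r = sum(lambda > eps);                     % step 2: number of nonzero eigenvalues
sigma = sqrt(lambda);                      % step 4: singular values
S = zeros(size(A));
S(1:length(sigma), 1:length(sigma)) = diag(sigma);
U = zeros(size(A, 1));
for i = 1:r
    U(:, i) = A * V(:, i) / sigma(i);      % step 5: left-singular vectors
end
if r < size(A, 1)                          % step 6: complete U to a basis
    U(:, r+1:end) = null(U(:, 1:r)');      % orthonormal complement instead of Gram-Schmidt
end
err = norm(A - U * S * V');                % should be close to zero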