Feature Selection in Pathology Detection using Hybrid Multidimensional Analysis
Edilson Delgado-Trejos (National University of Colombia, Colombia), Germán Castellanos (National University of Colombia, Colombia), Luis G. Sánchez (National University of Colombia, Colombia) and Julio F. Suárez (National University of Colombia, Colombia)
Copyright: © 2008
Dimensionality reduction procedures perform well on sets of correlated features, whereas variable selection methods perform poorly on them. Variable selection methods fail to pick relevant variables because the scores they assign to correlated features are too similar, so no variable is strongly preferred over another. Hence, feature selection and dimensionality reduction algorithms have complementary advantages and disadvantages. Dimensionality reduction algorithms thrive on the correlation between variables but fail to select informative features from a set of more complex features. Variable selection algorithms fail when all the features are correlated, but succeed with informative variables (Wolf & Bileschi, 2005). In this work, we propose a feature selection algorithm with heuristic search that uses Multivariate Analysis of Variance (MANOVA) as the cost function. The technique is tested by discriminating hypernasal from normal voices of CLP (cleft lip and/or palate) patients. Classification performance, computational time, and reduction ratio are compared against an alternative feature selection method based on unfolding the multivariate analysis into univariate and bivariate analyses. The methodology is effective because it takes into account the statistical and geometrical relevance of the features; it does not reduce to an analysis of the separability among classes, but seeks a level of quality in the signal representation.
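As a rough illustration of a heuristic search driven by a MANOVA cost function, the sketch below runs a plain greedy forward selection that scores each candidate subset with Wilks' lambda (the ratio of within-class to total scatter determinants). Several assumptions apply: the abstract does not state which MANOVA statistic or which search strategy the chapter uses, so Wilks' lambda and the simple forward search are stand-ins, and the names `wilks_lambda` and `forward_select` as well as the synthetic data are hypothetical.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T); values near 0 mean well-separated classes."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered              # total scatter matrix
    W = np.zeros_like(T)
    for label in np.unique(y):
        Xc = X[y == label]
        d = Xc - Xc.mean(axis=0)
        W += d.T @ d                       # pooled within-class scatter
    sign_w, logdet_w = np.linalg.slogdet(W)
    sign_t, logdet_t = np.linalg.slogdet(T)
    if sign_w <= 0 or sign_t <= 0:
        # Assumption: a singular scatter matrix (e.g. duplicated features)
        # is given the worst score so the search skips that candidate.
        return 1.0
    return float(np.exp(logdet_w - logdet_t))

def forward_select(X, y, n_features):
    """Greedy search: repeatedly add the feature that lowers Wilks' lambda most."""
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining and len(selected) < n_features:
        scores = [wilks_lambda(X[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmin(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic two-class data: only feature 0 carries class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 3))
X[:, 0] += 3.0 * y
print(forward_select(X, y, 1))
```

Because the score is computed on the whole candidate subset rather than on each feature alone, a feature that is strongly correlated with one already selected adds little to the determinant ratio, which is one way a multivariate cost can avoid the near-tied scores described above.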
Key Terms in this Chapter
Heuristic Searching: Feature selection by heuristic measure.
Hypothesis Test: Statistical test for accepting or rejecting a hypothesis by using a probability distribution.
CLP: Human pathology characterized by cleft lip and/or palate.
SFFS: Sequential Forward Floating Selection.
PCA: Principal Component Analysis. Orthogonal representation based on data variance.
Dimensionality Reduction: Data representation in a lower dimension space by linear or nonlinear mapping.
MANOVA: Hypothesis test based on multivariate analysis of variance.
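To make the PCA and dimensionality reduction entries above concrete, here is a minimal sketch of PCA computed through the singular value decomposition of the centered data. It is an illustration under stated assumptions, not the chapter's implementation: the function name `pca` and the synthetic, strongly correlated two-feature data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its leading principal components.

    Returns the projected data and the fraction of total variance
    captured by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Two almost identical (highly correlated) features.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X_corr = np.column_stack([base, base + 0.05 * rng.normal(size=200)])

scores, ratio = pca(X_corr, 1)
print(scores.shape, ratio)
```

With features this strongly correlated, a single orthogonal component captures nearly all of the variance, which is the situation in which dimensionality reduction does well and per-feature selection scores become hard to tell apart.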