Machine learning's feature selection technique aids in selecting a subset of the original features in order to reduce a high-dimensional data space. According to the literature, there are two basic strategies for feature selection: supervised and unsupervised. This chapter focuses on supervised approaches only. Filter, intrinsic, and wrapper methods are the three types of supervised feature selection algorithms, and filtering strategies are the subject of this chapter. The chapter covers the most popular univariate filtering algorithms with examples, advantages and disadvantages, and R implementations, and compares the univariate filtering techniques across a number of parameters. The chapter also describes two popular multivariate filtering techniques, minimum redundancy maximum relevance (mRMR) and correlation-based feature selection (CFS), with appropriate examples and implementations in R. Finally, the chapter deals with prominent applications of filtering techniques in the context of machine learning.
Introduction
High-dimensional data processing is a major challenge for engineers and researchers in the field of Machine Learning (ML). High-dimensional data contains a large number of variables, some of which are inherently redundant or irrelevant. By removing such redundant and irrelevant data, feature selection provides a simple yet efficient solution to this problem. Removing extraneous data increases learning accuracy, decreases computation time, and makes the learning model and the data easier to understand. In practice, not all of the variables in a dataset are useful when building a machine learning model. Adding redundant variables reduces the model's generalization ability and may also reduce a classifier's overall accuracy. Furthermore, adding more variables leads to a more complex model.
Definition 1.1: Feature Selection: Feature selection is the process of selecting a subset of relevant features FS = {Rf1, Rf2, Rf3, ..., Rfm} from the n original predictors (where n > m) that are most significant and appropriate for any type of predictive modelling problem in Machine Learning.
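As a minimal illustration of this notation, the following R sketch (with a hypothetical data frame and feature names) keeps an assumed relevant subset FS of m = 2 features out of n = 5 original predictors:

```r
# Toy illustration of Definition 1.1; the data and feature names are hypothetical.
set.seed(1)
n <- 5
X <- as.data.frame(matrix(rnorm(100 * n), ncol = n))
names(X) <- paste0("f", 1:n)        # the n original predictors f1 ... f5

FS <- c("f2", "f4")                 # suppose these m = 2 features are the relevant ones
X_reduced <- X[, FS, drop = FALSE]  # the reduced data space keeps only FS

dim(X)          # 100 x 5 (n original predictors)
dim(X_reduced)  # 100 x 2 (m selected features, m < n)
```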
Feature selection aims to achieve the following:
1. It filters out irrelevant and noisy features, leaving just those with the least amount of redundancy and the greatest relevance to the target variable.
2. It cuts down on the amount of time and effort required to train and test a classifier, resulting in more cost-effective models.
3. It increases the effectiveness of learning algorithms, prevents overfitting, and aids in the creation of more general models.
The following are the types of feature selection strategies used in Machine Learning:
• Supervised methods: These approaches are used on labelled data to identify relevant features for supervised models such as classification and regression.
• Unsupervised methods: These approaches are used for data that has not been labelled.
This chapter is only concerned with supervised approaches. Different forms of supervised feature selection algorithms are depicted in Figure 1.
Figure 1. Supervised Feature Selection
Filter Techniques, Intrinsic Methods, and Wrapper Techniques are the three types of supervised feature selection techniques.
Filter approaches are scalable (up to very high-dimensional data) and perform quick feature selection before classification, ensuring that the learning algorithm's bias does not interact with the feature selection algorithm's bias.
They primarily serve as rankers, arranging features in order of best to worst.
The order in which characteristics are ranked is determined by the intrinsic properties of the data, such as variance, consistency, distance, information, correlation, and so on.
There are numerous filter methods available, and new ones are produced on a regular basis; each utilizes a different criterion to determine the data's relevancy.
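As a simple illustration of how a univariate filter ranks features, the sketch below scores each feature independently by its absolute Pearson correlation with a numeric target and orders the features from best to worst. The dataset, the choice of criterion, and the variable names are illustrative assumptions, not the chapter's own example:

```r
# Univariate filter sketch: score each feature independently, then rank.
set.seed(42)
n_obs <- 200
X <- data.frame(f1 = rnorm(n_obs),
                f2 = rnorm(n_obs),
                f3 = rnorm(n_obs),
                f4 = rnorm(n_obs))
y <- 2 * X$f1 - 1.5 * X$f3 + rnorm(n_obs)         # f1 and f3 are truly relevant

scores  <- sapply(X, function(f) abs(cor(f, y)))  # relevance score per feature
ranking <- sort(scores, decreasing = TRUE)        # features ordered best to worst
print(ranking)

top_m      <- names(ranking)[1:2]                 # keep the m highest-ranked features
X_filtered <- X[, top_m]
```

Because each feature is scored on its own, the ranking is fast and independent of any classifier, which is exactly why filter methods scale to very high-dimensional data.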
Wrapper approaches rely on the classifier since they use a machine learning algorithm as a black box evaluator to discover the optimal subsets of features.
Any combination of search strategy and modelling algorithm can be used as a wrapper.
When a wrapper is used on a dataset with a lot of features, it uses a lot of computational resources and takes a long time to run.
Finally, these methods are straightforward to apply and can model feature dependencies.
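The sketch below illustrates the wrapper idea with a hand-rolled greedy forward search that treats a linear model as the black-box evaluator. The dataset, the use of in-sample adjusted R-squared as the score, and the stopping rule are illustrative simplifications; in practice a resampling estimate of performance would normally be used:

```r
# Wrapper sketch: greedy forward selection with a linear model as the evaluator.
set.seed(7)
n_obs <- 150
dat <- data.frame(f1 = rnorm(n_obs), f2 = rnorm(n_obs),
                  f3 = rnorm(n_obs), f4 = rnorm(n_obs))
dat$y <- 1.8 * dat$f2 - 2.2 * dat$f4 + rnorm(n_obs)

candidates <- setdiff(names(dat), "y")
selected   <- character(0)

for (step in seq_along(candidates)) {
  remaining <- setdiff(candidates, selected)
  # score every subset formed by adding one remaining feature
  scores <- sapply(remaining, function(f) {
    fml <- reformulate(c(selected, f), response = "y")
    summary(lm(fml, data = dat))$adj.r.squared
  })
  current_best <- if (length(selected) == 0) -Inf else
    summary(lm(reformulate(selected, response = "y"), data = dat))$adj.r.squared
  # stop when adding a feature no longer improves the evaluator's score
  if (max(scores) <= current_best) break
  selected <- c(selected, names(which.max(scores)))
}
print(selected)   # subset chosen by the wrapper
```

Each candidate subset is evaluated by refitting the model, which is why wrappers become expensive as the number of features grows.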
Intrinsic methods bridge the gap between filters and wrappers.
To begin, they employ a filter that combines measurable and statistical criteria to select candidate features, and then they apply a machine learning method to choose the subset with the best classification performance.
They can model feature relationships and reduce the computational burden of wrappers, since they do not re-classify candidate subsets in repeated iterations.
Because feature selection is done during the learning phase, these approaches can fit models and select features at the same time. Their main drawback is their dependence on the classifier.
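As an illustration of an intrinsic method, the sketch below uses the LASSO via the glmnet package (assuming it is installed): the penalty shrinks the coefficients of irrelevant predictors to exactly zero while the model is being fitted, so feature selection and model fitting happen in the same step. The data and variable names are illustrative:

```r
# Intrinsic (embedded) selection sketch: LASSO selects features while fitting.
library(glmnet)

set.seed(11)
n_obs <- 200
X <- matrix(rnorm(n_obs * 6), ncol = 6,
            dimnames = list(NULL, paste0("f", 1:6)))
y <- 3 * X[, "f1"] - 2 * X[, "f5"] + rnorm(n_obs)

cv_fit <- cv.glmnet(X, y, alpha = 1)         # alpha = 1 gives the LASSO penalty
coefs  <- coef(cv_fit, s = "lambda.min")     # coefficients at the selected lambda

# Features with non-zero coefficients are the ones the model retained
selected <- setdiff(rownames(coefs)[which(coefs[, 1] != 0)], "(Intercept)")
print(selected)
```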