Feature selection is an important task in any machine learning application, and it becomes especially crucial when the data has many features. Selecting the optimal number of features also improves model accuracy. The most important features, and the optimal number of them, can be found via feature importance or feature ranking. In this piece, we'll explore feature ranking.
The first item needed for recursive feature elimination is an estimator; for example, a linear model or a decision tree model.
Linear models provide coefficients, while decision tree models provide feature importances. To select the optimal number of features, the estimator is trained and the features are ranked via the coefficients or the feature importances. The least important features are then removed, and this process is repeated recursively until the optimal number of features remains.
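As a rough illustration of a single elimination step (a sketch of the idea, not scikit-learn's implementation), the snippet below fits a decision tree on a synthetic dataset, ranks the features by `feature_importances_`, and drops the least important one. Repeating this until the desired number of features remains is the essence of recursive feature elimination. The dataset and estimator here are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration: 10 features, 4 of them informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Fit the estimator and read off the feature importances.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
importances = tree.feature_importances_

# One elimination step: drop the least important feature and refit.
least_important = int(np.argmin(importances))
X_reduced = np.delete(X, least_important, axis=1)
tree = DecisionTreeClassifier(random_state=0).fit(X_reduced, y)
```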
Scikit-learn makes it possible to implement recursive feature elimination via the `sklearn.feature_selection.RFE` class. The class takes the following parameters:

- `estimator` — a machine learning estimator that can provide feature importances via the `coef_` or `feature_importances_` attribute.
- `n_features_to_select` — the number of features to select. Half of the features are selected if it's not specified.
- `step` — an integer that indicates the number of features to remove at each iteration, or a number between 0 and 1 that indicates the percentage of features to remove at each iteration.
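As a sketch of how these parameters fit together, the snippet below selects five features on a synthetic dataset. The dataset and parameter values are illustrative choices, not prescribed settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration: 10 features, 4 of them informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Keep 5 features, eliminating 1 feature per iteration.
# LogisticRegression exposes the coef_ attribute that RFE needs.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5, step=1)
rfe.fit(X, y)
```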
Once fitted, the following attributes can be obtained:

- `ranking_` — the ranking of the features.
- `n_features_` — the number of features that have been selected.
- `support_` — an array that indicates whether or not a feature was selected.
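Continuing the sketch above, the fitted selector exposes these attributes; the exact values depend on the data.

```python
# Continuing from the fitted `rfe` above.
print(rfe.ranking_)     # selected features are ranked 1; higher means less important
print(rfe.n_features_)  # 5, matching n_features_to_select
print(rfe.support_)     # boolean mask marking the selected features

# The fitted selector can also reduce a dataset to the selected columns.
X_selected = rfe.transform(X)
```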