Feature selection is an important task in any machine learning application, and it becomes especially crucial when the data has many features. Training on the optimal number of features also tends to improve model accuracy. The most important features, and how many of them to keep, can be identified via feature importance or feature ranking. In this piece, we’ll explore feature ranking.

Recursive Feature Elimination

The first item needed for recursive feature elimination is an estimator; for example, a linear model or a decision tree model.

Linear models provide coefficients and decision tree models provide feature importances. To select the optimal number of features, the estimator is trained and the features are scored via those coefficients or feature importances. The least important features are then removed, and the process is repeated recursively on the remaining features until the desired number of features is reached.
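To make this loop concrete, here is a minimal hand-rolled sketch. The synthetic dataset, the RandomForestClassifier estimator, and the choice of keeping four features are assumptions for illustration only:

```python
# Hand-rolled recursive feature elimination (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 features, only 4 of which are informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

remaining = list(range(X.shape[1]))  # indices of features still in play
n_features_to_keep = 4

while len(remaining) > n_features_to_keep:
    # Train the estimator on the surviving features only.
    model = RandomForestClassifier(random_state=0).fit(X[:, remaining], y)
    # Remove the least important surviving feature and repeat.
    weakest = remaining[int(np.argmin(model.feature_importances_))]
    remaining.remove(weakest)

print("Selected feature indices:", sorted(remaining))
```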

Application in Sklearn

Scikit-learn makes it possible to implement recursive feature elimination via the sklearn.feature_selection.**RFE** class. The class takes the following parameters; a short usage sketch follows the list:

  • estimator — a machine learning estimator that can provide feature importances via the coef_ or feature_importances_ attributes.
  • n_features_to_select — the number of features to select. If not specified, half of the features are selected.
  • step — an integer that indicates the number of features to be removed at each iteration, or a number between 0 and 1 to indicate the percentage of features to remove at each iteration.
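For example, here is a minimal usage sketch, assuming a synthetic classification dataset and LogisticRegression as the estimator (both are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

rfe = RFE(
    estimator=LogisticRegression(max_iter=1000),  # exposes coef_ after fitting
    n_features_to_select=4,                       # keep the 4 strongest features
    step=1,                                       # drop one feature per iteration
)
rfe.fit(X, y)

X_reduced = rfe.transform(X)  # keep only the selected columns
print(X_reduced.shape)        # (200, 4)
```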

Once the RFE object is fitted, the following attributes can be obtained (a short continuation of the sketch above follows the list):

  • ranking_ — the ranking of the features; selected features are assigned rank 1.
  • n_features_ — the number of features that have been selected.
  • support_ — an array that indicates whether or not a feature was selected.
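Continuing the sketch above, these attributes can be read straight off the fitted object:

```python
print(rfe.ranking_)     # rank 1 marks selected features; larger ranks were eliminated earlier
print(rfe.n_features_)  # 4
print(rfe.support_)     # boolean mask over the original columns
```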

