The 5 Best Feature Selection Methods in a Few Lines of Code


Feature selection is a natural second step after exploratory data analysis in most data science projects. The process consists of choosing the features that yield the best predictions. Easy-to-use feature selection methods include SelectFromModel, feature ranking with recursive feature elimination (RFE), filter-based univariate selection, feature importances, and voting selectors. To become a unicorn data scientist, mastering the most recent feature selection methods is a must-have skill. In this article, we will review feature selection methods favored by Kaggle winners, each of which can be implemented in a few lines of Python. We will analyze a sample chocolate bar rating dataset, with the 'smoky' taste as the target feature.

You can find the full dataset here.


A challenging dataset which, after categorical encoding, contains more than 2,800 features.

  1. SelectFromModel

This method is based on an estimator (SVC, linear models, Lasso…) that assigns a weight or importance to each feature; only the features whose weights exceed a threshold are kept.

#import libraries
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.feature_selection import SelectFromModel

#X, y and feature_names are assumed to come from the encoded chocolate dataset
#Fit the model
clf = LassoCV().fit(X, y)

#Rank features by the absolute value of their Lasso coefficients
importance = np.abs(clf.coef_)

#Set the threshold just above the third-largest coefficient
idx_third = importance.argsort()[-3]
threshold = importance[idx_third] + 0.01

#Keep only the features whose coefficients exceed the threshold
sfm = SelectFromModel(clf, threshold=threshold, prefit=True)
X_selected = sfm.transform(X)

#Print the 10 highest-ranked features
idx_features = (-importance).argsort()[:10]
name_features = np.array(feature_names)[idx_features]
print('Selected features: {}'.format(name_features))

Selected features: ['cocoa_percent' 'first_taste_pungent raisin' 'first_taste_pure' 'first_taste_raisins' 'first_taste_raisiny' 'first_taste_raspberry' 'first_taste_raw' 'first_taste_red berry' 'first_taste_red fruit' 'first_taste_red wine']
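The intro also mentions feature ranking with recursive feature elimination (RFE). As a minimal sketch of that method: RFE repeatedly fits an estimator and drops the weakest feature until the desired number remains. The synthetic dataset below is purely illustrative, not the chocolate dataset.

```python
# Minimal sketch of recursive feature elimination (illustrative data)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic regression problem: 20 features, only 5 informative
X, y = make_regression(n_samples=100, n_features=20,
                       n_informative=5, random_state=0)

# Recursively eliminate one feature at a time until 5 remain
selector = RFE(estimator=LinearRegression(),
               n_features_to_select=5, step=1)
selector.fit(X, y)

# Indices of the surviving features
selected_idx = np.where(selector.support_)[0]
print('Selected feature indices: {}'.format(selected_idx))
```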

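Another method named above is filter-based univariate selection: each feature is scored independently against the target, and only the top-scoring ones are kept. A minimal sketch with scikit-learn's SelectKBest on illustrative synthetic data:

```python
# Minimal sketch of filter-based univariate selection (illustrative data)
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic classification problem: 30 features, only 4 informative
X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=4, random_state=0)

# Keep the 4 features with the highest ANOVA F-scores
selector = SelectKBest(score_func=f_classif, k=4)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # (200, 4)
```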


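The intro also lists feature importances. A common variant is tree-based importance: a forest ranks features by how much each one reduces impurity across the trees. A minimal sketch on illustrative synthetic data:

```python
# Minimal sketch of tree-based feature importance (illustrative data)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic classification problem: 15 features, only 3 informative
X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features by mean impurity decrease across the trees
top_idx = np.argsort(forest.feature_importances_)[::-1][:3]
print('Top feature indices: {}'.format(top_idx))
```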
