Hyperparameter optimization is often one of the final steps in a data science project. Once you have a shortlist of promising models you will want to fine-tune them so that they perform better on your particular dataset.

In this post, we will go over three techniques used to find optimal hyperparameters, with examples showing how to implement them on Scikit-Learn models and, finally, on a neural network in Keras. The three techniques we will discuss are as follows:

  • Grid Search
  • Randomized Search
  • Bayesian Optimization

You can view the Jupyter notebook here.

Grid Search

One option would be to fiddle around with the hyperparameters manually until you find a great combination of hyperparameter values that optimizes your performance metric. This would be very tedious work, and you may not have time to explore many combinations.

Instead, you should get Scikit-Learn’s GridSearchCV to do it for you. All you have to do is tell it which hyperparameters you want to experiment with and what values to try out, and it will use cross-validation to evaluate all the possible combinations of hyperparameter values.

Let’s work through an example where we use GridSearchCV to search for the best combination of hyperparameter values for a RandomForestClassifier trained using the popular MNIST dataset.

To give you a feel for the complexity of the classification task, the figure below shows a few images from the MNIST dataset:

[Figure: sample handwritten digit images from the MNIST dataset]

To implement GridSearchCV we need to define a few things, starting with the hyperparameters we want to experiment with and the values we want to try out. Below we specify these in a dictionary called param_grid.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {'bootstrap': [True],
              'max_depth': [6, 10],
              'max_features': ['auto', 'sqrt'],
              'min_samples_leaf': [3, 5],
              'min_samples_split': [4, 6],
              'n_estimators': [100, 350]
              }

forest_clf = RandomForestClassifier()

forest_grid_search = GridSearchCV(forest_clf, param_grid, cv=5,
                                  scoring="accuracy",
                                  return_train_score=True,
                                  verbose=True,
                                  n_jobs=-1)

# X_train and y_train are the MNIST training images and labels
# loaded earlier in the notebook.
forest_grid_search.fit(X_train, y_train)
```

The param_grid tells Scikit-Learn to evaluate 1 x 2 x 2 x 2 x 2 x 2 = 32 combinations of the bootstrap, max_depth, max_features, min_samples_leaf, min_samples_split and n_estimators hyperparameters specified. The grid search will explore all 32 combinations of RandomForestClassifier’s hyperparameter values, and it will train each model 5 times (since we are using five-fold cross-validation). In other words, all in all, there will be 32 x 5 = 160 rounds of training!
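If you want to sanity-check that count before launching a long search, Scikit-Learn’s ParameterGrid expands the same dictionary into the individual combinations; a quick sketch:

```python
from sklearn.model_selection import ParameterGrid

# Each element of ParameterGrid(param_grid) is one hyperparameter combination.
print(len(ParameterGrid(param_grid)))  # 32
```

It may take a long time, but when it is done you can get the best combination of hyperparameters like this: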

```python
forest_grid_search.best_params_

{'bootstrap': True,
 'max_depth': 10,
 'max_features': 'auto',
 'min_samples_leaf': 3,
 'min_samples_split': 4,
 'n_estimators': 350}
```


Since n_estimators=350 and max_depth=10 are the maximum values that were evaluated, you should probably try searching again with higher values; the score may continue to improve.
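For example, a follow-up grid might extend those two ranges upward while keeping the other hyperparameters fixed at their winning values; the exact numbers below are just an illustrative choice:

```python
# Hypothetical follow-up grid: push max_depth and n_estimators higher,
# fixing the other hyperparameters at the values found above.
param_grid_2 = {'bootstrap': [True],
                'max_depth': [10, 15, 20],
                'max_features': ['auto'],
                'min_samples_leaf': [3],
                'min_samples_split': [4],
                'n_estimators': [350, 500, 800]}

forest_grid_search_2 = GridSearchCV(forest_clf, param_grid_2, cv=5,
                                    scoring="accuracy", n_jobs=-1)
forest_grid_search_2.fit(X_train, y_train)
```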

You can also get the best estimator directly:

```python
forest_grid_search.best_estimator_

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=10, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=3, min_samples_split=4,
                       min_weight_fraction_leaf=0.0, n_estimators=350,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
```
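Note that, by default (refit=True), GridSearchCV refits this best estimator on the whole training set once the search finishes, and the search object delegates prediction to it. A minimal sketch, assuming a held-out X_test prepared the same way as X_train:

```python
# predict() is forwarded to best_estimator_, which was refit
# on the full training set after the search completed.
y_pred = forest_grid_search.predict(X_test)
```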

And of course the evaluation score is also available:

```python
forest_grid_search.best_score_

0.9459
```

Our best score here is 94.59% accuracy, which is not bad for such a small parameter grid.
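Keep in mind that best_score_ is the mean cross-validation score of the best parameter combination, not a test-set result. As a final check you can score the refit model on held-out data; a minimal sketch, assuming X_test and y_test form a test split prepared like the training data:

```python
# score() applies the search's scorer ("accuracy" here) to new data.
test_acc = forest_grid_search.score(X_test, y_test)
print(test_acc)
```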
