Switch between Different Models Effortlessly without Messing with the Code

Photo by Markus Petritz on Unsplash

Motivation

You may have heard of the “no free lunch” (NFL) theorem, which states that no single algorithm is best for every dataset. An algorithm that performs well on one dataset may perform poorly on another. That is why so many machine learning algorithms are available for training models on data.

How do we know which machine learning model is best? We cannot know until we experiment with different models and compare their performance. But experimenting with different models can get messy, especially when you want to find the best parameters for each model with GridSearchCV.

For example, when we finish experimenting with RandomForestClassifier and switch to SVC, we might wish to save the parameters of RandomForestClassifier in case we want to reproduce its results later. But how do we save these parameters efficiently?

Wouldn’t it be nice if the information about each model were saved in separate configuration files like the ones below?

experiments/
├── data_preprocess.yaml
├── hyperparameters.yaml
└── model
    ├── random_forest.yaml
    └── svc.yaml

Each file under model will specify that model’s parameters.
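As an illustration, random_forest.yaml might look like the sketch below. The `_target_` key is Hydra’s convention for naming the class to instantiate; the parameter values here are assumptions for illustration, not prescribed defaults.

```yaml
# experiments/model/random_forest.yaml
_target_: sklearn.ensemble.RandomForestClassifier
n_estimators: 100
max_depth: 10
random_state: 42
```

svc.yaml would follow the same pattern, with `_target_: sklearn.svm.SVC` and that model’s own hyperparameters.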

When we want to use a specific model (let’s say RandomForestClassifier), all we need to do is run the training file and specify the model we want to train with model=modelname

python train.py model=random_forest

Being able to do this has helped me experiment with different models much faster, without being afraid of losing the hyperparameters of a particular model used for GridSearchCV. This article will show you how to switch between different models effortlessly, as shown above, with Hydra.
