In my previous article, I explained how to build a small and nimble image classifier and what the advantages of having variable input dimensions in a convolutional neural network are. However, after going through the model-building code and training routine, you might ask questions such as the following (each of which, as sketched below, is really a tunable hyperparameter):

  1. How do I choose the number of layers in a neural network?
  2. How do I choose the optimal number of units/filters in each layer?
  3. What is the best data augmentation strategy for my dataset?
  4. What batch size and learning rate are appropriate?
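
As a first taste of what's coming, here is a minimal sketch of how choices like these can be expressed as a Ray Tune search space. The parameter names and ranges are my own illustrative assumptions, not the values used later in the article.

```python
from ray import tune

# Hypothetical search space: each question above becomes a tunable
# hyperparameter (names and ranges are illustrative assumptions).
search_space = {
    "num_layers": tune.randint(2, 6),            # Q1: depth of the network
    "base_filters": tune.choice([16, 32, 64]),   # Q2: width of the first layer
    "augment_flip": tune.choice([True, False]),  # Q3: one augmentation knob
    "batch_size": tune.choice([32, 64, 128]),    # Q4: batch size
    "lr": tune.loguniform(1e-4, 1e-1),           # Q4: learning rate
}
```

A tuner samples concrete values from this space, trains a model with them, and uses the reported metric to decide which values to try next.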

Building or training a neural network involves figuring out the answers to these questions. You may have an intuition for CNNs; for example, the number of filters in each layer should increase with depth, because the network learns to extract more and more complex features built on the simpler features extracted in earlier layers. However, there may be a better model for your dataset, one with fewer parameters, that outperforms the model you designed based on intuition.

In this article, I’ll explain what these parameters are and how they affect the training of a machine learning model. I’ll explain how machine learning engineers choose these parameters and how we can automate the process using a simple mathematical concept. I’ll start with the same model architecture from my previous article and modify it to make most of the training and architectural parameters tunable.
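
To make that plan concrete, here is a minimal sketch of what "making parameters tunable" can look like: a hypothetical `build_model` helper that reads depth, width, and learning rate from a config dictionary instead of hard-coding them. The helper name, the config keys, and the architecture itself are my own illustrative assumptions; the actual model from the previous article differs.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(config: dict) -> tf.keras.Model:
    # Hypothetical sketch: depth, width, and learning rate come from
    # `config` instead of being hard-coded in the model definition.
    inputs = tf.keras.Input(shape=(None, None, 3))  # variable input dimensions
    x = inputs
    filters = config["base_filters"]
    for _ in range(config["num_layers"]):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
        filters *= 2  # deeper layers get more filters (the intuition above)
    x = layers.GlobalAveragePooling2D()(x)  # works with variable input sizes
    outputs = layers.Dense(config["num_classes"], activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(config["lr"]),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With a builder like this, a tuner such as Ray Tune can call `build_model(config)` with a different sampled config for each trial.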

#data-science #machine-learning #deep-learning #hyperparameter-tuning #bayesian-optimization
