A simple, high-level overview of hyper-parameter tuning methods, kept light on math, with a few short code sketches along the way.
The idea behind Grid Search is very intuitive: define a handful of candidate values for each hyper-parameter you suspect could be optimized, then train and evaluate the model on every combination in the resulting grid, recording the results as you go. This is an expensive and cumbersome way of tuning your hyper-parameters, and today it is considered one of the less efficient ways of conducting hyper-parameter search. Let's see how we can do better.
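As a concrete (if toy) illustration, here is a minimal sketch of grid search in Python. The evaluate function is a hypothetical stand-in for training a model and returning a validation score:

```python
import itertools

# Hypothetical stand-in for training a model with the given
# hyper-parameters and returning a validation score (higher is better).
def evaluate(learning_rate, num_layers):
    return -(learning_rate - 0.01) ** 2 - (num_layers - 3) ** 2

# A grid of candidate values for each hyper-parameter we want to tune.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "num_layers": [1, 2, 3, 4],
}

# Train and evaluate on every combination in the grid, keeping the best.
best_score, best_params = float("-inf"), None
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

Notice that the number of trials multiplies with every hyper-parameter you add, which is exactly why this approach gets expensive so quickly.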
[Figure from the Random Search paper]
Though you might not think it at first, randomly choosing the values of all your hyper-parameters is actually a much more efficient approach to hyper-parameter tuning. In this scenario, instead of stepping through a fixed grid, we initialize every hyper-parameter value randomly at each trial. This is better because, as it turns out, some hyper-parameters matter far more for optimization than others, and if we can't tell the important ones from the unimportant ones ahead of time, the next best thing we can do is sample all of them randomly at each trial. For the same budget, random sampling tries many more distinct values of each individual hyper-parameter, so the important dimensions get covered much more densely and the search becomes more efficient. The benefits of Random Search vs. Grid Search are explored in this paper.
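Here is a minimal sketch of the same loop with random sampling, again with a hypothetical evaluate function standing in for a real training run:

```python
import random

# Hypothetical stand-in for training a model and returning
# a validation score (higher is better).
def evaluate(learning_rate, num_layers):
    return -(learning_rate - 0.01) ** 2 - (num_layers - 3) ** 2

# Sample every hyper-parameter at random for each trial.
def sample_params():
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),  # log-uniform
        "num_layers": random.randint(1, 6),
    }

best_score, best_params = float("-inf"), None
for _ in range(20):  # same budget as a small grid, but denser per-axis coverage
    params = sample_params()
    score = evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```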
[Figure from the Bayesian Optimization repo]
The high-level concept is actually pretty simple: with Bayesian optimization we are trying to do the same thing we always try to do in ML, which is estimate a function too complex to write down directly. But now the function we're approximating is our ML algorithm itself: the mapping from a hyper-parameter configuration to the validation performance it produces. In this instance we may be using Deep Learning or some other heavy form of ML, so we can only afford a limited number of trials to test different combinations of hyper-parameters. If we could intelligently approximate what our ML algorithm would score before picking the next hyper-parameter configuration, we could potentially save lots of time and money. That approximation is usually called a surrogate model, and an acquisition rule uses it to decide which configuration looks most promising to try next.
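To make this concrete, here is a minimal sketch of the surrogate-model loop using a Gaussian process from scikit-learn. The one-dimensional objective and the simple "mean minus uncertainty" acquisition rule are assumptions made just for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy objective: stands in for "train with this learning rate,
# return the validation loss" (lower is better).
def objective(lr):
    return (lr - 0.01) ** 2

rng = np.random.default_rng(0)
X = list(rng.uniform(0.0, 0.1, size=3))  # a few random warm-up trials
y = [objective(x) for x in X]

for _ in range(10):
    # Fit a surrogate model to every (configuration, result) pair so far.
    gp = GaussianProcessRegressor().fit(np.array(X).reshape(-1, 1), y)

    # Acquisition: among random candidates, pick the one with the most
    # optimistic estimate (predicted mean minus an uncertainty bonus).
    candidates = rng.uniform(0.0, 0.1, size=200).reshape(-1, 1)
    mean, std = gp.predict(candidates, return_std=True)
    next_x = float(candidates[np.argmin(mean - std)])

    X.append(next_x)
    y.append(objective(next_x))

print("best learning rate found:", X[int(np.argmin(y))])
```

The expensive objective is only ever called once per loop iteration; all the candidate screening happens on the cheap surrogate.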
[Figure from the TPE paper]
There are several Bayesian optimization methods, but the key idea is the same: use a Bayesian approach to estimate a better configuration of hyper-parameters given the previous configurations and their results. The image above shows an algorithm called TPE (Tree-structured Parzen Estimator). The basic concept is to separate our trials into two groups according to their performance, i.e. the group that got better results and the group that got worse results, model the density of each group, and then pick the next set of hyper-parameters to maximize its probability of belonging to the good distribution as opposed to the bad one.
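Below is a rough, toy sketch of that idea, using SciPy kernel density estimates as the Parzen estimators. The one-dimensional search space and the 25% "good" cutoff are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy objective: lower is better.
def objective(lr):
    return (lr - 0.01) ** 2

rng = np.random.default_rng(0)
X = list(rng.uniform(0.0, 0.1, size=10))  # random warm-up trials
y = [objective(x) for x in X]

for _ in range(20):
    # Split the trials so far into a "good" group (top 25%) and a "bad" group.
    cutoff = np.quantile(y, 0.25)
    good = [x for x, s in zip(X, y) if s <= cutoff]
    bad = [x for x, s in zip(X, y) if s > cutoff]

    # Model each group's density with a kernel density estimate
    # (the Parzen estimator that gives TPE its name).
    l, g = gaussian_kde(good), gaussian_kde(bad)

    # Draw candidates from the good density and keep the one that is
    # most likely under "good" relative to "bad".
    candidates = l.resample(50, seed=rng).ravel()
    next_x = float(candidates[np.argmax(l(candidates) / g(candidates))])

    X.append(next_x)
    y.append(objective(next_x))

print("best learning rate found:", X[int(np.argmin(y))])
```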