The Learning Rate Finder

Get to the neighborhood of optimal values quickly without costly searches. We'll apply a learning rate finder implementation to an example dataset to arrive at a good learning rate.

The learning rate is arguably the most important hyperparameter to tune in a neural network. Unfortunately, it is also one of the hardest to tune properly. But don’t despair, for the Learning Rate Finder will get you to pretty decent values quickly! Let’s see how it works and how to implement it in TensorFlow.

Why is it important?

To answer this question, let’s start by defining the learning rate. When you train a neural network, an optimization algorithm (typically some flavor of gradient descent) traverses the surface of the loss function, trying to walk down the slope in the direction where the loss decreases. The learning rate scales the size of each step it takes, and it’s important that this step is neither too small nor too large.
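For plain gradient descent, a single step can be written as follows. The notation is an assumption made for illustration and does not come from the article: \theta_t stands for the model weights at step t, L for the loss function, and \eta for the learning rate.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)
```

The gradient sets the direction of the step, while \eta scales how far the weights move along it. Optimizers such as momentum or Adam modify this rule, but the learning rate keeps the same scaling role.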

With too small a learning rate, the algorithm would take ages to reach the minimum, as in the left panel in the picture above. To make things worse, if there are local minima in the loss surface, the optimizer might get stuck in one of them, unable to escape with only small steps.

If the learning rate is too large, on the other hand, the optimization algorithm might overshoot the minimum and bounce around it without ever converging. In the worst case, it can even diverge completely, as in the right panel of the picture above. Hence, it’s vital to get your learning rate just right!
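To make both failure modes concrete, here is a minimal sketch in pure Python. It is a toy illustration, not the Learning Rate Finder itself: it assumes the one-dimensional loss f(x) = x**2, whose minimum is at x = 0, and the function name gradient_descent, the starting point, and the three learning rates are all hypothetical choices made for this example.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimize the toy loss f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        grad = 2.0 * x                 # derivative of x**2
        x = x - learning_rate * grad   # one gradient descent step
    return x


# Too small, reasonable, and too large learning rates (illustrative values).
for lr in (0.001, 0.3, 1.1):
    final_x = gradient_descent(lr)
    print(f"learning rate {lr}: final x = {final_x:.6f}")
```

With the smallest rate, x has barely moved from its starting point after 20 steps. With the middle rate, it lands very close to the minimum. With the largest rate, every step overshoots by more than the previous one, so x ends up further from the minimum than where it started. Since this toy loss is convex, it cannot show the local minima problem mentioned earlier.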

