The goal of any deep learning model is to take in an input and generate the correct output. The nature of these inputs and outputs, which can vary wildly from application to application, depends on the specific job that the model should perform. For example, a dog breed classification model might take images as its _input_ and generate the name of the dog breed (or a numeric label corresponding to the breed) as its _output_. Another model might accept a text description of a dog as its _input_ and generate the name of the dog breed as its _output_. The first model is an example of a _computer vision_ model, whereas the latter is an example of a _natural language processing (NLP)_ model.

Parameters vs. Hyperparameters

The internals of both models will consist of many fancy parts (convolutional layers, attention mechanisms, etc.), each tailored to its specific task. From a high-level perspective, all of these components constitute a set of _parameters_ (or weights) which determine what the output will be for any given input. Training a deep learning model is the process of finding the set of values for these parameters which yield the best results on the given task.

By contrast, _hyperparameters_ are the factors that control the training process itself. The learning rate, the number of training epochs/iterations, and the batch size are some examples of common hyperparameters. The values chosen for the hyperparameters have a significant impact on the learned parameters and, by extension, on the performance of the model.

In a nutshell, the _parameters_ are what the model learns, and the _hyperparameters_ determine how well (or how badly) the model learns.
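To make the distinction concrete, here is a minimal sketch in PyTorch. The layer sizes, learning rate, batch size, and epoch count below are illustrative values, not recommendations; the point is simply where each kind of quantity lives.

```python
import torch
import torch.nn as nn

# Hyperparameters: chosen by us, they control the training process itself.
learning_rate = 1e-3
batch_size = 32
num_epochs = 5

# A tiny classifier; its weights and biases are the parameters the model learns.
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Parameters live inside the model and are updated during training.
num_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {num_params}")

# The optimizer consumes a hyperparameter (the learning rate) to update the parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```

Notice that nothing in the model definition mentions the learning rate or the batch size: the parameters are stored inside the model, while the hyperparameters only exist in the training setup around it.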

Hyperparameter Optimization

Just as we have various techniques for training model parameters, we also have methods for finding the best hyperparameter values. This process of searching for the _hyperparameter_ values that enable the model to learn the best set of parameters for a given task is called _hyperparameter optimization_.
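The simplest such method is a plain grid search: try every combination of candidate values and keep the one that scores best on a validation set. The sketch below illustrates the idea; `train_and_evaluate` is a hypothetical placeholder (here it just returns a random number) that in practice would train a model with the given hyperparameters and return its validation score.

```python
import random
from itertools import product

# Candidate values for two common hyperparameters.
learning_rates = [1e-4, 3e-4, 1e-3]
batch_sizes = [16, 32, 64]

def train_and_evaluate(lr, bs):
    """Hypothetical helper: train a model with these hyperparameters and
    return its validation score. A random number stands in for real training."""
    return random.random()

# Exhaustively try every combination and keep the best-scoring one.
best_score, best_config = float("-inf"), None
for lr, bs in product(learning_rates, batch_sizes):
    score = train_and_evaluate(lr, bs)
    if score > best_score:
        best_score, best_config = score, (lr, bs)

print(f"Best hyperparameters: lr={best_config[0]}, batch_size={best_config[1]}")
```

More sophisticated approaches (random search, Bayesian optimization, etc.) follow the same outer loop but are smarter about which combinations they try.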

As a loose analogy, think about overclocking a CPU. By tuning things like voltages, clock frequencies, and temperature limits (the hyperparameters), you can get the CPU to perform at higher speeds without changing the CPU architecture (the model) or the components of the CPU (the parameters of the model).

Knowing what hyperparameter optimization is, you might wonder whether it is needed whenever you train a model. After all, many of us don’t bother with overclocking our CPUs, considering they typically perform well out-of-the-box. Just like modern-day CPUs, state-of-the-art deep learning models can generally perform well even without hyperparameter optimization. As long as you stick to sensible defaults, the power of SOTA pre-trained models combined with transfer learning is sufficient to produce a model with satisfactory performance.

But when you don’t consider “good enough” to be good enough, hyperparameter optimization is a vital tool in your toolbox to help your model go the extra mile.

#nlp #machine-learning #data-science #artificial-intelligence #deep-learning
