The aim of this article is to establish a proper understanding of what exactly “optimizing” a Machine Learning algorithm means. We’ll then take a look at the gradient-based class of optimization algorithms (Gradient Descent, Stochastic Gradient Descent, etc.).

_NOTE:_ For the sake of simplicity and better understanding, we’ll restrict the scope of our discussion to supervised machine learning algorithms only.

Machine Learning is the culmination of Applied Mathematics and Computer Science: we train data-driven applications and use them to run inferences on the available data. Generally speaking, for an ML task, the type of inference (i.e., the prediction that the model makes) varies with the problem statement and the type of data at hand. Despite these dissimilarities, however, these algorithms tend to share some similarities as well, especially in how they operate.

Let’s try to understand the previous paragraph. Consider supervised ML algorithms as a superset. Now, we can go ahead and further divide this superset into smaller sub-groups based on the characteristics these algorithms share:

  • Regression vs classification algorithms
  • Parametric vs non-parametric algorithms
  • Probabilistic vs non-probabilistic algorithms, etc.

Setting these differences aside, if we observe the generalized representation of a supervised machine learning algorithm, it’s evident that these algorithms work in more or less the same manner:

  • First, we have some labeled data, which can be broken down into the feature set X and the corresponding label set Y.
  • Then we have the model function F: a mathematical function that maps an input feature set X_i to the corresponding output ŷ_i.

To put it in layman’s terms: every supervised ML algorithm passes a feature set X_i as input to the model function F, which processes it to generate an output ŷ_i.
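To make this concrete, here is a minimal sketch of such a model function in Python, assuming F is a simple linear model ŷ = Xw + b. The name model_F and the toy weights and data are illustrative placeholders, not part of any particular library or of the original article:

```python
import numpy as np

def model_F(X, w, b):
    """Model function F: maps each feature vector X_i to a prediction ŷ_i.

    Assumes a simple linear model ŷ = Xw + b for illustration.
    """
    return X @ w + b

# Toy feature set X (3 samples, 2 features each) and placeholder parameters
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -0.25])
b = 1.0

y_hat = model_F(X, w, b)  # predictions ŷ for each sample
print(y_hat)              # [1.0, 1.5, 2.0]
```

Calling model_F like this is precisely the inference step described below: the function simply maps features to predictions, while training is the separate process of choosing good values for w and b.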

However, this is just the inference (or testing) phase of a model, where, theoretically, we use the model to generate predictions on data it has never seen before.

But what about “training” the model? Let’s have a look at it next.
