The **code** that accompanies this article can be found **here**.

☞ **Machine Learning Optimization (1) - Gradient Descent with Python**

So far in our journey through the Machine Learning universe, we have covered several big topics. We investigated some **regression** algorithms, **classification** algorithms and algorithms that can be used for both types of problems (**SVM**, **Decision Trees** and **Random Forest**). Apart from that, we dipped our toes into unsupervised learning, saw how this type of learning can be used for **clustering**, and learned about several clustering techniques.

We also talked about how to quantify machine learning model **performance** and how to improve it with **regularization**. In all these articles, we used Python for "from scratch" implementations, as well as libraries like **TensorFlow**, **PyTorch** and **Scikit-Learn**. In the previous article, we covered the topic of Gradient Descent, the grandfather of all optimization techniques. Continuing down that path, in this article we explore momentum-based optimizers and optimizers that scale the learning rate.

Gradient Descent uses only the gradient computed at the current step, which can make it slow. If the gradient is small, meaning that the slope is not steep, it will take a long time to reach the **minimum**. Essentially, this is the problem that the optimizers we explore here are trying to solve. These optimizers are more often used with neural networks than with regular machine learning algorithms. However, to keep the explanations simple, we use **linear regression** throughout. As in the **previous article**, it is important to note that these techniques are **not** machine learning algorithms. They are solvers of minimization problems in which the function to minimize has a gradient at most points of its domain.
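As a baseline for the optimizers discussed in this series, the behavior described above can be sketched with plain Gradient Descent fitting a linear regression. This is a minimal illustration, not the article's accompanying code; the toy data (a line with slope 2 and intercept 1 plus noise) and the hyperparameters are assumptions chosen for the example.

```python
import numpy as np

# Toy data for y ≈ 2x + 1 with a little noise (assumed example values).
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * X + 1.0 + rng.normal(0.0, 0.1, size=100)

def gradient_descent(X, y, learning_rate=0.1, n_steps=500):
    """Plain Gradient Descent on mean squared error for y ≈ w*x + b.

    Only the gradient at the current step is used to update the
    parameters -- no memory of past gradients. This is the behavior
    that momentum-based optimizers improve on.
    """
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        error = w * X + b - y            # prediction residuals
        grad_w = 2.0 * np.mean(error * X)  # d(MSE)/dw
        grad_b = 2.0 * np.mean(error)      # d(MSE)/db
        w -= learning_rate * grad_w        # step against the gradient
        b -= learning_rate * grad_b
    return w, b

w, b = gradient_descent(X, y)
print(w, b)  # should land near the true slope 2 and intercept 1
```

Note that when `grad_w` and `grad_b` become small near the minimum, each update becomes tiny as well, which is exactly the slowdown that motivates the optimizers covered in this article.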

#machine learning #python #artificial intelligence #data science
