If you’ve taken an introductory Machine Learning class, you’ve certainly come across the issue of overfitting and been introduced to the concept of regularization and norm. I often see this being discussed purely by looking at the formulas, so I figured I’d try to give a better insight into why exactly minimising the norm induces regularization — and how L1 and L2 differ from each other — using some visual examples.
Using the example of linear regression, our loss is given by the Mean Squared Error (MSE):
and our goal is to minimize this loss:
To prevent overfitting, we want to add abias towards less complex functions.That is, given two functions that can fit our data reasonably well, we prefer the simpler one. We do this by adding a regularization term, typically either the L1 norm or the squared L2 norm:
#overfitting #data-science #regularization #machine-learning #norms