Neural networks are great function approximators and feature extractors, but sometimes their weights become too specialized and cause overfitting. That’s where Regularization comes into the picture. We will discuss it here, along with the subtle differences between two major weight-regularization techniques that are often mistakenly treated as the same thing.

Introduction:

Neural networks were first introduced in 1943 by Warren McCulloch and Walter Pitts, but they did not catch on at the time because they required amounts of data and computational power that were simply not available. Once those constraints eased, and with other training advances such as better parameter initialization and better activation functions, neural networks again began to dominate competitions and found their way into a wide range of human-assistive technologies.

Today, neural networks form the backbone of well-known applications such as self-driving cars, Google Translate, and facial recognition systems, and they appear in almost every technology we use.

Neural networks are very good at approximating functions, linear or non-linear, and are also terrific at extracting features from the input data. This capability lets them perform wonders over a wide range of tasks, from computer vision to language modelling. But, as the famous saying goes:

“With Great Power Comes Great Responsibility”.

This saying also applies to the almighty neural net. Its power as a function approximator sometimes causes it to overfit the dataset: it learns a function that performs extremely well on the data it was trained on but fails miserably when tested on data it has never seen before. To be more technical, the network learns weights that are overly specialized to the given data and fails to learn features that generalize.

To solve the problem of overfitting, we apply a class of techniques known as Regularization, which reduces the complexity of the model and constrains the weights in a way that forces the neural network to learn generalizable features.

Regularization:

Regularization may be defined as any change we make to the training algorithm in order to reduce the generalization error but not the training error. There are many regularization strategies. Some put extra constraints on the model, such as restricting the parameter values directly, while others add extra terms to the objective function, which can be thought of as indirect or soft constraints on the parameter values. Used carefully, these techniques lead to improved performance on the test set. In the context of deep learning, most regularization techniques are based on regularizing estimators. Regularizing an estimator involves a tradeoff: we accept increased bias in exchange for reduced variance. An effective regularizer is one that makes a profitable trade, reducing variance significantly while not overly increasing the bias.
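To make the idea of "adding an extra term to the objective" concrete, here is a minimal sketch of an L2-penalized loss. The mean-squared-error objective, the `alpha` coefficient, and the NumPy implementation are illustrative choices for this sketch, not details taken from the techniques discussed later:

```python
import numpy as np

def l2_penalized_loss(y_true, y_pred, weights, alpha=1e-3):
    """Original objective (mean squared error) plus a soft constraint on the weights."""
    data_loss = np.mean((y_true - y_pred) ** 2)       # fit the training data
    l2_penalty = 0.5 * alpha * np.sum(weights ** 2)   # extra term that shrinks the weights
    return data_loss + l2_penalty
```

The coefficient `alpha` sets how hard the soft constraint pulls the weights toward zero; increasing it adds bias but reduces variance, which is exactly the trade described above.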

The major regularization techniques used in practice are:

  1. L2 Regularization
  2. L1 Regularization
  3. Data Augmentation
  4. Dropout
  5. Early Stopping

In this post, we focus mainly on L2 regularization and examine whether L2 regularization and weight decay can really be called two faces of the same coin.
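Before diving in, a hedged sketch of the two formulations in question may help. Assuming a PyTorch-style training step (the toy model, random data, and the value of `lam` below are placeholders, not taken from this post), explicit L2 regularization adds a penalty term to the loss, while weight decay is handed directly to the optimizer:

```python
import torch
import torch.nn as nn

# Toy setup for illustration only: one linear layer on random data.
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
criterion = nn.MSELoss()
lam = 1e-4  # regularization strength (illustrative value)

# Option A: explicit L2 regularization -- add the penalty to the loss yourself.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
penalty = 0.5 * lam * sum(p.pow(2).sum() for p in model.parameters())
loss = criterion(model(x), y) + penalty
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Option B: weight decay -- let the optimizer shrink the weights at each update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=lam)
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With plain SGD, the two options produce the same weight update (given the 1/2 factor in the penalty), which is why the terms are so often used interchangeably; whether they can always be treated as the same thing is the question the rest of the post takes up.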


Weight Decay == L2 Regularization?