Have you ever trained a deep learning model that overfits its training data?

One of the causes of overfitting is large weights in the network. Large weights can be a sign of an unstable network, where small changes in the input lead to large changes in the output. A solution to this problem is to update the learning algorithm to encourage the network to keep its weights small. This is called regularization.
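For instance, one way to penalize large weights in Keras is to attach an L2 kernel regularizer (weight decay) to a layer. This is only a minimal sketch: the layer sizes, the penalty factor, and the 20-feature input shape are arbitrary choices for illustration, not part of this article's example.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A minimal sketch: an L2 penalty on the hidden layer's weights adds
# lambda * sum(w^2) to the loss, nudging the weights toward zero.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4),  # illustrative penalty factor
                 input_shape=(20,)),                        # assumes 20 input features
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```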

In this article, we will look at how regularization helps deep learning models train faster, reduce overfitting, and make better predictions.

Two commonly used regularization techniques are listed below:

  • Batch Normalization
  • Dropout layer

Let’s talk about batch normalization first.

Batch Normalization

Batch normalization, also known as batch norm, is a technique for improving the speed, performance, and stability of artificial neural networks. The idea is to normalize the inputs of each layer so that they have a mean activation of zero and a unit standard deviation.
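In Keras, for example, batch normalization is available as a layer that can be placed after a dense or convolutional layer. The sketch below is only illustrative: the layer sizes, the 20-feature input shape, and the 10-class output are assumptions made for demonstration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A minimal sketch: BatchNormalization standardizes the outputs of the
# preceding layer per mini-batch (zero mean, unit variance), then applies
# a learned scale (gamma) and shift (beta).
model = tf.keras.Sequential([
    layers.Dense(64, input_shape=(20,)),   # assumes 20 input features
    layers.BatchNormalization(),           # normalize the pre-activations
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```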

Why should we normalize the input?

Let’s say we have 2D data with two features, X1 and X2. The X1 feature has a very wide spread, roughly between -200 and 200, whereas the X2 feature has a very narrow spread. The left graph below shows the raw data with its differing ranges. The right graph shows the same data after normalization: it lies roughly between -2 and 2 and is distributed with zero mean and unit variance.

[Figure: the same 2D data before (left) and after (right) normalization]

Essentially, scaling the inputs through normalization gives the error surface a more spherical shape, where it would otherwise be a highly curved ellipse. An error surface with high curvature means we take many steps that aren’t necessarily in the optimal direction. When we scale the inputs, we reduce the curvature, which makes methods that ignore curvature, like gradient descent, work much better. When the error surface is circular or spherical, the gradient points straight at the minimum.
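As a concrete illustration, standardizing each input feature to zero mean and unit variance can be done directly with NumPy. The sample data below is made up to mimic the X1/X2 example above; it is not from the article.

```python
import numpy as np

# A minimal sketch: two features with very different ranges,
# roughly like X1 in [-200, 200] and X2 in [-2, 2].
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(-200, 200, size=1000),  # X1: wide spread
    rng.uniform(-2, 2, size=1000),      # X2: narrow spread
])

# Standardize each feature: subtract its mean, divide by its standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately 0 for both features
print(X_scaled.std(axis=0))   # approximately 1 for both features
```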

Oscillations of gradient descent are large in high-curvature areas of the objective landscape.

Oscillations of gradient descent are moderate in spherical areas of the objective landscape.
