Overfitting is a significant issue in data science that needs to be handled carefully in order to build a robust and accurate model. Overfitting arises when a model fits the training data so closely that it cannot generalize to new observations. An overfit model captures the details and noise in the training data rather than the general trend. As a result, even a slight change in a feature can greatly change the model's output. Overfit models appear outstanding on training data but perform poorly on new, previously unseen observations.

The main cause of overfitting is model complexity. Thus, we can prevent a model from overfitting by controlling its complexity, which is exactly what regularization does. Regularization controls model complexity by penalizing higher terms in the model. If a regularization term is added, the model tries to minimize both the loss and the complexity of the model.
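
To make "minimize both loss and complexity" concrete, here is a minimal sketch of a regularized objective for a small linear model. The function names (`mse`, `l1_penalty`, `l2_penalty`, `regularized_loss`) and the strength parameter `lam` are illustrative choices of mine, not taken from any particular library, and the model deliberately has no bias term.

```python
def mse(weights, xs, ys):
    """Mean squared error of a linear model (no bias term, for brevity)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(xs)


def l1_penalty(weights):
    """Sum of absolute weights (the L1 complexity term)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights (the L2 complexity term)."""
    return sum(w * w for w in weights)


def regularized_loss(weights, xs, ys, lam, penalty):
    """Data loss plus lam times the chosen complexity penalty."""
    return mse(weights, xs, ys) + lam * penalty(weights)
```

With `lam = 0` this reduces to the plain training loss. Increasing `lam` makes large weights more expensive, so the optimizer has to trade some training accuracy for a simpler model.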

In this post, I will cover two commonly used regularization techniques: L1 and L2 regularization. The two main sources of model complexity are:

  • Total number of features (handled by L1 regularization), or
  • The weights of features (handled by L2 regularization)

L1 Regularization

It is also called regularization for sparsity. As the name suggests, it is used to handle sparse vectors, which consist mostly of zeroes. Sparse vectors typically lead to a very high-dimensional feature space, which makes the model large and difficult to handle.

L1 regularization forces the weights of uninformative features to zero by subtracting a small amount from each weight at every iteration, so that the weight eventually becomes exactly zero.

L1 regularization penalizes |weight|.
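
The "subtract a small amount at each iteration" behaviour can be sketched as follows. This is a simplified illustration that isolates the penalty's effect and ignores the gradient of the data loss. The names `l1_step`, `steps_to_zero`, and `shrink` are hypothetical. Clipping at zero is an assumption of this sketch: it stops the weight from overshooting past zero and flipping sign.

```python
def l1_step(weight, shrink):
    """Move a weight toward zero by a fixed amount, stopping at exactly zero."""
    if weight > shrink:
        return weight - shrink
    if weight < -shrink:
        return weight + shrink
    return 0.0  # within one step of zero: clip instead of overshooting


def steps_to_zero(weight, shrink):
    """Count how many L1 steps it takes for a weight to reach exactly zero."""
    steps = 0
    while weight != 0.0:
        weight = l1_step(weight, shrink)
        steps += 1
    return steps
```

Because the amount removed does not depend on the size of the weight, any weight reaches zero after a finite number of steps, which is what produces sparse models.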

L2 Regularization

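It is also called regularization for simplicity. If we take model complexity as a function of the weights, a feature's contribution to complexity grows with the absolute value of its weight. L2 regularization penalizes weight², so large weights are penalized much more heavily than small ones.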

L2 regularization pushes weights toward zero but does not make them exactly zero. It acts like a force that removes a small percentage of each weight at every iteration. Because each step removes only a fraction of what remains, the weights keep shrinking but never become exactly zero.
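
The "removes a small percentage" behaviour can be sketched in the same simplified way, again ignoring the data-loss gradient. The names `l2_step`, `decay`, and `rate` are hypothetical. Here `rate` stands for the fraction of the weight removed per iteration, which in gradient descent would come from the learning rate and the regularization strength.

```python
def l2_step(weight, rate):
    """Shrink a weight by a fixed fraction of its current value."""
    return weight * (1.0 - rate)


def decay(weight, rate, steps):
    """Apply the L2 shrinkage step repeatedly and return the final weight."""
    for _ in range(steps):
        weight = l2_step(weight, rate)
    return weight
```

Each step multiplies the weight by the same factor, so the weight decays geometrically: it gets very small but stays nonzero. (In floating-point arithmetic it would eventually underflow after a very large number of steps, but mathematically it never reaches zero.)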

