Before implementing Stochastic Gradient Descent (SGD), let's talk about what Gradient Descent is.

Gradient Descent is an iterative algorithm used to solve optimization problems. Almost every Machine Learning and Deep Learning model relies on it to minimize a loss function and thereby improve what the model learns.

After reading this blog you'll know how the Gradient Descent Algorithm actually works. At the end of the blog, we'll compare our custom SGD implementation with scikit-learn's SGDRegressor.

How does a Gradient Descent Algorithm work?

  1. Pick an initial random point x0.
  2. x1 = x0 - r [(df/dx) at x0]
  3. x2 = x1 - r [(df/dx) at x1]
  4. Repeating this update, x[k] = x[k-1] - r [(df/dx) at x[k-1]], we generate the sequence x0, x1, x2, ..., x[k] until it converges to a minimum.

Here r is the learning rate and df/dx is the gradient of the function (our loss) that we want to minimize.
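To make these steps concrete, here is a minimal sketch of plain gradient descent on the toy function f(x) = x², whose gradient is df/dx = 2x (the function, learning rate, and iteration count are illustrative choices, not from the original post).

# Plain gradient descent on f(x) = x^2, where df/dx = 2x
def gradient_descent(x0, r=0.1, num_iters=50):
    x = x0
    for _ in range(num_iters):
        grad = 2 * x         # df/dx evaluated at the current point
        x = x - r * grad     # x[k] = x[k-1] - r * (df/dx at x[k-1])
    return x

print(gradient_descent(x0=10.0))  # approaches the minimum at x = 0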


Implementing Linear SGD with Mini-Batch

In Mini-Batch SGD, the parameters are updated after computing the gradient of the loss with respect to a small random subset (a mini-batch) of the training set, rather than the full set or a single example.
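As a rough sketch of what drawing such a subset looks like in code (the data shapes and batch size below are made up purely for illustration):

import numpy as np

# Toy data: 100 samples with 13 features each
X = np.random.randn(100, 13)
y = np.random.randn(100)

# Draw a random mini-batch; the gradient is computed on this subset only
batch_size = 32
idx = np.random.choice(len(X), size=batch_size, replace=False)
X_batch, y_batch = X[idx], y[idx]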

Let us take the Boston Housing dataset as an example (the code below loads it via scikit-learn's load_boston; the same data is also available on Kaggle).

First, we will import all the necessary libraries.

import warnings
warnings.filterwarnings("ignore")
from sklearn.datasets import load_boston
from random import seed
from random import randrange
from csv import reader
from math import sqrt
from sklearn import preprocessing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from prettytable import PrettyTable
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

Now, we will load our dataset. Here, X contains the feature matrix and Y contains the labels (house prices) that we need to predict.

X = load_boston().data
Y = load_boston().target

Remember to split your data before scaling to avoid data leakage.

# split the data set into train and test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

Use the StandardScaler to standardize the dataset. We fit the scaler only on the training data and then apply the same transformation to the test data, so that no information from the test set leaks into training.

scaler = preprocessing.StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Next, we create pandas DataFrames from the scaled arrays.

X_train = pd.DataFrame(data = X_train, columns=load_boston().feature_names)
X_train['Price'] = list(y_train)  
X_test = pd.DataFrame(data = X_test, columns=load_boston().feature_names)
X_test['Price'] = list(y_test)

Let’s see what our X_train looks like.

X_train.head()

(Output: the first five rows of X_train, showing the 13 standardized feature columns plus the Price column.)

Below is the loss function (mean squared error) for our Linear Model that we need to minimize:

L(w, b) = (1/n) Σ_i (y_i - (w·x_i + b))²

Here n is the number of training points, x_i is a feature vector, and y_i is its target price.
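This loss can be sketched in NumPy as follows (the function name mse_loss is my own choice; it relies on the numpy import from above):

def mse_loss(X, y, w, b):
    # L(w, b) = (1/n) * sum((y_i - (w . x_i + b))^2)
    y_pred = X.dot(w) + b
    return np.mean((y - y_pred) ** 2)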

Now, we calculate the gradients of the loss L with respect to the weights (w) and the intercept (b). The equations for the gradients are:

dL/dw = -(2/n) Σ_i x_i (y_i - (w·x_i + b))
dL/db = -(2/n) Σ_i (y_i - (w·x_i + b))
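A corresponding NumPy sketch (again, the function name is illustrative):

def gradients(X, y, w, b):
    # error_i = y_i - (w . x_i + b)
    n = len(y)
    error = y - (X.dot(w) + b)
    dw = -(2.0 / n) * X.T.dot(error)   # dL/dw
    db = -(2.0 / n) * np.sum(error)    # dL/db
    return dw, db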

After calculating the gradients, we keep updating our weights and intercept on each iteration:

w := w - r (dL/dw)
b := b - r (dL/db)

Finally, we’ll implement our SGD function.
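Here is a minimal sketch of what such a mini-batch SGD routine can look like, assembled from the gradient and update equations above (the function name, default batch size, learning rate, and epoch count are my own illustrative assumptions, not necessarily the post's final choices).

def sgd_linear_regression(X, y, batch_size=32, r=0.01, epochs=1000):
    # X: (n_samples, n_features) NumPy array, y: (n_samples,) targets
    n, d = X.shape
    w = np.zeros(d)   # initial weights
    b = 0.0           # initial intercept
    for _ in range(epochs):
        # Shuffle once per epoch, then walk through the data in mini-batches
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            error = yb - (Xb.dot(w) + b)
            dw = -(2.0 / len(batch)) * Xb.T.dot(error)   # dL/dw on this batch
            db = -(2.0 / len(batch)) * np.sum(error)     # dL/db on this batch
            w = w - r * dw   # w := w - r * (dL/dw)
            b = b - r * db   # b := b - r * (dL/db)
    return w, b

For example, it could be run on the standardized DataFrames built earlier:

w, b = sgd_linear_regression(X_train[load_boston().feature_names].values,
                             X_train['Price'].values)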
