Deep Learningand Machine Learning are no longer a novelty. Many applications are utilizing the power of these technologies for cheap predictions, object detection and various other purposes. At this blog, we usually write about deep learning, but we felt the need to address some more standard Machine Learning techniques and algorithms and go back to where it all started. In this article, we start off simple with Linear Regression. It is a well-known algorithm and it is the basics of this vast field. Linear Regression is, sort of, the root of it all. We will address theory and math behind it and show how we can implement this simple algorithm using several different technologies.

Are you afraid that AI might take your job? Make sure you are the one who is building it.


For the purpose of this article, make sure that you have installed the following _Python _libraries:

  • **NumPy **– Follow this guide if you need help with installation.
  • **SciKit Learn **– Follow this guide if you need help with installation.
  • **TensorFlow **– Follow** this guide** if you need help with installation.
  • **Pytorch **– Follow this guide if you need help with installation.

Once installed make sure that you have imported all the necessary modules that are used in this tutorial.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

import tensorflow as tf
import torch

Also, make sure that you are familiar with the basics of linear algebracalculus and probability.

Simple Linear Regression

Sometimes data that we have is quite simple. Sometimes, the output value of the dataset is just the linear combinationof features in the input example. Let’s simplify it even further and say that we have only one feature in the input data. A mathematical model that describes such a relationship can be is presented with the formula:

For example, let’s say that this is our data:

In this particular case, the mathematical model that we want to create is just a linear function of the input feature, where b0 and b1 are the model’s parameters. These parameters should be learned during the training process. After that, the model should be able to give correct output predictions for new inputs. To sum it up, during training we need to learn _b0_and _b1 _based on the values of x and y, so our f(xi) is able to return correct predictions for the new inputs. If we want to generalize even further we can say that model makes a prediction by adding a constant (bias term – b0) on the precomputed weighted sum (b1) of the input features. However, let’s back to our example and clear things up a little bit before we dive into generalization. Here is what the aforementioned data looks like on the plot:

Our linear regression model, by calculating optimal b0 and b1, produces a line that will best fit this data. This line should be optimally distanced from all points in the graph. It is called the regression line. So, how does the algorithm calculates b0 and b1 values?

In the formula above, f(xi) represents the predicted output value for ith example from the input, and b0 and b1 are regression coefficients that represent the y-intercept and slope of the regression line. We want that value to be as close as possible to the real value – y. Thus model needs to learn the values regression coefficients b0 and b1, based on which model will be able to predict the correct output. In order to make these estimates, the algorithm needs to know how bad are his current estimations of these coefficients. At the beginning of the training process, we feed samples into the algorithm which calculates output f(xi) of the current sample, based on initial values of regression coefficients. Then the error is calculated and coefficients are corrected. Error for each sample can be calculated like this:

Meaning, we subtract estimated output from the real output. Note that this is a training process and we know the value of the output in the i-th sample. Because ei depends on coefficient values it can be described by the function. If we want to minimize ei and for that, we need to define a function based on which we will do so. In this article, we use the Least Squares Technique and define the function that we want to minimize as:

The function that we want to minimize is called the objective function or loss function. In order to minimize ei, we need to find coefficients b0 and b1 for which J will hit the global minimum. Without going into mathematical details (you can check out that here), here is how we can calculate values for b0 and b1:

Here _SSxy _is the sum of cross-deviations of y and x:

while SSxx is the sum of squared deviations of x:

Ok, so much for the theory, let’s implement this algorithm using Python.

#ai #python #data science #datascience #deep learning #pytorch

Back to Machine Learning Basics
1.05 GEEK