**Deep Learning** and **Machine Learning** are no longer a novelty. Many applications use these technologies for cheap **predictions**, object detection and various other purposes. On this blog, we usually write about deep learning, but we felt the need to address some more standard Machine Learning techniques and algorithms and go back to where it all **started**. In this article, we start off simple with **Linear Regression**. It is a well-known algorithm and one of the foundations of this vast field. Linear Regression is, in a sense, the root of it all. We will address the theory and math behind it and show how we can implement this simple algorithm using **several** different technologies.

For the purpose of this article, make sure that you have installed the following _Python_ libraries:

- **NumPy** – Follow **this guide** if you need help with installation.
- **scikit-learn** – Follow **this guide** if you need help with installation.
- **TensorFlow** – Follow **this guide** if you need help with installation.
- **PyTorch** – Follow **this guide** if you need help with installation.

Once they are installed, make sure that you have imported all the modules that are used in this tutorial.

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import tensorflow as tf
import torch
```

Also, make sure that you are familiar with the basics of **linear algebra**, **calculus** and **probability**.

Sometimes the data that we have is quite **simple**: the output value is just a **linear combination** of the features in the input example. Let’s simplify it even further and say that we have only **one** feature in the input data. A mathematical model that describes such a **relationship** can be presented with the formula:

$$f(x_i) = b_0 + b_1 x_i$$
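In code, the single-feature model `f(x) = b0 + b1 * x` is a one-line function. The sketch below is only an illustration: the function name `predict` and the parameter values are hypothetical, not taken from the article's dataset.

```python
import numpy as np


def predict(x, b0, b1):
    """Return f(x) = b0 + b1 * x for a scalar or an array of inputs."""
    return b0 + b1 * np.asarray(x, dtype=float)


# Hypothetical parameter values, chosen only for illustration.
b0, b1 = 1.0, 2.0

x_new = np.array([0.0, 1.0, 2.5])
predictions = predict(x_new, b0, b1)
print(predictions)  # [1. 3. 6.]
```

Training, discussed next, is the process of finding values of `b0` and `b1` that make these predictions match the observed outputs.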

For example, let’s say that this is our data:

In this particular case, the mathematical model that we want to create is just a **linear function** of the input feature, where *b0* and *b1* are the model’s **parameters**. These parameters should be learned during the training process. After that, the model should be able to give correct output **predictions** for new inputs. To sum it up, during training we need to learn *b0* and *b1* based on the values of *x* and *y*, so that *f(xi)* returns correct predictions for **new** inputs. If we want to **generalize** even further, we can say that the model makes a prediction by adding a constant (the bias term, *b0*) to the weighted sum of the input features (here a single feature with weight *b1*). However, let’s go back to our example and clear things up a little bit before we dive into generalization. Here is what the aforementioned data looks like on the **plot**:

Our linear regression model, by calculating optimal *b0* and *b1*, produces the line that best fits this data. This line should be as close as possible to all points in the graph. It is called the **regression line**. So, how does the algorithm calculate the *b0* and *b1* values?

In the model formula, *f(xi)* represents the predicted output value for the i-th example from the input, and *b0* and *b1* are regression coefficients that represent the **y-intercept** and **slope** of the regression line. We want that value to be as close as possible to the real value, *yi*. Thus the model needs to learn the values of the regression coefficients *b0* and *b1*, based on which it will be able to predict the correct output. In order to make these estimates, the algorithm needs to know how bad its **current** estimates of these coefficients are. At the beginning of the training process, we feed samples into the algorithm, which calculates the output *f(xi)* for the current sample based on the **initial** values of the regression coefficients. Then the error is **calculated** and the coefficients are corrected. The error for each sample can be calculated like this:

$$e_i = y_i - f(x_i)$$

Meaning, we **subtract** the estimated output from the real output. Note that this is a training process and we **know** the value of the output in the i-th sample. Because *ei* depends on the coefficient values, it can be described as a **function** of them. To minimize the errors *ei*, we need to define a function based on which we will do so. In this article, we use the **Least Squares Technique** and define the function that we want to minimize as:

$$J(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2$$
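The per-sample error (real output minus estimated output) and the sum of squared errors can be sketched in NumPy as follows. The helper names and the toy data are assumptions made for illustration; the data is constructed so that `y = 1 + 2x` holds exactly, which makes the loss easy to verify by hand.

```python
import numpy as np


def residuals(x, y, b0, b1):
    """Per-sample errors e_i = y_i - (b0 + b1 * x_i)."""
    return y - (b0 + b1 * x)


def sum_squared_errors(x, y, b0, b1):
    """Least-squares objective: the sum of the squared per-sample errors."""
    e = residuals(x, y, b0, b1)
    return float(np.sum(e ** 2))


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# With the true coefficients every error is zero, so the loss is 0.
loss_exact = sum_squared_errors(x, y, b0=1.0, b1=2.0)

# With a wrong intercept every error equals 1, so the loss is 4.
loss_off = sum_squared_errors(x, y, b0=0.0, b1=2.0)

print(loss_exact, loss_off)  # 0.0 4.0
```

Squaring the errors keeps positive and negative deviations from cancelling each other and penalizes large deviations more heavily than small ones.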

The function that we want to minimize is called the **objective function** or **loss function**. In order to minimize the errors, we need to find the coefficients *b0* and *b1* for which *J* reaches its global minimum. Without going into mathematical details (you can check those out **here**), here is how we can calculate the values of *b0* and *b1*, where $\bar{x}$ and $\bar{y}$ are the means of the input and output values:

$$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Here *SSxy* is the sum of cross-deviations of y and x:

$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

while *SSxx* is the sum of squared deviations of x:

$$SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
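As a quick numerical check of the closed-form solution (slope = *SSxy* / *SSxx*, intercept = mean of y minus slope times mean of x), here is a minimal NumPy sketch. The function name `fit_simple_linear_regression` and the toy data are hypothetical, and `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return (b0, b1) that minimize the sum of squared errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()

    # Sum of cross-deviations of y and x.
    ss_xy = np.sum((x - x_mean) * (y - y_mean))
    # Sum of squared deviations of x.
    ss_xx = np.sum((x - x_mean) ** 2)

    b1 = ss_xy / ss_xx           # slope
    b0 = y_mean - b1 * x_mean    # y-intercept
    return b0, b1


# Hypothetical toy data, roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.2, 7.9, 10.1])

b0, b1 = fit_simple_linear_regression(x, y)

# np.polyfit with deg=1 returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
print(np.isclose(b1, slope), np.isclose(b0, intercept))  # True True
```

Note that the formula requires the inputs to vary: if all *x* values are identical, *SSxx* is zero and the slope is undefined.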

Ok, so much for the theory, let’s implement this algorithm using *Python*.
