The "hello world" of machine learning and neural networks usually starts with regression, a technique that comes from statistics. Whether you are a seasoned developer or a mathematician, it helps to review the general concept of regression before we move on to polynomial regression.

What is Regression?

Let's get familiar with the concept of regression through a simple example. Say we are looking to buy a house in a particular city, where house prices depend mainly on floor area in square feet. We care about the price, so we need to know whether we are paying a fair amount for the house we are about to purchase. One of our friendly realtors has given us pricing information for several houses. What we are going to do is find a relationship between the floor area and the price of a house, so that we can judge whether a property is reasonably priced. Let's look at the data.

sqr_feet = [1350, 1472, 1548, 1993, 1998, 2100, 2450]
price = [643000, 679000, 702500, 759000, 760000, 795000, 855000]

The two lists are aligned by index: the element at a given index of sqr_feet and the element at the same index of price describe the same house. We can plot these values using Matplotlib to get an idea of what they look like.

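import matplotlib.pyplot as plt

# Scatter plot of price against floor area.
# plt.subplots() leaves room for the axis labels; an axes created with
# add_axes([0,0,1,1]) fills the whole figure and can clip them.
fig, axes = plt.subplots()
axes.scatter(sqr_feet, price, color='r')
axes.set_ylabel('Price(USD)')
axes.set_xlabel('Area(sqft)')
plt.show()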

This is a scatter plot. We can also connect the points with lines to get a better idea of how they are distributed.

plt.plot(sqr_feet, price)
plt.ylabel('Price(USD)')
plt.xlabel('Area(sqft)')
plt.show()


We can see that the price increases roughly linearly with the floor area of the house. Therefore, we can come up with an equation to calculate a ballpark value. That way, if we are given a new house and its floor area, we can see whether we are paying a reasonable amount or not. Since the data looks like it can be modeled with a straight line, we can choose an equation of the form y = mx + c, where m is the gradient and c is the intercept. However, it is not that easy, because many different straight lines could plausibly pass through these points. This is where regression comes in.
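To make the ambiguity concrete, here is a minimal sketch that draws two equally plausible lines by hand, each passing exactly through a different pair of data points. The helper name line_through is made up for this illustration, and the choice of point pairs is arbitrary.

```python
# Same data as above.
sqr_feet = [1350, 1472, 1548, 1993, 1998, 2100, 2450]
price = [643000, 679000, 702500, 759000, 760000, 795000, 855000]


def line_through(x1, y1, x2, y2):
    """Return gradient m and intercept c of the line through two points."""
    m = (y2 - y1) / (x2 - x1)
    c = y1 - m * x1
    return m, c


# Candidate A: join the smallest and the largest house.
m_a, c_a = line_through(sqr_feet[0], price[0], sqr_feet[-1], price[-1])

# Candidate B: join the two smallest houses.
m_b, c_b = line_through(sqr_feet[0], price[0], sqr_feet[1], price[1])

print(f"Candidate A: y = {m_a:.2f}x + {c_a:.2f}")
print(f"Candidate B: y = {m_b:.2f}x + {c_b:.2f}")
```

Both lines fit the points they were built from perfectly, yet they have different gradients and intercepts, so picking a line by eye gives no principled answer.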

Regression statistically determines the gradient and the intercept of the best-fitting straight line. A common method is least squares, which picks the line that minimizes the sum of the squared residuals, where a residual is the difference between an observed value and the value the line predicts. More generally, regression determines the relationship between one dependent variable and one or more independent variables. In data science jargon, the dependent variable is known as y and the independent variables are known as x1, x2, …, xn.
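As a rough illustration of what least squares measures, the sketch below scores two guessed lines by their sum of squared residuals. The function name sum_squared_residuals and both (m, c) guesses are invented for this example; they are not fitted values.

```python
# Same data as above.
sqr_feet = [1350, 1472, 1548, 1993, 1998, 2100, 2450]
price = [643000, 679000, 702500, 759000, 760000, 795000, 855000]


def sum_squared_residuals(m, c, xs, ys):
    """Sum of (observed - predicted)^2 for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Two hand-picked guesses (hypothetical values, chosen only for comparison).
ssr_guess_1 = sum_squared_residuals(150, 450000, sqr_feet, price)
ssr_guess_2 = sum_squared_residuals(180, 410000, sqr_feet, price)

print(f"guess 1 (m=150, c=450000): SSR = {ssr_guess_1:.3e}")
print(f"guess 2 (m=180, c=410000): SSR = {ssr_guess_2:.3e}")

# The guess with the smaller sum is the better fit by this criterion.
better = "guess 1" if ssr_guess_1 < ssr_guess_2 else "guess 2"
print(f"better fit: {better}")
```

Least-squares regression does the same comparison in principle, but over every possible pair of m and c, and returns the pair with the smallest sum.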

Let us use regression to find the gradient (m) and the intercept (c) for the above example.
