Regression analysis refers to a family of statistical methods used to establish the possible **relationship** among different variables. It is specifically designed to show how variation in an independent variable impacts the dependent variable.

Basically, regression analysis fits an equation that explains the significant relationship between one or more predictors and a response variable, and uses that equation to estimate new observations. The regression output identifies the direction, size, and statistical significance of the relationship between predictor and response, where the dependent variable can be continuous or discrete in nature.


Where do we use regression?

  1. Hours spent studying vs. Marks scored by students
  2. Amount of rainfall vs. Agricultural yield
  3. Electricity usage vs. Electricity bill
  4. Suicide rates vs. Number of people under stress
  5. Years of experience vs. Salary
  6. Demand vs. Product price
  7. Age vs. Beauty
  8. Age vs. Health issues
  9. Number of degrees vs. Salary
  10. Number of degrees vs. Education expenditure

Types of regression techniques

The type of regression analysis to use can be selected based on the attributes of the data, the target variable, and the shape and nature of the regression curve that exhibits the relationship between the dependent and independent variables. In this blog, we will discuss linear regression, with the math, in detail.

Introduction

Linear Regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It’s used to predict values within a continuous range (e.g. sales, price) rather than trying to classify them into categories (e.g. cat, dog). There are two main types:

Simple regression

Simple linear regression uses the traditional slope-intercept form, where m and b are the values our algorithm will try to “learn” to produce the most accurate predictions, x represents our input data, and y represents our prediction.

y = mx + b, where m is the slope and b is the intercept
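As a quick illustration (not from the original post; the function name and numbers are purely hypothetical), the prediction step is a one-liner in Python:

```python
def predict(x, m, b):
    """Return the prediction y = m*x + b for an input x."""
    return m * x + b

# With a slope of 2 and an intercept of 1, an input of 3 predicts 7.
print(predict(3, m=2, b=1))  # 7
```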

Multivariable regression

A more complex, multi-variable linear equation might look like this, where w represents the coefficients, or weights, our model will try to learn.

f(x, y, z) = w1⋅x + w2⋅y + w3⋅z, where x, y, and z are the three input parameters

The variables x, y, z represent the attributes, or distinct pieces of information, we have about each observation. For sales predictions, these attributes might include a company’s advertising spend on radio, TV, and newspapers.

Sales=w1Radio+w2TV+w3News
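As a hedged sketch (the function name, spend figures, and weights below are made up, not from the post), the weighted sum can be computed with NumPy:

```python
import numpy as np

def predict_sales(features, weights):
    """Weighted sum of the inputs: w1*Radio + w2*TV + w3*News."""
    return np.dot(features, weights)

# Hypothetical advertising spend (Radio, TV, News) and hypothetical learned weights.
spend = np.array([40.0, 200.0, 60.0])
weights = np.array([0.20, 0.05, 0.01])
print(predict_sales(spend, weights))  # 0.20*40 + 0.05*200 + 0.01*60 = 18.6
```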

Simple Regression

Let’s say we are given a dataset with the following columns (features): how much a company spends on Radio advertising each year, and its annual Sales in terms of units sold. We are trying to develop an equation that will let us predict units sold based on how much a company spends on radio advertising. The rows (observations) represent companies.

[Figure: data frame with one row per company, showing Radio advertising spend and Sales (units sold)]
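The original post shows this data as an image of a data frame; a toy stand-in with made-up numbers might look like this:

```python
import pandas as pd

# Made-up observations: each row is a company.
df = pd.DataFrame({
    "radio": [10.0, 25.0, 40.0, 55.0],   # radio advertising spend
    "sales": [8.0, 15.0, 22.0, 30.0],    # units sold
})
print(df)
```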

Making Predictions

Our prediction function outputs an estimate of sales given a company’s radio advertising spend and our current values for Weight and Bias.

Sales=Weight⋅Radio+Bias

Weight: the coefficient for the Radio independent variable. In machine learning we call coefficients weights.

Radio: the independent variable. In machine learning we call these variables features.

Bias: the intercept, i.e. where our line crosses the y-axis. In machine learning we call intercepts biases. Bias offsets all of the predictions that we make.

Our algorithm will try to learn the correct values for Weight and Bias. By the end of our training, our equation will approximate the line of best fit. To update this weight and bias, we will introduce a cost function (or loss function) and try to reduce its value.
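As a small sketch (the helper name, weight, bias, and spend values are illustrative, not the post’s code), the prediction function can be applied to a whole column of radio spends at once:

```python
import numpy as np

def predict(radio, weight, bias):
    """Sales estimate: Weight * Radio + Bias, for one value or a whole column."""
    return weight * radio + bias

radio = np.array([10.0, 25.0, 40.0, 55.0])   # hypothetical spend column
print(predict(radio, weight=0.5, bias=3.0))  # [ 8.  15.5 23.  30.5]
```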

What is a Cost Function?

The primary set-up for updating the weight and bias is to define a cost function (also known as a loss function) that measures how well the model predicts outputs on the training set. The goal is then to find the set of weights and biases that minimizes the cost. One common choice is the mean squared error, which measures the difference between the actual value of y and the estimated value of y (the prediction). The regression line has the equation hθ(x) = θ0 + θ1x, which has only two parameters: weight (θ1) and bias (θ0). The errors are the vertical distances between the observed points and this line; the cost function quantifies them so that we can reduce its value.

Math

Given our simple linear equation y=mx+b we can calculate MSE as:

MSE = (1/N) ∑_{i=1}^{N} (y_i − (m⋅x_i + b))²

N is the total number of observations (data points)
(1/N) ∑_{i=1}^{N} is the mean
y_i is the actual value of an observation and m⋅x_i + b is our prediction
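In code, the MSE cost is a one-liner with NumPy (a sketch; the function name and toy data are ours, not the post’s):

```python
import numpy as np

def mse(x, y, m, b):
    """Mean squared error between the actual y and the predictions m*x + b."""
    predictions = m * x + b
    return np.mean((y - predictions) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # exactly y = 2x + 1
print(mse(x, y, m=2.0, b=1.0))       # 0.0 — a perfect fit has zero cost
```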

Are Variance and MSE the Same?

Variance measures how far the data points are spread out, whereas MSE (Mean Squared Error) measures how different the predicted values are from the actual values. Both are second-moment measures, but there is a significant difference: **_in general, the sample variance measures the spread of the data around the mean (in squared units), while the MSE measures the vertical spread of the data around the regression line (in squared vertical units)._** Hope you don’t get confused by these terms.
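A tiny numeric check of the distinction, with made-up numbers:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])            # actual values
predictions = np.array([2.5, 5.5, 6.0, 9.5])  # model predictions

variance = np.mean((y - y.mean()) ** 2)       # spread of y around its own mean
mse = np.mean((y - predictions) ** 2)         # spread of y around the predictions
print(variance, mse)                          # 5.0 and 0.4375
```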

Gradient Descent

To minimize the MSE, we use Gradient Descent to calculate the gradient of our cost function. Gradient Descent runs iteratively, using calculus, to find the parameter values that minimize the given cost function. The key tool is the derivative, a concept from calculus that gives the slope of the function at a given point. We need to know the slope so that we know the direction (sign) in which to move the coefficient values in order to get a lower cost on the next iteration.
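A minimal gradient-descent sketch for this cost (assuming the MSE defined above; the learning rate, iteration count, and toy data are our own choices, not the post’s):

```python
import numpy as np

def gradient_step(x, y, m, b, lr=0.05):
    """One gradient-descent update of m and b for the MSE cost."""
    n = len(x)
    predictions = m * x + b
    # Partial derivatives of MSE = (1/N) * sum((y - (m*x + b))**2)
    dm = (-2.0 / n) * np.sum(x * (y - predictions))
    db = (-2.0 / n) * np.sum(y - predictions)
    return m - lr * dm, b - lr * db

# Toy data generated from roughly y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 9.0])
m, b = 0.0, 0.0
for _ in range(1000):
    m, b = gradient_step(x, y, m, b)
print(m, b)   # should end up close to 2 and 1
```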

#data-science #linear-regression #machine-learning #scikit-learn #python
