The Complete Guide to Linear Regression Analysis

In this article, we will analyse a business problem with linear regression step by step and interpret the statistical terms at each step to understand its inner workings. Although the linear regression algorithm is simple, a proper analysis requires interpreting the statistical results.

First, we will look at simple linear regression and then extend the problem to multiple linear regression.

For easy understanding, follow the Python notebook side by side.

What is Linear Regression?

Regression is a statistical approach for finding the relationship between variables. **Linear regression** assumes that this relationship is linear. Depending on the number of input variables, regression problems are classified into

1) Simple linear regression

2) Multiple linear regression

Business problem

In this article, we are using the Advertisement dataset.

Consider a company that wants to improve the sales of its product. The company spends money on different advertising media such as TV, radio, and newspaper to increase sales. It records the money spent on each advertising medium (in thousands of dollars) and the number of units of the product sold (in thousands of units).

Now we have to help the company find the most effective way to spend money on advertising so that it can improve sales next year with a smaller advertising budget.

Simple Linear Regression

Simple linear regression is an approach for predicting a quantitative response Y based on a single predictor variable X:

Y = β0 + β1 * X

This is the equation of a straight line with slope β1 and intercept β0.

Let’s start the regression analysis of the advertisement data with simple linear regression. Initially, we will consider a simple linear regression model relating sales to the money spent on TV advertising.

The mathematical equation then becomes Sales = β0 + β1 * TV.

Step 1: Estimating the Coefficients (Let’s find the coefficients)

To estimate sales for a given advertising budget, we have to know the values of β0 and β1. For the best estimate, the difference between the predicted sales and the actual sales (called the residual) should be as small as possible.

Because a residual may be negative or positive, simply adding the residuals together lets them cancel each other out. The total then understates the real error and leads to a non-optimal estimate of the coefficients. To overcome this, we use the residual sum of squares (RSS).
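Written out for the TV model, RSS can be sketched as follows. The notation is an assumption introduced here for illustration: n is the number of observations, x_i is the TV budget and y_i the sales for observation i, and the hats mark estimated coefficients.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2
```

Squaring makes every term non-negative, so positive and negative residuals can no longer cancel.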

With a simple calculation, we can find the values of β0 and β1 that minimise RSS.

With the statsmodels library in Python, we can find the coefficients.
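As a minimal sketch of that calculation, the snippet below applies the standard closed-form least-squares formulas for a single predictor. The data is synthetic and generated only for illustration (it is not the Advertisement dataset), and the helper names `fit_simple_ols` and `rss` are hypothetical.

```python
import numpy as np

# Synthetic data for illustration only (NOT the real Advertisement dataset).
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=200)                       # assumed TV budgets, thousands of dollars
sales = 7.0 + 0.05 * tv + rng.normal(0, 1.0, size=200)   # assumed sales, thousands of units


def fit_simple_ols(x, y):
    """Return (beta0, beta1) that minimise the residual sum of squares."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


def rss(x, y, beta0, beta1):
    """Residual sum of squares for the line y = beta0 + beta1 * x."""
    residuals = y - (beta0 + beta1 * x)
    return float(np.sum(residuals ** 2))


beta0_hat, beta1_hat = fit_simple_ols(tv, sales)
print(f"beta0 = {beta0_hat:.3f}, beta1 = {beta1_hat:.4f}")
print(f"RSS at the estimate = {rss(tv, sales, beta0_hat, beta1_hat):.2f}")
```

Moving either coefficient away from its estimate, for example `rss(tv, sales, beta0_hat + 0.1, beta1_hat)`, gives a larger RSS, which is what "minimum RSS" means in practice.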

Table 1: Simple regression of sales on TV

Values for β0 and β1 are 7.03 and 0.047 respectively. Then the relation becomes, Sales = 7.03 + 0.047 * TV.

This means that spending an additional 1,000 dollars on TV advertising is associated with an increase in sales of about 47 units.

This tells us how strongly TV advertising is associated with sales.
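The unit arithmetic behind the 47-unit figure can be checked with a short sketch. The coefficients are the ones reported above, while the function name `predict_sales` and the starting budget of 100 (thousand dollars) are assumptions chosen for illustration.

```python
BETA0 = 7.03    # intercept reported in the article (thousands of units)
BETA1 = 0.047   # slope reported in the article (thousands of units per thousand dollars)


def predict_sales(tv_budget):
    """Predicted sales in thousands of units for a TV budget in thousands of dollars."""
    return BETA0 + BETA1 * tv_budget


base = predict_sales(100.0)       # hypothetical budget of 100,000 dollars
plus_one = predict_sales(101.0)   # the same budget plus 1,000 dollars

# Sales are recorded in thousands of units, so multiply by 1000 to get units.
extra_units = (plus_one - base) * 1000
print(f"Extra units sold per additional 1,000 dollars: {extra_units:.0f}")
```

Because the model is linear, the difference is the same whichever starting budget is chosen.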

Step 2: Assessing the Accuracy of the Coefficient Estimates (How accurate are these coefficients?)

Why are the coefficients not perfect estimates?

The true relationship may not be perfectly linear, so there is an error that can be reduced by using a more complex model such as the polynomial regression model. These types of errors are called reducible errors.

On the other hand, errors may be introduced by mistakes in measurement or by outside conditions, such as the office being closed for a week due to heavy rain, which affects sales. These types of errors are called irreducible errors.
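The difference between the two kinds of error can be illustrated with a small experiment. This is a sketch on synthetic data: the curved relationship, the noise level, and the helper name `rss_for_degree` are all assumptions made for the example. A straight line and a quadratic are fitted to the same curved, noisy data, and their RSS values are compared.

```python
import numpy as np

# Synthetic data: a curved true relationship plus random noise (illustration only).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
noise = rng.normal(0, 0.5, size=x.size)        # stands in for the irreducible error
y = 2.0 + 1.5 * x - 0.2 * x**2 + noise


def rss_for_degree(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return its RSS."""
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    return float(np.sum((y - fitted) ** 2))


rss_linear = rss_for_degree(x, y, 1)      # straight line: misses the curvature
rss_quadratic = rss_for_degree(x, y, 2)   # more flexible model: captures the curvature

print(f"RSS, linear fit:    {rss_linear:.1f}")
print(f"RSS, quadratic fit: {rss_quadratic:.1f}")
```

The drop in RSS from the linear to the quadratic fit is the reducible part of the error. The RSS that remains after the quadratic fit comes mostly from the added noise, which no choice of model can remove.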
