# Baby Steps Towards Data Science: Multiple Linear Regression in Python

How to implement multiple linear regression and interpret the results. Source code and an interesting basketball player dataset are provided.

# What is Multiple Linear Regression?

Let me get right into the subject. The picture you see above is the mathematical representation of Multiple Linear Regression. All the necessary explanation is given in the image.

As the name suggests, MLR (Multiple Linear Regression) models the average behavior of the dependent variable as a linear combination of multiple features/variables.

Consider x1, x2, ..., xp as the independent variables and Y as the dependent variable. Each beta value is the coefficient of its respective independent variable. Beta0, on the other hand, is the intercept value, just as in Simple Linear Regression.
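In symbols, the description above corresponds to the standard form below. Writing the error term as ε is a notational assumption on my part; the other symbols follow the names used in the text.

$$
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
$$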

What’s the error term?

It is an error that exists in nature. Remember that in my previous article I said one can never predict the exact future value? That is because this error is always present. The error captures everything that is not recorded or used in our model: things such as emotions and feelings that cannot be easily quantified, or simply missing data and human errors in recording the data.

But don’t worry about it. When we use this regression method, we only estimate the average behavior of the Y variable. The actual values of Y might be greater than, less than or equal to this average. Since we deal only with the average of Y, the error terms (assumed to average out to zero) cancel each other out, and we end up with an estimated regression function that has no error term.
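To see this cancelling-out in action, here is a minimal simulation sketch. The coefficients, the noise level and the function name `simulate_average_y` are all made up for illustration, and the error is assumed to be zero-mean Gaussian noise.

```python
import random


def simulate_average_y(x, beta0=2.0, beta1=3.0, noise_sd=1.0, n=100_000, seed=0):
    """Average many noisy observations of Y at one fixed value of x.

    beta0, beta1 and noise_sd are hypothetical illustration values.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        error = rng.gauss(0.0, noise_sd)  # assumed zero-mean error term
        total += beta0 + beta1 * x + error
    return total / n


x = 4.0
average_y = simulate_average_y(x)
regression_line = 2.0 + 3.0 * x  # the error-free regression function: 14.0

print(f"average of noisy Y: {average_y:.3f}")
print(f"regression function: {regression_line:.3f}")
```

Each individual observation is off by its own error, but the average over many observations lands very close to the value given by the error-free function.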

How to decrease the error? Simple: invest more money and collect more data.

Example:

Let us consider a firm’s profit as the dependent variable (Y) and its spending on RnD (x1), Advertising (x2) and Marketing (x3) as our independent variables. Let’s say that after doing all the math we come up with the regression equation, or function (the other name for the mathematical representation of any regression), below. Please bear in mind that this function is hypothetical.

``Profit = 10000 + 5(RnD) - 2(Advertising) + 1(Marketing)``

Interpretations:

1. If the firm doesn’t invest in RnD, Advertising and Marketing, then its average profit would be 10,000 units. The units of the variables are very important here: you can only make the right interpretations based on them. In this example all variables are in units of \$100K.
2. If the firm increases RnD spending by \$100K (1 unit) while the other two stay fixed, the average profit increases by \$500K (5 units). This makes sense because more investment in RnD brings better products to the market, and thus better sales.
3. If the firm increases Advertising spending by \$100K (1 unit) while the other two stay fixed, the average profit decreases by \$200K (2 units). More expenditure on advertisements might decrease the overall profits.
4. If the firm increases Marketing spending by \$100K (1 unit) while the other two stay fixed, the average profit increases by \$100K (1 unit). More expenditure on marketing might increase its popularity and thus the profits.
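The interpretations above can be checked with a few lines of Python. This is only a sketch of the hypothetical function from the example; the name `predicted_profit` is my own, and every value is in the same units as the equation.

```python
def predicted_profit(rnd, advertising, marketing):
    """Hypothetical regression function from the example above.

    Inputs and result are in the units used by the equation
    (units of $100K in this example).
    """
    return 10000 + 5 * rnd - 2 * advertising + 1 * marketing


# No spending at all: only the intercept remains.
baseline = predicted_profit(rnd=0, advertising=0, marketing=0)

# Change one input by one unit while keeping the other two at zero.
rnd_effect = predicted_profit(1, 0, 0) - baseline
advertising_effect = predicted_profit(0, 1, 0) - baseline
marketing_effect = predicted_profit(0, 0, 1) - baseline

print(baseline)            # 10000
print(rnd_effect)          # 5
print(advertising_effect)  # -2
print(marketing_effect)    # 1
```

Each difference is exactly the coefficient of the variable that was changed, which is what "the effect of one more unit" means in a linear model.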

How are these estimates calculated?

There is a mathematical method called OLS (Ordinary Least Squares). Using certain matrix operations, you can find the estimated coefficient values. Explaining OLS in detail is outside the scope of this article; you can easily find tutorials online, so please go through them if you really want to know how it works. In practice, modern programming languages and their libraries will compute those estimates for you.
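For the curious, here is a rough sketch of the matrix computation behind OLS using NumPy. It assumes the textbook normal-equations form, where the estimate solves (XᵀX)β = Xᵀy. The data is synthetic and the "true" coefficients are made up purely for illustration.

```python
import numpy as np

# Synthetic data for illustration only: 200 observations, 3 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_beta = np.array([10.0, 5.0, -2.0, 1.0])  # intercept, then 3 slopes (made up)
noise = rng.normal(scale=0.5, size=200)
y = true_beta[0] + X @ true_beta[1:] + noise

# Add a column of ones so the first coefficient acts as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# OLS via the normal equations: solve (X'X) beta = X'y.
beta_hat = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

# The same estimate from NumPy's least-squares solver,
# which is the numerically safer choice in practice.
beta_lstsq, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(np.round(beta_hat, 2))
```

Both routes give the same coefficients, and with enough data they land close to the values used to generate it.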

# Implementation in Python

Let us dive deep into Python, build an MLR model and try to predict the points scored by basketball players.
