How to implement multiple linear regression and interpret the results. The source code and an interesting basketball player dataset are provided.
Let me get right into the subject. The picture above shows the mathematical representation of Multiple Linear Regression, with the necessary explanation given in the image.
As the name suggests, MLR (Multiple Linear Regression) is a linear combination of multiple features/variables that defines the average behavior of the dependent variable.
Consider x1, x2, ..., xp as the independent variables and Y as the dependent variable. Each beta value is the coefficient of the respective independent variable. Beta0, on the other hand, is the intercept, just as in Simple Linear Regression.
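For reference, the model described above can be written out in its standard textbook form. The symbol names follow the text (x1 to xp, Beta0 to Betap), and epsilon is assumed here as the notation for the error term:

```latex
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```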
What’s the error term?
It is the error that exists in nature. Remember that in my previous article I said one can never predict the exact future value? That is because this error is present. The error consists of everything that is not recorded or used in our model, such as emotions and feelings that cannot easily be quantified, or simply missing data and human errors in recording the data.
But don’t worry about it. When we use this regression method we only get the average behavior of the Y variable. An actual observation may be greater than, less than or equal to this predicted average. Since we deal only with the average of Y, the error terms are assumed to cancel each other out (to average to zero), and we end up with an estimated regression function that has no error term.
How do we decrease the error? Simple: invest more money and find more data.
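The idea that individual errors cancel out on average can be illustrated with a small simulation. This is only a sketch: the "true" relationship, the noise level and the sample size below are made-up assumptions, not taken from the basketball dataset.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "true" average behavior of Y (an assumption for this demo)
def true_mean(x):
    return 3 + 2 * x

# Simulate observations: each one is the average behavior plus a random error
xs = [random.uniform(0, 10) for _ in range(10_000)]
errors = [random.gauss(0, 1) for _ in xs]
ys = [true_mean(x) + e for x, e in zip(xs, errors)]

# Individual observations can sit well above or below the average ...
largest_gap = max(abs(y - true_mean(x)) for x, y in zip(xs, ys))

# ... but across many observations the errors roughly cancel out
mean_error = sum(errors) / len(errors)

print(f"largest single error: {largest_gap:.2f}")
print(f"average error:        {mean_error:.4f}")
```

Single observations miss the average by a noticeable amount, while the average of all errors stays close to zero, which is why the estimated function can drop the error term.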
Let us consider a firm’s profit as the dependent variable (Y) and its spending on RnD (x1), Advertising (x2) and Marketing (x3) as our independent variables. Let’s say that after doing all the math we come up with the regression equation below (a regression function is just another name for the mathematical representation of any regression). Please bear in mind that this function is hypothetical.
Profit = 10000 + 5(RnD) - 2(Advertising) + 1(Marketing)
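To see how such an equation is read, here is a minimal sketch that plugs numbers into the hypothetical function above. The function name `predict_profit` and the spending figures are invented for illustration.

```python
def predict_profit(rnd, advertising, marketing):
    """Evaluate the hypothetical equation from the text:
    Profit = 10000 + 5(RnD) - 2(Advertising) + 1(Marketing)."""
    return 10000 + 5 * rnd - 2 * advertising + 1 * marketing

# Made-up spending figures, purely for illustration
baseline = predict_profit(rnd=1000, advertising=500, marketing=200)
more_rnd = predict_profit(rnd=1001, advertising=500, marketing=200)

print(baseline)             # 14200
print(more_rnd - baseline)  # 5
```

Raising RnD by one unit while holding the other two fixed changes the predicted profit by exactly the RnD coefficient (5), which is the usual way to interpret a coefficient in MLR. The intercept (10000) is the predicted profit when all three spendings are zero.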
How are these estimates calculated?
There is a mathematical method called OLS (Ordinary Least Squares). It uses matrix operations to find the estimated coefficient values. Explaining OLS is beyond the scope of this article; you can easily find tutorials online, so please go through them if you want to know how it works. In practice, modern programming languages will compute these estimates for you.
Let us dive deep into Python, build an MLR model and try to predict the points scored by basketball players.
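As a rough illustration of letting the computer do the OLS work, the sketch below generates synthetic data from the hypothetical profit equation and recovers the coefficients with NumPy's least-squares solver. The data, noise level and variable names are assumptions made for this demo, and it is not necessarily the approach used with the basketball dataset.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for repeatability

# Synthetic spending data: columns are RnD, Advertising, Marketing (made up)
n = 200
X = rng.uniform(0, 1000, size=(n, 3))

# Generate profit from the hypothetical equation plus a random error term
true_beta = np.array([10000.0, 5.0, -2.0, 1.0])  # intercept, then the slopes
noise = rng.normal(0, 10, size=n)
y = true_beta[0] + X @ true_beta[1:] + noise

# Add a column of ones so that the intercept is estimated as well
A = np.column_stack([np.ones(n), X])

# Solve the least-squares problem: minimize the sum of squared residuals
beta_hat, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

print("estimated intercept:", round(float(beta_hat[0]), 2))
print("estimated slopes:   ", np.round(beta_hat[1:], 3))
```

The estimates should land close to 10000, 5, -2 and 1, but not exactly on them, because the simulated error term blurs the relationship slightly.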