Linear Regression For Data Science

Regression is the study of dependence: a predictive modelling technique.

It attempts to find the relationship between a DEPENDENT variable “Y” and an INDEPENDENT variable “X”.
(Note: Y should be a continuous variable, while X can be categorical or continuous.)
There are two types of linear regression: _Simple Linear Regression_ and _Multiple Linear Regression_.
Simple linear regression has one independent variable (predictor).
Multiple linear regression has more than one independent variable (predictors).
In a nutshell: Linear Regression models a continuous Y as a linear function of X.
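
Written as equations, the two types might look like this. The symbols are standard notation introduced here for illustration and do not come from the article: the β terms are the intercept and slope coefficients, ε is the error term, and p is the number of predictors.

$$
\text{Simple: } Y = \beta_0 + \beta_1 X + \varepsilon
$$

$$
\text{Multiple: } Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
$$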

1. APPLICATIONS OF REGRESSION ANALYSIS:

- Example: Relationship between Age & Income

2. To forecast effects

- Example: Effect on sales revenue of $1000 spent on marketing

3. To forecast trends

- Example: Predicting the price of bitcoin over the next 6 months
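
As a rough illustration of the forecasting uses above, here is a minimal sketch of a simple linear regression fitted by ordinary least squares in plain Python. The helper `fit_line` and the spend and revenue figures are made up for this example. The numbers were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative data: marketing spend and sales revenue, in dollars.
spend = [1000, 2000, 3000, 4000, 5000]
revenue = [12000, 15000, 18000, 21000, 24000]

intercept, slope = fit_line(spend, revenue)

# Forecasting an effect: expected change in revenue per extra $1000 spent.
effect_per_1000 = slope * 1000

# Forecasting a trend: extrapolate to a spend level not in the data.
forecast_6000 = intercept + slope * 6000

print(f"revenue = {intercept:.0f} + {slope:.2f} * spend")
print(f"effect of an extra $1000: {effect_per_1000:.0f}")
print(f"forecast at $6000 spend: {forecast_6000:.0f}")
```

The slope answers the "effect" question directly, while the forecast is an extrapolation beyond the observed range and should be treated with more caution.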

Regression models predict continuous variables (e.g., predicting the temperature of a city).
When the aim is to classify data, we choose Logistic Regression instead.
Linear Regression is not suitable for classification because **a straight line fitted to class labels produces unbounded predictions, which cannot be read as class probabilities.**
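
To make the classification point concrete, the sketch below fits a straight line to made-up 0/1 labels and shows that its predictions fall outside the range 0 to 1. The data and the helper names `fit_line` and `sigmoid` are assumptions for illustration. Passing the same linear score through a sigmoid only demonstrates how logistic regression keeps outputs between 0 and 1. It is not a fitted logistic regression, whose coefficients would be estimated differently.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Made-up data: hours studied and whether the student passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(hours, passed)

# A straight line keeps going below 0 and above 1 outside the middle range.
linear_at_0 = intercept + slope * 0
linear_at_12 = intercept + slope * 12

# The sigmoid maps the same scores into (0, 1), so they can be read as
# probability-like values (illustration only, not a fitted logistic model).
squashed_at_0 = sigmoid(linear_at_0)
squashed_at_12 = sigmoid(linear_at_12)

print(f"linear prediction at 0 hours:  {linear_at_0:.3f}")
print(f"linear prediction at 12 hours: {linear_at_12:.3f}")
print(f"sigmoid of those scores: {squashed_at_0:.3f}, {squashed_at_12:.3f}")
```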

2. Data Quality:

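Each missing value removes one data point that could have helped fit the regression.
In simple linear regression, outliers can significantly distort the outcome. Least squares penalizes squared errors, so a single extreme point can pull the fitted line toward it, and handling outliers can improve the model considerably.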
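
The effect of a single outlier can be seen in a small sketch. The data here is made up: five points lying exactly on y = 2x, plus one extreme point added for comparison. The helper `fit_line` is a hypothetical name for a plain least squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

# The same data with one extreme point appended (x = 6, y = 60).
xs_with_outlier = xs + [6]
ys_with_outlier = ys + [60]

_, clean_slope = fit_line(xs, ys)
_, outlier_slope = fit_line(xs_with_outlier, ys_with_outlier)

print(f"slope without outlier: {clean_slope:.2f}")
print(f"slope with outlier:    {outlier_slope:.2f}")
```

One extreme point out of six is enough to change the slope several times over, which is why outliers are usually inspected before fitting.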

3. Computational Complexity:

Linear regression is computationally inexpensive compared with decision trees or clustering algorithms.

4. Comprehensible & Transparent:
