Linear regressions are among the most common and most powerful tools for data analysis. While other, more advanced forms of statistics have been developed over the years, linear regressions remain incredibly popular, because they’re easy to understand, interpret, and perform.

You can find regression implementations in nearly any programming language or analytical software package, and even on the standard TI-84 calculator. This ubiquity allows math teachers to introduce regression as early as middle school, meaning most people are at least familiar with it.

With the linear regression’s success, however, comes its misuse. Because people may not completely understand its underlying assumptions, they’re more likely to make basic mistakes when applying it.

Luckily, some of those mistakes are easy to fix.

A Line of Best Fit on non-linear data. Figure produced by author.

Despite “linear” being in the name, one of the most common mistakes in linear regressions is fitting to non-linear data. The illustration above shows why this is a bad idea.

The straight line, the linear regression, doesn’t follow the curve of the data that it’s designed to mimic. As a result, the model behaves poorly and makes terrible predictions.

Nearly everybody does this at least once because they don’t take the time to do proper data exploration. Plotting each of the independent variables against the dependent variable to check for a linear relationship, calculating correlation coefficients, or performing a principal component analysis can help prevent this mistake in the first place.
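As a rough illustration of the correlation check, the sketch below uses NumPy on synthetic, noise-free parabolic data. The data and the variable names are assumptions made for this example, not the dataset behind the figure. It compares the Pearson correlation of Y with X against the correlation of Y with X².

```python
import numpy as np

# Hypothetical data: a symmetric, noise-free parabola (illustration only)
x = np.linspace(-3, 3, 61)
y = x ** 2

# Pearson correlation between x and y.
# For a parabola centered on zero, the linear correlation is close to 0,
# even though y is completely determined by x.
r_linear = np.corrcoef(x, y)[0, 1]

# Correlation between the squared term and y.
# Here the relationship is exactly linear, so the coefficient is close to 1.
r_squared_term = np.corrcoef(x ** 2, y)[0, 1]

print(f"corr(x, y)    = {r_linear:.3f}")
print(f"corr(x**2, y) = {r_squared_term:.3f}")
```

A correlation near zero does not mean the variables are unrelated; it only means the relationship is not linear, which is why a quick plot alongside the coefficient is worth the time.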

The best solution, however, is to check what type of relationship X has with Y and transform X so that the transformed variable relates linearly to Y. For example, if the data forms a parabolic relationship, like in the example above, use X² as the independent variable instead of X.
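A minimal sketch of that transformation, assuming synthetic parabolic data with a little noise: the coefficients 2.0 and 1.0, the noise level, and the helper `fit_line` are all made up for illustration. It fits an ordinary least squares line once on X and once on X², then compares the R² of each fit.

```python
import numpy as np

# Hypothetical data: y = 2*x^2 + 1 plus Gaussian noise (illustration only)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 61)
y = 2.0 * x ** 2 + 1.0 + rng.normal(scale=0.5, size=x.size)


def fit_line(feature, target):
    """Least squares fit of target = slope * feature + intercept.

    Returns the coefficients (slope, intercept) and the R^2 of the fit.
    """
    design = np.column_stack([feature, np.ones_like(feature)])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    predictions = design @ coeffs
    ss_res = np.sum((target - predictions) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot


# Fit on the raw variable: the straight line cannot follow the curve
coeffs_raw, r2_raw = fit_line(x, y)

# Fit on the transformed variable: the relationship is now linear
coeffs_sq, r2_sq = fit_line(x ** 2, y)

print(f"R^2 using x:    {r2_raw:.3f}")
print(f"R^2 using x**2: {r2_sq:.3f}")
```

The model is still a linear regression after the transformation, because it remains linear in its coefficients; only the input variable has changed.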
