## Introduction

Linear regression, or least squares regression, is the simplest application of machine learning, and arguably the most important. Many people apply the method _every day_ without realizing it. Whenever you compute an arithmetic mean, you have a special case of linear regression: with no predictors at all, the best least-squares predictor of a response variable is the bias (or mean) of the response itself!
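
The claim about the mean can be checked numerically. The sketch below is a minimal illustration, assuming NumPy is available; the response values `y` are made up for the example and do not come from any dataset in this article. It fits a model that contains only a bias term (a single column of ones) and compares the fitted coefficient with the arithmetic mean.

```python
import numpy as np

# Hypothetical response values, chosen only for illustration.
y = np.array([2.0, 4.0, 4.0, 5.0, 10.0])

# Design matrix with a single column of ones: a bias-only model, no predictors.
X = np.ones((len(y), 1))

# Least-squares solution for the single coefficient (the bias).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept = coef[0]

# Sum of squared errors as a function of a constant prediction c.
def sse(c):
    return float(np.sum((y - c) ** 2))

print("fitted bias:", intercept)
print("arithmetic mean:", y.mean())
print("SSE at the mean:", sse(y.mean()))
print("SSE slightly off the mean:", sse(y.mean() + 0.5))
```

The fitted bias and the mean agree up to floating-point error, and moving the constant prediction away from the mean increases the sum of squared errors.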

*At the core of the method of least squares lies the idea to minimize the sum of the squared “errors,” that is, to adjust the unknown parameters such that the sum of the squares of the differences between observed and computed values is minimized.*

Linear regression has had quite a lot written about it on TowardsDataScience, so why author another article on it? My purpose is not to “show how it is done”, but to illustrate linear regression as a convenient and practical example of a more fundamental concept, **estimation**, and to help readers develop an *intuition* for its mechanisms.

## The Basics

The word “linear” in “linear model” does not refer to the individual terms of the model, such as whether they are squared or have a square root applied. Many are surprised to find that predictor variables can have all kinds of non-linear transformations applied to them, *and often do*, in order to create a valid linear model. Rather, “linear” refers to the behavior of the model as a whole: a linear model is one in which a linear combination of the predictor variables yields a prediction of a response variable. This means that

is a linear model, where *x₁* could be (using some example “measurement” value from our data):
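
One hedged illustration, not taken from this article's data: suppose *x₁* is the square root of a raw measurement. The sketch below assumes NumPy, uses made-up measurement values, and picks the square root purely as an example transformation. The model stays linear because the prediction is still a weighted sum of the (transformed) predictors.

```python
import numpy as np

# Hypothetical raw measurements (not from the article's data).
x_raw = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# A non-linear transformation of the measurement becomes the predictor x1.
x1 = np.sqrt(x_raw)

# Hypothetical noise-free response: y = 1 + 2 * sqrt(x_raw).
y = 1.0 + 2.0 * x1

# Design matrix: a column of ones for the bias, plus the transformed predictor.
X = np.column_stack([np.ones_like(x1), x1])

# Ordinary least squares: the model is linear in the coefficients beta,
# even though x1 is a non-linear function of the raw measurement.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated coefficients:", beta)
```

Because the response here was generated without noise, the estimated coefficients recover the bias and slope used to build `y`, up to floating-point error.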
