In the last post, we explained least squares, the most widely used technique for fitting linear regression models. We covered different approaches to multiple linear regression and to regression with multiple outputs.
But we assumed that every variable enters the regression. Today we will explain some techniques for selecting only a subset of the variables. We do this for two reasons:
As we explained in the last post, the least-squares model minimizes the bias of the estimates, but not their variance. This is where the bias-variance trade-off comes into play. When we estimate a model, the expected prediction error at a point x is:
The error of prediction at a point x, self-generated.
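For reference, the formula in the image above is the standard expected squared prediction error at a test point (written here with the usual notation, assuming squared-error loss and a fitted model $\hat{f}$):

$$\mathrm{EPE}(x_0) = E\left[\left(Y - \hat{f}(x_0)\right)^2 \mid X = x_0\right]$$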
We can decompose this error into three terms:
The error of prediction at point x decomposition, self-generated.
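In standard notation (assuming $Y = f(X) + \varepsilon$ with noise variance $\sigma_\varepsilon^2$), the decomposition in the image above reads:

$$\mathrm{EPE}(x_0) = \underbrace{\sigma_\varepsilon^2}_{\text{irreducible error}} + \underbrace{\left(E[\hat{f}(x_0)] - f(x_0)\right)^2}_{\text{squared bias}} + \underbrace{E\left[\left(\hat{f}(x_0) - E[\hat{f}(x_0)]\right)^2\right]}_{\text{variance}}$$

The first term is the noise in the data itself and cannot be removed by any model; only the squared bias and the variance depend on our choice of estimator.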
Here is where the bias-variance trade-off appears: bias and variance can both be reduced to zero only if the data is perfectly predictable, which never happens with real-world data. So we have two sources of error to minimize, and typically, when we reduce the bias, we increase the variance of our model.
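We can see this trade-off numerically with a small simulation. The sketch below (a toy setup of my own, not from the post: a sine curve plus Gaussian noise, fit with plain NumPy polynomials) repeatedly resamples a training set and measures the squared bias and the variance of the prediction at a fixed point. Low-degree fits are too rigid (high bias, low variance); high-degree fits chase the noise (low bias, high variance):

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Ground-truth function the noisy data is generated from
    return np.sin(x)

x_test = 1.0   # point where we measure bias and variance
n_sims = 500   # number of simulated training sets
noise_sd = 0.3

def simulate(degree):
    """Fit a polynomial of the given degree to many noisy samples of
    sin(x) and return (squared bias, variance) of the prediction at x_test."""
    preds = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.uniform(0, 3, size=30)
        y = true_f(x) + rng.normal(0, noise_sd, size=30)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x_test)
    bias_sq = (preds.mean() - true_f(x_test)) ** 2
    return bias_sq, preds.var()

for degree in (1, 5, 10):
    b, v = simulate(degree)
    print(f"degree {degree:2d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Running this shows the squared bias shrinking and the variance growing as the polynomial degree increases, which is exactly the tension that subset selection tries to balance: by dropping variables we accept a little bias in exchange for a lower-variance model.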
#mathematics #machine-learning #linear-regression #data-science #deep-learning