At a high level, machine learning is the union of statistics and computation. The crux of machine learning revolves around algorithms (or models), which are in fact statistical estimations on steroids.

However, any given model comes with limitations, depending on the data distribution. None of them can be entirely accurate, since they are all just estimations (even if on steroids). These limitations are popularly known as bias and variance.

A model with high bias oversimplifies by not paying enough attention to the training points (e.g. in linear regression, the model assumes a linear relationship regardless of the actual data distribution).
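To make the linear regression example concrete, here is a minimal sketch (using scikit-learn on synthetic data of my choosing): a straight line fit to a clearly quadratic relationship. No amount of extra data fixes this, because the simplifying assumption is baked into the model itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Quadratic data: y = x^2 plus a little noise.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.1, size=100)

# A linear model can only fit a straight line through this curve,
# so its training R^2 stays close to zero -- that is high bias.
model = LinearRegression().fit(X, y)
print(f"R^2 on the training data: {model.score(X, y):.3f}")
```

Because the data is symmetric around zero, the best straight line is nearly flat, and the R² score reflects how badly the model underfits even the data it was trained on.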

A model with high variance clings too tightly to the training data and fails to generalize to test points it hasn't seen before (e.g. a random forest with max_depth=None).
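The random forest case can be sketched the same way (again with synthetic data chosen for illustration): by adding label noise and letting the trees grow without a depth limit, the model memorizes the training set, including the noise, and the gap between training and test accuracy exposes the high variance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# flip_y=0.2 randomly flips 20% of labels -- noise no model can truly learn.
X, y = make_classification(n_samples=500, n_features=20,
                           flip_y=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# max_depth=None lets every tree grow until its leaves are pure,
# memorizing the training set, noise included.
forest = RandomForestClassifier(max_depth=None, random_state=42)
forest.fit(X_tr, y_tr)
print(f"train accuracy: {forest.score(X_tr, y_tr):.3f}")
print(f"test accuracy:  {forest.score(X_te, y_te):.3f}")
```

Training accuracy lands near 1.0 while test accuracy is capped well below it by the injected noise; that train/test gap is the signature of overfitting.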

The issue arises when the limitations are subtle, for example when we have to choose between a random forest and a gradient boosting model, or between two variations of the same decision tree algorithm: both will tend to have high variance and low bias.

This is where model selection and model evaluation come into play!
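As a preview of where the article is headed, here is one minimal sketch of model selection in practice (my own example, not from the article): evaluating two candidate models on the same cross-validation splits and comparing their mean scores, rather than trusting a single train/test split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Score both candidates with 5-fold cross-validation on identical splits;
# the mean (and spread) of the fold scores drives the selection.
for model in (RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{type(model).__name__}: "
          f"{scores.mean():.3f} +/- {scores.std():.3f}")
```

Resampling schemes like this, along with probabilistic criteria, are exactly the selection methods discussed below.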

In this article we’ll talk about:

  • What are model selection and model evaluation?
  • Effective model selection methods (resampling and probabilistic approaches)
  • Popular model evaluation methods
  • Important Machine Learning model trade-offs

The Ultimate Guide to Evaluation and Selection of Models in Machine Learning - neptune.ai