Cross-validation, also referred to as an **out-of-sample technique**, is an essential element of a data science project. It is a resampling procedure used to evaluate machine learning models and assess how a model will perform on an independent test dataset.
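As a minimal sketch of the idea, scikit-learn's `cross_val_score` runs a k-fold split; the Iris data and RBF-kernel SVC here are placeholder choices, not a prescription:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
model = SVC(kernel="rbf")

# Each score is the accuracy on one held-out fold that the
# model never saw while fitting.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```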

Hyperparameter optimization, or tuning, is the process of choosing the set of hyperparameters for a machine learning algorithm that performs best on a particular dataset.
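As a sketch, scikit-learn's `GridSearchCV` carries out this search over a user-supplied grid; the grid values below are illustrative assumptions, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid (an assumption, not a recommendation).
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Try every combination, score each by 5-fold cross-validation,
# and keep the best-performing one.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```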

Both cross-validation and hyperparameter optimization are important aspects of a data science project: cross-validation evaluates the performance of a machine learning algorithm, and hyperparameter tuning finds the best set of hyperparameters for that algorithm.

Model selection without nested cross-validation uses the same data to tune hyperparameters and to evaluate model performance, which can lead to an optimistically biased evaluation: information leaks between the tuning and evaluation steps, so the error estimated on the training or test data understates the true generalization error. Nested cross-validation overcomes this problem.
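Conceptually, nested CV wraps the tuning search inside an outer evaluation loop, so the score is always computed on data the tuning step never touched. A minimal sketch, assuming scikit-learn's `GridSearchCV` and `cross_val_score` with an illustrative parameter grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}  # illustrative

# Inner loop: tunes hyperparameters within each outer training split.
inner_search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4)

# Outer loop: scores the tuned model on folds the inner loop never saw.
nested_scores = cross_val_score(inner_search, X, y, cv=5)
print(f"nested CV accuracy: {nested_scores.mean():.3f}")
```

Because `cross_val_score` refits the whole `GridSearchCV` on each outer training split, the hyperparameter search never sees the outer test fold, which is exactly what prevents the leakage described above.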

Consider comparing the performance of non-nested and nested CV strategies on the Iris dataset using a Support Vector Classifier: the non-nested scores are typically higher, because they are optimistically biased. A sketch of such a comparison is shown below.
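One way this comparison might be reproduced, again with an illustrative parameter grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # illustrative

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner_cv)

# Non-nested: the same folds both choose the hyperparameters and
# report the score, so the estimate is optimistically biased.
search.fit(X, y)
non_nested = search.best_score_

# Nested: an outer loop re-scores the whole tuning procedure on
# folds it never used.
nested = cross_val_score(search, X, y, cv=outer_cv).mean()

print(f"non-nested: {non_nested:.3f}  nested: {nested:.3f}")
```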

