In machine learning, when building a predictive model for a classification or regression task, we usually split the data set into two parts: a training set and a test set. The training set is used to fit the machine learning model, while the test set is used for predictions, which are then scored with different evaluation metrics. But if you get 85% accuracy on the test set, do you think the model will deliver the same performance on production data? Does that single number guarantee the same results? The answer is no, we cannot expect exactly the same accuracy; we can only get close to it. Therefore we need a method that tells us what range of accuracy we can expect when the model is used in production.
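As a quick illustration of this basic workflow, here is a minimal sketch using scikit-learn. The synthetic data set, logistic regression classifier, and 80/20 split ratio are assumptions for the example, not choices prescribed by the article:

```python
# Minimal sketch of the usual train/test workflow (assumed setup: scikit-learn,
# a synthetic classification data set, logistic regression, 80/20 split).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic data standing in for any classification data set
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split into training and testing parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train on the training part, predict on the testing part
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# A single test accuracy, which may not match production performance
print("Test accuracy:", accuracy_score(y_test, y_pred))
```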

This is where K-Fold cross-validation comes into the picture: it gives us an estimate of the model's performance on unseen data. This method is often used to give stakeholders an estimate of the accuracy, or of the performance in general, they can expect once the model is put into production.
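To sketch the idea (assuming scikit-learn's `KFold` and `cross_val_score`, again with synthetic data and logistic regression as stand-ins), the model is trained and evaluated K times, each time holding out a different fold as the test set, and the spread of the fold scores is reported rather than a single number:

```python
# Sketch of K-Fold cross-validation (assumed: scikit-learn, synthetic data,
# logistic regression, K = 5). The mean and standard deviation of the fold
# scores give a range of accuracy to expect on unseen data.
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
model = LogisticRegression(max_iter=1000)

# 5 folds: each fold serves as the test set once, the rest as training data
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")

print("Fold accuracies:", scores)
print("Estimated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Reporting the mean together with the standard deviation across folds is what turns a single test score into a range the model's production accuracy is likely to fall in.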

Through this article, we will see what exactly K-Fold cross-validation is and how it works, and then we will implement it on a data set to estimate the accuracy we can expect on unseen data. For this experiment, we use the Pima Indian Diabetes data set, which can be downloaded from the Kaggle website.

