Hyper-parameter tuning, done carelessly, can quietly break your model and lead to unpredictable results on unseen data.

Introduction

**Data Leakage** is when the model somehow sees patterns from the test data during its training phase. In other words, the data you use to train your ML algorithm happens to contain the information you are trying to predict.

Data leakage prevents the model from generalizing well, and it can be very difficult for a data scientist to spot. Common causes of data leakage include:

  • Treating outliers and missing values with central values (mean, median, mode) before splitting
  • Scaling the data before splitting into training and test sets (see the sketch after this list)
  • Training the model on both the training and test data
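To make the second cause concrete, here is a minimal sketch using scikit-learn on hypothetical random data (the dataset and shapes are assumptions for illustration only). The leaky version fits the scaler on the full dataset before the split, so statistics from the test rows leak into training; the safer version splits first and fits the scaler on the training portion only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 1,000 samples, 5 features (illustration only)
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# --- Leaky approach: the scaler sees the test rows before the split ---
X_scaled = StandardScaler().fit_transform(X)          # test-set statistics leak in
X_tr_leaky, X_te_leaky, y_tr, y_te = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)

# --- Safer approach: split first, fit the scaler on training data only ---
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)                   # statistics from train only
X_tr = scaler.transform(X_tr)
X_te = scaler.transform(X_te)                         # test set is only transformed
```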

**Hyper-Parameter Tuning** is the process of finding the set of hyper-parameters of an ML algorithm that delivers the best performance.
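One way to tune hyper-parameters without reintroducing leakage is to wrap preprocessing and the estimator in a Pipeline, so the scaler is refit inside every cross-validation fold. The sketch below reuses `X_tr` and `y_tr` from the split above; the logistic regression model and the grid of `C` values are assumptions chosen purely for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Putting the scaler and the estimator in one Pipeline keeps preprocessing
# inside each cross-validation fold, so the tuning step itself does not leak.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical hyper-parameter grid (illustration only)
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)            # tuned on the training split only
print(search.best_params_, search.best_score_)
```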

For more on hyper-parameters and tuning techniques, refer to my previous article.

