Hyper-parameter tuning, done carelessly, can quietly break your model and lead to unpredictable results on unseen data.

Introduction

**Data Leakage** is when the model somehow sees patterns from the test data during its training phase. In other words, the data you use to train your ML algorithm happens to contain the information you are trying to predict.

Data leakage prevents the model from generalizing well, and it can be very difficult for a data scientist to spot. Common causes of data leakage include:

  • Treating outliers and missing values with central values (mean, median, mode) before splitting
  • Scaling the data before splitting into training and test sets (see the sketch after this list)
  • Training the model on both the training and test data
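To make the second cause concrete, here is a minimal sketch using scikit-learn on hypothetical random data (the dataset and shapes are assumptions for illustration only). The leaky version fits the scaler on the full dataset before the split, so statistics from the test rows leak into training; the safer version splits first and fits the scaler on the training portion only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 1,000 samples, 5 features (illustration only)
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# --- Leaky approach: the scaler sees the test rows before the split ---
X_scaled = StandardScaler().fit_transform(X)          # test-set statistics leak in
X_tr_leaky, X_te_leaky, y_tr, y_te = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)

# --- Safer approach: split first, fit the scaler on training data only ---
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)                   # statistics from train only
X_tr = scaler.transform(X_tr)
X_te = scaler.transform(X_te)                         # test set is only transformed
```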

**Hyper-Parameter Tuning** is the process of finding the set of hyper-parameters of an ML algorithm that delivers the best performance.
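One way to tune hyper-parameters without reintroducing leakage is to wrap preprocessing and the estimator in a Pipeline, so the scaler is refit inside every cross-validation fold. The sketch below reuses `X_tr` and `y_tr` from the split above; the logistic regression model and the grid of `C` values are assumptions chosen purely for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Putting the scaler and the estimator in one Pipeline keeps preprocessing
# inside each cross-validation fold, so the tuning step itself does not leak.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical hyper-parameter grid (illustration only)
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)            # tuned on the training split only
print(search.best_params_, search.best_score_)
```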

For more on hyper-parameters and tuning techniques, refer to my previous article.

