How to Hill Climb the Test Set for Machine Learning

Hill climbing the test set is an approach to achieving good or perfect predictions on a machine learning competition without touching the training set or even developing a predictive model.

As an approach to machine learning competitions, it is rightfully frowned upon, and most competition platforms impose limitations to prevent it.

Nevertheless, hill climbing the test set is something that a machine learning practitioner can do accidentally while participating in a competition. Developing an explicit implementation to hill climb a test set helps to show how easy it is to overfit a test dataset by overusing it to evaluate modeling pipelines.

In this tutorial, you will discover how to hill climb the test set for machine learning.

After completing this tutorial, you will know:

Perfect predictions can be made by hill climbing the test set without even looking at the training dataset.
How to hill climb the test set for classification and regression tasks.
We implicitly hill climb the test set when we overuse the test set to evaluate our modeling pipelines.
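
To make the idea concrete before diving in, here is a minimal sketch of hill climbing a test set for binary classification. It is an illustration under stated assumptions, not a definitive implementation: the hidden labels are synthetic, `score_fn` is a hypothetical stand-in for a competition leaderboard that only returns an accuracy score, and the function and variable names are made up for this example. No training data and no model are involved.

```python
import random

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hill_climb_test_set(score_fn, n_examples, n_iter, rng):
    # Start from purely random predictions; no model, no training data.
    current = [rng.randint(0, 1) for _ in range(n_examples)]
    current_score = score_fn(current)
    history = [current_score]
    for _ in range(n_iter):
        # Propose a small change: flip one randomly chosen prediction.
        candidate = list(current)
        i = rng.randrange(n_examples)
        candidate[i] = 1 - candidate[i]
        candidate_score = score_fn(candidate)
        # Keep the change only if the reported score improves.
        if candidate_score > current_score:
            current, current_score = candidate, candidate_score
        history.append(current_score)
    return current, current_score, history

rng = random.Random(1)

# Assumption: synthetic hidden test labels, visible only to the scorer.
hidden_labels = [rng.randint(0, 1) for _ in range(50)]

def score_fn(preds):
    # Hypothetical "leaderboard": reveals a score, never the labels.
    return accuracy(hidden_labels, preds)

preds, final_score, history = hill_climb_test_set(score_fn, 50, 2000, rng)
print("initial accuracy: %.2f" % history[0])
print("final accuracy:   %.2f" % final_score)
```

Each query to `score_fn` leaks a little information about the hidden labels, and enough queries recover them outright. Repeatedly evaluating modeling pipelines against the same test set leaks information in the same way, only more slowly.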

Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.


machinelearningmastery.com
