Just Give Me The Code:

Make sure you have signed up on numer.ai, since you’ll need to set up your API keys to submit predictions directly from Colab.

💡 The Numerai tournament problem

The Numerai data science problem resembles a typical supervised machine learning problem: the data has several input features and corresponding labels (or targets), and our goal is to learn a mapping from inputs to targets using various techniques. In a typical problem we split the data into training and validation parts, and most of the time is spent on cleaning the data.

Left: Sample of training data. Right: Sample submission

However, Numerai data is different. The problem is predicting the stock market, but what makes it unique is that the data is **obfuscated** and already cleaned! We don’t know which row corresponds to which stock. Rows are also grouped into eras that represent different points in time. Still, as long as the data has structure, we can certainly try to learn patterns from it.
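Because rows are grouped into eras, it is natural to look at predictions one era at a time instead of over the whole table at once. Here is a minimal sketch of that idea using only the Python standard library. It assumes each row can be reduced to an `(era, prediction, target)` tuple and uses Spearman rank correlation as the per-era score; the tuple layout, the helper names, and the toy numbers are all assumptions for illustration, not something taken from the notebook.

```python
from collections import defaultdict
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def per_era_correlation(rows):
    """rows: iterable of (era, prediction, target) tuples (assumed layout)."""
    by_era = defaultdict(lambda: ([], []))
    for era, pred, target in rows:
        by_era[era][0].append(pred)
        by_era[era][1].append(target)
    return {era: spearman(p, t) for era, (p, t) in by_era.items()}


# Toy data, made up for illustration: perfect ordering in era1, reversed in era2.
rows = [
    ("era1", 0.1, 0.00), ("era1", 0.4, 0.25), ("era1", 0.8, 0.75), ("era1", 0.9, 1.00),
    ("era2", 0.9, 0.00), ("era2", 0.6, 0.25), ("era2", 0.3, 0.75), ("era2", 0.2, 1.00),
]
scores = per_era_correlation(rows)
mean_score = mean(scores.values())
print(scores, mean_score)
```

A model that looks great in one era and terrible in another averages out to nothing here, which is why scoring era by era is a more honest check than pooling every row together.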

Numerai gives this cleaned data to data scientists and asks them to provide better estimates for the data. These crowd-sourced predictions are used to build a meta-model and to invest in real stock markets around the world. The incentives are based on the quality of your predictions and the amount of NMR you stake. You earn a percentage of your stake if your predictions help to make a profit; otherwise, your stake gets burned. This earn/burn system keeps pushing participants toward better and more unique predictions: the more accurate and/or unique the predictions, the higher the returns. This is what makes the problem interesting and complex (the “hardest data science problem”).
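To make the earn/burn idea concrete, here is a small arithmetic sketch. It assumes the change in stake for a round is simply the stake multiplied by a score, with the score clipped to a maximum percentage. The function name `round_payout`, the clip-and-multiply rule, and the `cap` value are hypothetical choices for illustration only; they are not Numerai's published payout formula.

```python
def round_payout(stake, score, cap=0.25):
    """Return the change in stake for one round.

    Positive means NMR earned, negative means NMR burned.
    The clipping rule and the default cap are illustrative assumptions.
    """
    clipped = max(-cap, min(cap, score))
    return stake * clipped


stake = 100.0  # NMR staked (made-up figure)
earned = round_payout(stake, 0.03)   # good round: earn a small share of the stake
burned = round_payout(stake, -0.02)  # bad round: part of the stake is burned
capped = round_payout(stake, 0.9)    # an extreme score is limited by the cap
print(earned, burned, capped)
```

The point of the sketch is the symmetry: the same stake that amplifies your earnings in a good round amplifies your losses in a bad one.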

Let’s tackle this problem on Google Colab with an end-to-end walk-through using a simple yet very effective technique: CatBoost. I’ll be explaining the Colab snippets here, so it would be really helpful to open the notebook link in a new tab alongside this article.

A guide to “The hardest data science tournament on the planet”