How Machine Learning could help on Horse Racing Betting

Machine learning has been widely used in time series analysis and forecasting. With large amounts of historical data and today's computing power, ML models can sometimes produce extremely useful insights to guide sports betting decisions.


This article illustrates how machine learning could help with a horse racing betting strategy. We use data crawled from the home page of the Hong Kong Jockey Club, one of the oldest and largest horse racing institutions in the world. To avoid data leakage and evaluate the real performance of the model, we use only race data from the beginning of 2007 up to fall 2019 to build the model, and then use it to bet on new, upcoming races. We used the model to build our own betting strategy, applied it over a two-month period (2019/09–2019/11), and achieved a positive return in the experiment.

Dataset

As mentioned earlier, we use all Hong Kong races from 2007 to 2019 as the training and validation sets, and data from late 2019 as the test set for evaluating the overall performance of the betting portfolio. The training data has 109,085 rows and 61 columns containing various information about each race.
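The key point of this split is that it is chronological rather than random. A minimal sketch of such a holdout split in plain Python follows; the `RaceEntry` record, its fields, and the `chronological_split` helper are hypothetical names chosen for illustration, not the author's code or the actual HKJC column names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RaceEntry:
    # Hypothetical minimal record: one row per horse per race.
    race_id: str
    race_date: date
    horse: str
    finish_position: int


def chronological_split(rows, cutoff):
    """Split rows into (train, test) by date, with no shuffling.

    Train holds rows strictly before `cutoff`, test holds rows on or after it,
    so every training row predates every test row and no information from
    the betting period leaks into model building.
    """
    ordered = sorted(rows, key=lambda r: r.race_date)
    train = [r for r in ordered if r.race_date < cutoff]
    test = [r for r in ordered if r.race_date >= cutoff]
    return train, test
```

For example, `chronological_split(rows, date(2019, 9, 1))` would hold out everything from the start of the 2019/09–2019/11 betting period onward; that exact cutoff date is an assumption, since the article does not state it.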

Feature Engineering & Modeling

The original data contains a lot of information, so we need to select the fields that are useful and also build new features from the data to help predict the results. I will not go into much detail on feature engineering, but here are some key insights if you would like to try it yourself.

Horse age, draw, and the horse's odds 5 minutes before the race have a weak correlation with the winning probability.
New features derived from past performance (e.g., performance in the last 5 races, past odds, total wins in the last 180 days, finish time) can be relatively useful.
External data such as weather, temperature, horse origin, and jockey information can improve the performance of tree-based models.
Building separate binary classification models to predict the probability of finishing first and the probability of finishing in the top 3 produces better results.
Model stacking (NN, XGBoost, GBRT, linear models, etc.) significantly improves performance.
Prediction results (winning probabilities) should be adjusted and normalized relative to the other horses in the same race.
Performing target encoding on horse and jockey greatly improves model performance.
Since this is a time series problem, use only time-based cross-validation to validate performance and tune parameters.
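The past-performance tip in the list above depends on computing each feature only from races that were run before the race being predicted. The sketch below shows one way to do that in plain Python for two of the listed features: average finish position over the last 5 races, and wins in the last 180 days. The input layout of `(race_date, horse, finish_position)` tuples and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from itertools import groupby


def past_performance_features(entries, last_n=5, win_window_days=180):
    """Compute per-entry features using only races run before that entry's date.

    entries: list of (race_date, horse, finish_position) tuples, in any order.
    Returns a list of dicts in the same order as `entries`:
      avg_finish_last_n: mean finish over the horse's last `last_n` races (None if no history)
      wins_last_window: number of wins in the `win_window_days` days before the race
    """
    history = defaultdict(list)  # horse -> [(date, finish), ...] in date order
    features = [None] * len(entries)
    order = sorted(range(len(entries)), key=lambda i: entries[i][0])
    for race_date, group in groupby(order, key=lambda i: entries[i][0]):
        group = list(group)
        # First compute features for every entry on this date from prior history only...
        for i in group:
            _, horse, _ = entries[i]
            past = history[horse]
            recent = past[-last_n:]
            avg = sum(f for _, f in recent) / len(recent) if recent else None
            start = race_date - timedelta(days=win_window_days)
            wins = sum(1 for d, f in past if d >= start and f == 1)
            features[i] = {"avg_finish_last_n": avg, "wins_last_window": wins}
        # ...then record this date's results so they only affect later races.
        for i in group:
            d, horse, finish = entries[i]
            history[horse].append((d, finish))
    return features


if __name__ == "__main__":
    demo = [(date(2019, 1, 1), "A", 1), (date(2019, 2, 1), "A", 3)]
    print(past_performance_features(demo))
```

Grouping by date before updating the history is a deliberately conservative choice: no result from the same day can leak into another entry's features.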
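The normalization tip in the list above exists because a binary classifier scores each horse independently, so the predicted win probabilities within one race need not sum to 1. The simplest adjustment is proportional rescaling within each race, sketched below. The function name is hypothetical, and the article does not say which normalization the author used; a softmax over model logits grouped by race is a common alternative.

```python
from collections import defaultdict


def normalize_within_race(race_ids, raw_probs):
    """Rescale per-horse win probabilities so they sum to 1 within each race.

    race_ids[i] and raw_probs[i] describe the same horse entry.
    If every raw score in a race is 0, fall back to a uniform distribution.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for race, p in zip(race_ids, raw_probs):
        totals[race] += p
        counts[race] += 1
    return [
        p / totals[race] if totals[race] > 0 else 1.0 / counts[race]
        for race, p in zip(race_ids, raw_probs)
    ]
```

This rescaling suits the win model. Top-3 probabilities should sum to 3 in a race rather than 1, so scaling them the same way could push individual values above 1 and would need a different treatment.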
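Target encoding on horse and jockey, as recommended in the list above, can leak the label if each entry's own result is included in its encoding. One leakage-safe variant is an expanding, smoothed mean that uses only earlier entries, sketched below. The smoothing scheme, the `prior` value, and the function name are all assumptions; the article does not describe the author's exact encoding.

```python
from collections import defaultdict


def expanding_target_encode(categories, targets, prior, smoothing=10.0):
    """Time-aware target encoding.

    categories: one category per entry (e.g. horse or jockey id), in chronological order.
    targets: 0/1 outcome per entry (e.g. 1 if the horse won).
    Each entry is encoded from earlier entries of the same category only,
    shrunk toward `prior` with strength `smoothing`, so rarely seen
    categories stay close to the prior.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for cat, y in zip(categories, targets):
        encoded.append((sums[cat] + smoothing * prior) / (counts[cat] + smoothing))
        # Update only after encoding, so an entry never sees its own outcome.
        sums[cat] += y
        counts[cat] += 1
    return encoded
```

A reasonable `prior` for a win target is the overall win rate in the training period, which is roughly one divided by the typical field size.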
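The last tip in the list above rules out shuffled k-fold cross-validation. A minimal sketch of expanding-window folds follows. It groups rows by date so that a single race day never lands on both sides of a split, which scikit-learn's `TimeSeriesSplit` does not do on its own because it splits by row position. The function name and the equal-sized date blocks are assumptions made for illustration.

```python
def time_based_folds(dates, n_splits=3):
    """Expanding-window CV folds grouped by date.

    dates: one sortable timestamp per row (e.g. race date).
    The distinct dates are cut into n_splits + 1 consecutive blocks (the last
    block absorbs any remainder). Fold k (1-based) trains on blocks 0..k-1 and
    validates on block k, so validation data always comes after training data.
    Yields (train_indices, val_indices).
    """
    unique = sorted(set(dates))
    if n_splits < 1 or len(unique) < n_splits + 1:
        raise ValueError("need n_splits >= 1 and at least n_splits + 1 distinct dates")
    block = len(unique) // (n_splits + 1)
    for k in range(1, n_splits + 1):
        val_start = unique[k * block]
        val_end = unique[(k + 1) * block] if k < n_splits else None
        train_idx = [i for i, d in enumerate(dates) if d < val_start]
        val_idx = [
            i for i, d in enumerate(dates)
            if d >= val_start and (val_end is None or d < val_end)
        ]
        yield train_idx, val_idx
```

Hyperparameters would then be scored by averaging the validation metric across these folds, and the final test period would be left untouched.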

Betting Strategy

After building a reasonably useful model that predicts each horse's probability of finishing first and finishing in the top 3, I spent a lot of time experimenting with and researching how to achieve positive returns from the models. Horse racing involves a great deal of uncertainty, and considerable human effort goes into removing any potential unfair advantage, which makes the betting strategy extremely important. After many experiments running the models against real races, I came up with a strategy built on three essential concepts:

Expectation Return Ratio
Lowest Risk Betting
Kelly Criterion
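The article names these concepts without defining them at this point. As a hedged sketch of the two standard ones, the code below computes the expected return per unit staked for a win bet and the textbook single-bet Kelly stake. It assumes decimal odds `O`, meaning the total payout per unit staked including the stake, and a model win probability `p`. The author's exact formulas may differ, and "Lowest Risk Betting" is not sketched here.

```python
def expected_return_ratio(p_win, decimal_odds):
    """Expected payout per unit staked: p * O. Values above 1 mean a positive expected return."""
    return p_win * decimal_odds


def kelly_fraction(p_win, decimal_odds, fraction=1.0):
    """Kelly stake as a fraction of bankroll for a single win bet.

    Textbook single-bet Kelly: f* = (b*p - q) / b, where b = O - 1 is the net
    odds and q = 1 - p. Returns 0 when the bet has no positive edge.
    `fraction` < 1 gives fractional Kelly, which trades growth for lower variance.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability")
    b = decimal_odds - 1.0
    if b <= 0:
        return 0.0
    f = (b * p_win - (1.0 - p_win)) / b
    return max(0.0, f) * fraction
```

The two ideas agree with each other: `b*p - q > 0` simplifies to `p*O > 1`, so Kelly stakes a positive amount exactly when the expected return ratio exceeds 1. Two caveats apply. Single-bet Kelly is only an approximation when several horses in the same race are backed at once. In a pari-mutuel pool, the final odds can also differ from the odds available at bet time.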

towardsdatascience.com