Predicting Formula 1 results with Elo Ratings

Formula 1 races often feel predictable: Mercedes has won the past 5 championships and 7 of the first 9 GPs of the 2020 season. There is certainly no lack of Formula 1 predictions online, and anyone from professional pundits to self-appointed armchair “experts” is keen to share theirs. My goal was to take a more analytical look at the problem: could Formula 1 races be predicted by algorithms?

I have been working on predicting Formula 1 qualifying results for the past two years. From the start, my goal was to predict race results as well, but this turned out to be more complex than expected. Qualifying results are easier to model: in a Formula 1 qualifying session, the winner is the driver who sets the fastest time. As such, each driver’s performance can be treated as independent of the others: they compete against the clock, and only indirectly against each other. This makes predicting results much easier, because each driver’s predicted result can be compared directly against the time they achieved. Modelling Formula 1 races is more complicated, because drivers compete directly against each other. But with a lockdown summer in sight, I decided it was time to give the race model a go. I was joined by two great developers, Raiyan and Philip, in our quest to build a model for predicting Formula 1 race results.

In F1 races, finishing times do not matter: you get the same number of points whether you win the race by 20 seconds or by 0.1 seconds. We decided to model the race as a series of independent, head-to-head competitions between pairs of participants, each ending in a win or a loss for a participant (or unresolved, if one or both did not finish the race). This model does not reason about finishing times or points gained. Instead, it checks how many head-to-head competitions a participant won and how that compares with their expected score. One of the main benefits of this approach is its relative resilience to “freak” races. For example, in a model that looks at finishing position alone, a race where most front-runners did not finish will result in an unreasonably large gain for weaker drivers. A model that reasons about head-to-head competitions will not take the retired drivers into account, resulting in a realistic adjustment to drivers’ scores.
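
To make the head-to-head idea concrete, here is a minimal Python sketch of how one race could update a set of Elo ratings. It is an illustration under stated assumptions, not the model we actually built: the expected-score formula is the standard Elo one, while the step size `K`, the `SCALE` of 400, the function names and the driver names are all hypothetical choices made for this example.

```python
from itertools import combinations

K = 8.0        # assumed step size per head-to-head result (not from the article)
SCALE = 400.0  # conventional Elo scale, assumed here


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A finishes ahead of B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / SCALE))


def update_ratings(ratings, finishing_order, k=K):
    """Return new ratings after one race.

    ratings: dict mapping driver name to pre-race rating.
    finishing_order: drivers who finished, best first. Drivers missing
    from this list are treated as retired, so every pairing that
    involves them is left unresolved and changes nothing.
    """
    delta = {driver: 0.0 for driver in ratings}
    # combinations() preserves list order, so `ahead` always beat `behind`.
    for ahead, behind in combinations(finishing_order, 2):
        surprise = 1.0 - expected_score(ratings[ahead], ratings[behind])
        delta[ahead] += k * surprise   # actual score 1 minus expected score
        delta[behind] -= k * surprise  # mirror image for the loser
    return {driver: ratings[driver] + delta[driver] for driver in ratings}


# Hypothetical "freak" race: two of four equally rated drivers retire.
ratings = {"Driver A": 1500.0, "Driver B": 1500.0,
           "Driver C": 1500.0, "Driver D": 1500.0}
new_ratings = update_ratings(ratings, ["Driver C", "Driver A"])
```

In this example only the pairing between Driver C and Driver A is resolved, so Driver C gains exactly what Driver A loses and the two retired drivers keep their ratings. All pairings are scored against pre-race ratings, which is one simple way of treating the head-to-head results as independent.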

#sports #sports-analytics #data-science #formula-1 #analytics

towardsdatascience.com
