If you recall, we were using bookmaker spread inputs for basketball matches to forecast teams’ winning probabilities. After an intense Docker build that left a carbon footprint ten times larger than Greta Thunberg’s worst enemies combined, we built a basic basketball model based on Wayne Winston’s Mathletics. From there we deduced that our model’s four-match Chinese Basketball League forecast had a mean-squared error (MSE) of 2.13 compared to the actual market output.

I glossed over the MSE metric in my previous post because it was off-topic, but in this post I’m going to go through some of its concepts.

As always, Wikipedia puts the definition of MSE into words far more efficiently than I would be able to. MSE is a risk function: it computes the average of the squared errors, that is, the squared differences between the estimated values and the actual values.
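For reference, here is that definition written out as a formula. The symbols are the conventional ones rather than anything taken from the earlier post: $n$ is the number of forecasts, $Y_i$ the actual (observed) values and $\hat{Y}_i$ the estimated values.

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2
$$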

*Alright mate, care to explain that in plain English though please?*

…so we have our Python Basketball model, which we are using to forecast the win probabilities off the initial spreads. This is our *estimator*. The win probabilities that we obtained from the betting markets (in our case, Pinnacle Sports) are our *observed values*, in other words, values which are taken to be true (now, they may or may not be “true” but… let’s not worry too much about that at the moment).

Now stop and think about this for a second. How do we know that our estimator is actually, well, accurate? It’s easy enough to confirm accuracy if a model produces values that are no different to those observed. While true, this is basically saying that somebody’s built a model that perfectly captures reality. Is that really possible? Pretty unlikely, I’d say. Much more realistic would be an acceptance that there are some costs associated with a model, for instance assumptions that were too simple (or completely wrong). And that’s what a risk function is: it condenses this loss of information into a single value for the user. The MSE is just a special case of this.
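To make that concrete, here is a minimal sketch of the calculation in plain Python. The function name `mean_squared_error` and both lists of win probabilities are made up purely for illustration; they are not the figures from the Chinese Basketball League forecast.

```python
def mean_squared_error(observed, estimated):
    """Average of the squared differences between observed and estimated values."""
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    if not observed:
        raise ValueError("need at least one pair of values")
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical win probabilities (in %) for four matches, NOT the real data.
market_probs = [62.0, 45.0, 71.0, 55.0]  # observed values (what the market says)
model_probs = [60.0, 46.0, 73.0, 54.0]   # estimator output (what the model says)

mse = mean_squared_error(market_probs, model_probs)
print(f"MSE: {mse:.2f}")  # errors are 2, -1, -2, 1, so (4 + 1 + 4 + 1) / 4 = 2.50
```

Because every error is squared before averaging, it doesn’t matter whether the model was too high or too low, and big misses get punished far more than small ones.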

Graph illustrating observed and expected values

Let’s generalise a little. I was too lazy to draw my own graph with a charting library, so I nicked this example from the excellent Free Code Camp. This is the infamous **y = Mx + C**. Hopefully you’ll have seen this in school, but in case you weren’t paying attention (or in case you dropped out to pursue a career much more successful than some bloke writing about MSE), this is what’s represented in the graph:

- The **purple dots** are points on the graph; each point has an x and y coordinate. These are your observed values.
- The **blue line** is the prediction line, covering the estimated values of the model.
- The **red lines** between each purple point and the prediction line are the **errors**. Each error is the distance from the point to its predicted point.
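The same picture can be sketched in a few lines of Python. Everything here is assumed for illustration: the slope `M`, the intercept `C` and the four points are invented, and I’m taking each error to be the vertical gap between a point and the line at the same x.

```python
# Hypothetical slope, intercept and data points, chosen only for illustration.
M, C = 2.0, 1.0
points = [(1.0, 3.5), (2.0, 4.0), (3.0, 7.5), (4.0, 9.0)]  # the "purple dots"


def predict(x, slope=M, intercept=C):
    """Point on the "blue line" for a given x: y = Mx + C."""
    return slope * x + intercept


# The "red lines": observed y minus predicted y for each point.
errors = [y - predict(x) for x, y in points]

# Square each error, then average them to get the MSE of the line.
line_mse = sum(e ** 2 for e in errors) / len(errors)

print(errors)    # [0.5, -1.0, 0.5, 0.0]
print(line_mse)  # 0.375
```

A point sitting exactly on the line (the last one here) contributes nothing, while the point that is a full unit off contributes more than the two half-unit misses put together.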
