Over the last few years, football processes (scouting, managing training loads, pre/post-match analysis, etc.) have been slowly but surely evolving towards more data-informed decision making. For a data analyst or data scientist who is passionate about football, more and more datasets are becoming freely accessible, opening up opportunities for football data exploration. As a football fan, I was quite excited when StatsBomb released the complete Messi Data Biography dataset in 2019. With the recent quarantine imposed by the COVID-19 pandemic, I had some free time that I decided to put to good use. This is the story of my trip deep down the Messi rabbit hole.

All the relevant code, interactive visualizations, and more are available in my GitHub project.

**TL;DR:** Messi is a genius, you can’t stop him. Or can you?

1. One more time: Lionel Messi

1.1 Digital love

One may wonder why anyone would write yet another piece about Lionel Messi. Well, first, because thanks to StatsBomb his entire La Liga career is freely available in a single dataset. To this day, it’s the only complete dataset on a single player (considering players reaching the end of their careers).

Second, because I’m convinced we don’t know everything about Messi. A lot has already been written about him: his scoring skills, his playmaking, extensive comparisons with C. Ronaldo, etc. Maybe you have already read that Messi is impossible, or that he walks better than most players run.

But to the best of my knowledge, the defensive side has not been extensively explored. As I dived into the data biography, I started to wonder whether it was possible to stop Messi. And if so, how can it be described with data alone?

Event data in football leans toward the attacking side of the game: defending involves many more ‘off-ball’ events, which makes it more difficult to analyze. It’s much easier with tracking data (assuming the data is accurate enough), as we know the players’ positions at all times, but that does not mean nothing can be found with event data. Defensive actions and pressure events, together with advanced metrics such as expected goals (xG), PPDA, or VAEP, are enough material to get our hands dirty on the “Stopping Messi” quest.

1.2 Harder, better, faster, stronger

How to stop Lionel Messi?

It’s the million-dollar question that has made every coach facing him tear their hair out. He’s (one of) the greatest players of all time, and he has been tearing defenses apart with FC Barcelona since 2004.

Just look at his Win/draw/loss distribution:

Messi overall winning stats (2004/05–2018/19): 339 victories and only 39 losses from 452 games.

From his first year in 2004 to the end of the 2018/19 season, Messi played 452 games with a total of 419 goals and 158 assists.

Messi goal distribution (2004/05–2018/19).

And if we consider only open play goals:

  • 336 goals for a total of 253.6 xG (Expected Goals).
  • 22 goals for a total of 14.5 xG for shots with xG smaller than 0.05.
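For readers who want to reproduce this kind of goals-versus-xG comparison, here is a minimal sketch in plain Python. It assumes the shots have already been flattened into simple records; the field names (`xg`, `goal`, `open_play`) and the toy values are illustrative assumptions, not the dataset’s actual schema or Messi’s real numbers.

```python
# Toy shot records. Field names and values are illustrative assumptions,
# not the real dataset schema or real Messi data.
shots = [
    {"xg": 0.03, "goal": True, "open_play": True},
    {"xg": 0.04, "goal": False, "open_play": True},
    {"xg": 0.45, "goal": True, "open_play": True},
    {"xg": 0.76, "goal": True, "open_play": False},  # e.g. a penalty, excluded below
    {"xg": 0.10, "goal": False, "open_play": True},
]


def goals_vs_xg(shots, max_xg=None):
    """Return (goals, total xG) over open-play shots.

    If max_xg is given, only shots with xG strictly below it are kept.
    """
    selected = [
        s for s in shots
        if s["open_play"] and (max_xg is None or s["xg"] < max_xg)
    ]
    goals = sum(s["goal"] for s in selected)          # True counts as 1
    total_xg = round(sum(s["xg"] for s in selected), 3)
    return goals, total_xg


all_open = goals_vs_xg(shots)             # every open-play shot
low_xg = goals_vs_xg(shots, max_xg=0.05)  # only the "hard" chances
print("open play:", all_open)
print("xG < 0.05:", low_xg)
```

Scoring more goals than the summed xG, as in the two bullet points above, is the usual sign of above-average finishing.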

Just for fun, here is his open-play goal with the smallest xG value in the whole dataset: 0.024. Weirdly enough, it does not look like the hardest goal Messi has scored, but he shot from quite far out and there were a lot of opponents in front of him.

Goal with smallest xG value. Opening goal during Sporting Gijon — FC Barcelona (2015/16)

These stats alone show how great Messi is as a goal scorer, and that’s only a small part of what he can do. If you want even more (advanced) stats about Messi, especially his dribbling skills, you can look at the StatsBomb articles:

You can also find my own data crunching inside this notebook (see also the associated figures folder).

So, to sum up, and stating the obvious: it is HARD to defend against a player like Messi.

2. Human after all

No matter how great Messi is, he can sometimes have a bad game. For example, from 2004 to 2019 he had some games without goals or assists. That does not necessarily mean he played badly, but at least Messi was not directly decisive.

To illustrate this, we can look at the percentage of games in which Messi did not score or assist.

Messi “bad games” percentage by season (beware, small sample size for 2004/05)
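The percentage behind a chart like this one can be computed with a simple count per season. The sketch below assumes one record per game with hypothetical `season`, `goals` and `assists` fields; the values are made up for illustration.

```python
from collections import defaultdict

# Toy per-game records (hypothetical field names and values).
games = [
    {"season": "2008/09", "goals": 1, "assists": 0},
    {"season": "2008/09", "goals": 0, "assists": 0},
    {"season": "2009/10", "goals": 2, "assists": 1},
    {"season": "2009/10", "goals": 0, "assists": 1},
    {"season": "2009/10", "goals": 0, "assists": 0},
    {"season": "2009/10", "goals": 1, "assists": 0},
]


def blank_game_pct(games):
    """Percentage of games per season with neither a goal nor an assist."""
    counts = defaultdict(lambda: [0, 0])  # season -> [blank games, total games]
    for g in games:
        counts[g["season"]][1] += 1
        if g["goals"] == 0 and g["assists"] == 0:
            counts[g["season"]][0] += 1
    return {
        season: 100 * blank / total
        for season, (blank, total) in counts.items()
    }


pct = blank_game_pct(games)
for season, value in sorted(pct.items()):
    print(f"{season}: {value:.1f}% of games without a goal or an assist")
```

With only a handful of games in a season, as in 2004/05, a single game moves the percentage a lot, which is why the caption warns about sample size.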

As you can see, it’s not that often that Messi fails to add to his stat line. A game without a goal or an assist is not always a bad one, but it does have an effect on the results of Messi’s games:

Effect of Messi scoring or assisting (2004/05–2018/19).

It’s true that it’s not all about Messi, and this graph lacks some context (strength of the opponents, quality of his teammates that day, what was at stake in the game, etc.), but it is still a huge drop in winning percentage.

If we consider xG, there is of course a big difference too, as you can see in the next graph. Games without goals or assists show a much lower xG output, with an average of 0.33 compared to 0.89 when he scored or assisted. This means that he had fewer scoring opportunities but, above all, that he was (on average) in worse positions to score. And well, sometimes he was just unlucky, like in that game where he produced 1.35 xG without scoring.

Messi xG distribution depending on him scoring or assisting a goal.
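The two comparisons above (winning percentage and average xG, split by whether Messi scored or assisted) boil down to a group-by on a single flag. Here is a rough sketch under the assumption that each game is a record with hypothetical `goals`, `assists`, `xg` and `result` fields; the numbers are toy values, not the ones from the graphs.

```python
from statistics import mean

# Toy per-game records (hypothetical field names and values).
games = [
    {"goals": 1, "assists": 0, "xg": 0.9, "result": "W"},
    {"goals": 0, "assists": 1, "xg": 0.5, "result": "W"},
    {"goals": 0, "assists": 0, "xg": 0.2, "result": "D"},
    {"goals": 0, "assists": 0, "xg": 0.4, "result": "W"},
]


def split_summary(games):
    """Win percentage and mean xG for 'decisive' vs 'blank' games."""
    groups = {"decisive": [], "blank": []}
    for g in games:
        key = "decisive" if g["goals"] + g["assists"] > 0 else "blank"
        groups[key].append(g)
    return {
        key: {
            "win_pct": 100 * sum(g["result"] == "W" for g in rows) / len(rows),
            "mean_xg": round(mean(g["xg"] for g in rows), 3),
        }
        for key, rows in groups.items()
        if rows  # skip an empty group to avoid dividing by zero
    }


summary = split_summary(games)
for key, stats in summary.items():
    print(key, stats)
```

As noted above, a split like this ignores context such as opponent strength, so it describes an association rather than a cause.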

This shows that it is possible, sometimes, to reduce Messi’s influence.

In order to analyze how to defend against Messi, I chose to focus on one game: it’s much easier to visualize, and it gives some ideas for future analysis on a larger scale.

How did I choose such a game? I decided to focus on the 2009/10 season:

  • It was a great season for Messi (one of many): 34 goals and 10 assists in 35 games in La Liga.
  • It was the best La Liga season for Pep Guardiola at FC Barcelona: 99 points, at that time a La Liga record (since beaten by Real Madrid in 2011/2012 and FC Barcelona in 2012/2013, with 100 points each).

Starting from this, I chose the match with the smallest Messi xG output of the entire season. Enter:

El derbi Catalan

It was also a duel between Guardiola and Pochettino, which makes things even more interesting.
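The match selection itself (lowest total Messi xG among the games of one season) can be sketched as follows. The match identifiers, field names and values are placeholders I made up for illustration; the function takes the list of matches explicitly so that a game in which he did not shoot at all still counts, with an xG of 0.

```python
# Toy shot records (hypothetical match ids, field names and values).
shots = [
    {"match": "match_a", "xg": 0.30},
    {"match": "match_a", "xg": 0.15},
    {"match": "match_b", "xg": 0.05},
    {"match": "match_c", "xg": 0.01},  # belongs to another season, ignored below
]


def lowest_xg_match(shots, match_ids):
    """Return the match in match_ids with the smallest summed shot xG.

    Matches without any shot keep a total of 0.0 and can therefore be selected.
    """
    totals = {m: 0.0 for m in match_ids}
    for s in shots:
        if s["match"] in totals:
            totals[s["match"]] += s["xg"]
    return min(totals, key=totals.get)


season_matches = ["match_a", "match_b"]  # e.g. the games of one season
chosen = lowest_xg_match(shots, season_matches)
print("match with the lowest xG:", chosen)
```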

