The corona crisis affects our daily lives in ways we never thought possible. A couple of weeks ago, the Belgian first division soccer teams played their first games of the new season. After only four games, it is already painfully clear that the corona crisis also affects soccer competitions. What follows is a brief analysis of the impact of the corona crisis on soccer games.

As data scientists and statisticians, we are constantly performing experiments and testing hypotheses. Most of the time, we have to accept certain limitations to our experiments. Only occasionally are we given the opportunity to work with a so-called natural experiment.

A natural experiment is an empirical study in which individuals (or clusters of individuals) are exposed to the experimental and control conditions that are determined by nature or by other factors outside the control of the investigators.

- Wikipedia

The corona crisis can be seen as a natural experiment to test the impact of soccer fans on the outcome of the games. The previous season (2019–2020) of the Belgian first division was stopped after the pandemic reached Belgium, so almost all of its games were played in front of supporters. The first four games of the current season (2020–2021) were played without supporters.

By using the corona crisis as a natural experiment, we can answer the following question:

Would removing all fans from the stadium impact the home advantage of soccer teams?


Let’s have a look at the data

For this project, I use two datasets, both downloaded from football-data.co.uk. The first contains all the game results for the Belgian first division season 2019–2020; the second contains the results for the 2020–2021 season so far. The two datasets are then combined into one.
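The combining step could look roughly like the sketch below. It assumes pandas and is not necessarily how the original analysis was done. The helper `combine_seasons` is hypothetical, the inline CSV text stands in for the two downloaded files, and the rows are made-up placeholders rather than real results. The `HomeTeam` and `AwayTeam` columns and the day-first date format are also assumptions about the files.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for the two downloaded season files.
SEASON_1920_CSV = """Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
26/07/2019,Team A,Team B,2,1,H
27/07/2019,Team C,Team D,0,0,D
"""
SEASON_2021_CSV = """Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
08/08/2020,Team B,Team A,1,3,A
09/08/2020,Team D,Team C,2,0,H
"""

# Only the columns needed for this project.
COLUMNS = ["Date", "FTHG", "FTAG", "FTR"]


def combine_seasons(sources):
    """Read each season file and stack the rows into one DataFrame."""
    frames = [pd.read_csv(src, usecols=COLUMNS) for src in sources]
    games = pd.concat(frames, ignore_index=True)
    # Assumption: dates are written day-first (dd/mm/yyyy).
    games["Date"] = pd.to_datetime(games["Date"], format="%d/%m/%Y")
    return games


games = combine_seasons(
    [io.StringIO(SEASON_1920_CSV), io.StringIO(SEASON_2021_CSV)]
)
print(games)
```

With real data, the `io.StringIO` objects would be replaced by the paths of the two downloaded CSV files.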

The data that was downloaded from the database — Image by author

Besides the columns shown above, the dataset contains much more information about the games. For this project, we only need FTHG (the number of goals scored by the home team), FTAG (the number of goals scored by the away team), FTR (the result of the game), and the date of the game.

Two features are engineered:

  1. HomeTeamWon: indicator variable equal to one when the home team won the game;
  2. Year_2020: indicator variable equal to one when the game was played in the season 2020–2021.
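The two indicator variables can be sketched as follows, again assuming pandas. The rows are made-up placeholders. The sketch assumes that FTR marks a home win with "H", and that any game played on or after the hypothetical cutoff `SEASON_2021_START` belongs to the 2020–2021 season.

```python
import pandas as pd

# Made-up placeholder rows; real rows would come from the combined dataset.
games = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2019-07-27", "2020-03-07", "2020-08-08", "2020-08-30"]
        ),
        "FTHG": [2, 0, 1, 3],
        "FTAG": [1, 0, 3, 1],
        "FTR": ["H", "D", "A", "H"],
    }
)

# Assumption: games on or after this date belong to the 2020-2021 season.
SEASON_2021_START = pd.Timestamp("2020-08-01")

# Assumption: FTR is "H" for a home win (other values: draw or away win).
games["HomeTeamWon"] = (games["FTR"] == "H").astype(int)
games["Year_2020"] = (games["Date"] >= SEASON_2021_START).astype(int)

print(games[["Date", "FTR", "HomeTeamWon", "Year_2020"]])
```

Deriving Year_2020 from the date is one option; when the two season files are read separately, the flag could just as well be set per file before combining them.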
