Data is the fuel of every machine learning or deep learning model. Without data, the models are useless. Before building and training a model, we should explore and understand the data at hand. By understanding, I mean the correlations, structures, distributions, characteristics and trends in the data. A comprehensive understanding of the data is very useful for building a robust and well-designed model. We can draw valuable conclusions by exploring the data.
In this post, I will walk through an exploratory data analysis of the English Premier League 2019–2020 season dataset, which is available on Kaggle.
Let’s start by reading the data into a Pandas dataframe:
    import numpy as np
    import pandas as pd

    df_epl = pd.read_csv("../input/epl-stats-20192020/epl2020.csv")
    print(df_epl.shape)
    (576, 45)
The dataset has 576 rows and 45 columns. To display all the columns, we need to adjust the display.max_columns setting.
    pd.set_option("display.max_columns", 45)
    df_epl.head()

![](https://miro.medium.com/max/816/1*Ep38v5KYegOGQVmfsSmH8Q.png)

It does not fit on the screen, but we can see all the columns by sliding the scroll bar. The dataset includes the statistics for 288 games. There are 576 rows because each game is represented by two rows, one from the home team's side and one from the away team's side. For instance, the first two rows represent the “Liverpool-Norwich” game. The first column (“Unnamed: 0”) is redundant, so we can just drop it:
    # Straight quotes are required; curly quotes raise a SyntaxError
    df_epl.drop(['Unnamed: 0'], axis=1, inplace=True)
    df_epl = df_epl.reset_index(drop=True)
The dataset includes lots of different statistics about the games:

* xG, xGA: Expected goals for the team and the opponent
* scored, missed: Goals scored and conceded
* xpts, pts: Expected and received points
* wins, draws, losses: Binary variables showing the result of the game
* tot_goal, tot_con: Total goals scored and conceded from the beginning of the season

There are also basic stats such as shots, shots on target, corner kicks, yellow cards and red cards. We also have information about the date and time of the games. Let’s start with days:
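A minimal sketch of such a per-day count, run on a toy frame so it is self-contained. The column name `matchDay` is an assumption made for illustration; check `df_epl.columns` for the actual name before applying it to the real data.

```python
import pandas as pd

# Toy stand-in for df_epl; "matchDay" is an assumed column name.
toy = pd.DataFrame({
    "matchDay": ["Sat", "Sat", "Sun", "Sat", "Mon", "Sun"],
})

# value_counts() returns the number of rows per day, most frequent first.
# On the full dataset every game appears twice (home and away rows),
# so the counts there are double the number of games.
day_counts = toy["matchDay"].value_counts()
print(day_counts)
```

On the real dataframe the same call would be `df_epl["matchDay"].value_counts()`, again assuming that column name.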
Most of the games are played on Saturdays.
We can quickly create a standings table based on the total number of points achieved so far. Because tot_points is cumulative, its maximum value for each team is that team's most up-to-date points total:
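One way this could look, shown on toy data rather than the real file. `tot_points` is the column named in the text; `teamId` is an assumed name for the team column and may differ in the actual dataset.

```python
import pandas as pd

# Toy stand-in for df_epl. "tot_points" is named in the text;
# "teamId" is an assumed name for the team column.
toy = pd.DataFrame({
    "teamId": ["Reds", "Reds", "Blues", "Blues", "Greens"],
    "tot_points": [3, 6, 3, 4, 1],
})

# tot_points accumulates over the season, so the per-team maximum
# is the latest total. Sort descending to get the standings.
standings = (
    toy.groupby("teamId")["tot_points"]
    .max()
    .sort_values(ascending=False)
)
print(standings.head(10))
```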
I only displayed the first 10 teams. If you are a football (i.e. soccer) fan, you may have heard of the success of Liverpool dominating the English Premier League this season. Liverpool leads by 25 points.
Advances in technology and data science have brought new stats to football. One relatively new type is “expected” stats such as expected goals and expected points. Let’s check how close the expected and actual values are. There are different ways to do a comparison. One way is to check the distribution of the difference:
    # Data visualization libraries
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set(style='darkgrid')
    %matplotlib inline

    plt.figure(figsize=(10,6))
    plt.title("Expected vs Actual Goals - Distribution of Difference", fontsize=18)

    # Positive values mean fewer goals were scored than expected
    diff_goal = df_epl.xG - df_epl.scored
    # kdeplot replaces the deprecated distplot(..., hist=False)
    sns.kdeplot(diff_goal, color='blue')
As a soccer fan, you hear it everywhere: “Home of the best league: The Premier League”, “The best league in the world is the Premier League”, and so on. Many people, including professionals such as journalists and commentators, regard the Premier League as the best league in the world.
As a follower of both international UEFA competitions (Champions League and Europa League), I was confused by these statements. In my opinion, it was not clear which league is the best in the world, and I would rather say it varies from year to year. The first thing I did to find out was to look at the overall international titles that the various countries have gathered. In the following, I only consider four leagues: Serie A, Bundesliga, LaLiga, and Premier League. If you are into soccer, you might know why only these four leagues are considered. Looking at all titles and finalists in the last 10 to 20 years, these four leagues account for more than 90% of all titleholders and finalists.
Looking at the graphic below, it's obvious that the Spanish league, LaLiga, is by far the most successful with a total of 30 titles. In this statistic, the Bundesliga of my home country, Germany, is far behind in last place with only 14 titles. The Premier League is in the middle with 22 titles, close to Italy with 21 titles.
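For reference, the title counts quoted above can be tabulated and ranked in a few lines. This is only a sketch built from the four numbers in the paragraph; the share column is computed among these four leagues and is not a figure from the original analysis.

```python
import pandas as pd

# Title counts exactly as quoted in the paragraph above.
titles = pd.Series(
    {"LaLiga": 30, "Premier League": 22, "Serie A": 21, "Bundesliga": 14},
    name="titles",
)

# Rank the leagues and compute each one's share among these four only.
ranking = titles.sort_values(ascending=False)
share = (ranking / ranking.sum() * 100).round(1)
print(pd.DataFrame({"titles": ranking, "share_%": share}))
```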
The beautiful game is back on the pitch in the U.K. — and cyberattackers will be looking to take advantage of fans streaming the games.
England’s Premier League is returning this week, with millions of soccer fans around the world looking to stream matches using their online video accounts. Unfortunately, the U.K.’s National Cyber Security Centre (NCSC) is warning of phishing, fraud and brute-forcing attempts by attackers looking to break into those accounts.
The organization said in a Wednesday announcement that it expects a rash of phishing, scam and account-takeover efforts centered around the return of the country’s most popular sport — a kind of hat trick of attack types. The assessment, it said, is based on precedent: The NCSC has also observed escalating cyberattacks on television streaming subscriptions as more and more people quarantine at home during the COVID-19 pandemic.
“As well as illegally watching the game the victim has paid for, the hackers could make unauthorized purchases on the platform or look to find personal information that could be used for further scams – including targeting them with scam emails or phone calls,” the organization warned.
Fantasy Premier League consists of choosing 15 players from the teams that participate in the English Premier League to form a squad that earns points over the Premier League season according to the following rules:
I am going to show you the different ways you can build a football league table in Excel. Some of the methods are old school but others utilise Excel’s new capabilities.
In case you weren’t already aware, Excel has undergone a big change to its calculation engine fairly recently. The concept of dynamic arrays was first introduced back in September 2018; however, for many Microsoft 365 users the first batch of new functions took an awfully long time to appear. Unless you have been an Office Insider, you will not have been able to use them. Even though the update was rolled out to my copy towards the start of the year, there were still swathes of users who were kept waiting.
Since dynamic arrays were introduced in Excel, array formulas no longer require you to press Ctrl + Shift + Return every time you edit a cell. This was an annoying practice that made many users, including myself, reluctant to use arrays. They just didn’t feel like a native and integrated part of Excel. Now you can use an array formula like any other, without that additional step.
Download the workbook from here: https://bit.ly/39mlqkp.
Quick caveat: if you have an older version of Excel, you will find some of the examples do not work because of compatibility issues. This is unavoidable unless you purchase a Microsoft 365 subscription. Personally, I would recommend you do so.
Firstly, a dataset is required containing a list of all the matches played and their respective results. I have used English Premier League data from the 2019/20 season for this example. To conserve space elsewhere, the matches are stored in a separate worksheet called Data — with the table itself named DataTable.
2019/20 Premier League Dataset
You’ll notice there’s a calculated column on the end called Result. This formula looks at the home_goal and away_goal fields for each match played and determines whether the outcome was a home win (H), draw (D) or away win (A).
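The logic of that calculated column can be sketched outside Excel as well. The snippet below is a Python/pandas equivalent, not the workbook formula itself: `home_goal`, `away_goal` and `Result` are the field names from the text, while the team column names and the match data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy matches. home_goal and away_goal are the field names from the text;
# the team columns and the scores are illustrative assumptions.
matches = pd.DataFrame({
    "home_team": ["Reds", "Blues", "Greens"],
    "away_team": ["Blues", "Greens", "Reds"],
    "home_goal": [4, 0, 1],
    "away_goal": [1, 5, 1],
})

# H if the home side scored more, A if the away side did, otherwise D.
matches["Result"] = np.select(
    [
        matches["home_goal"] > matches["away_goal"],
        matches["home_goal"] < matches["away_goal"],
    ],
    ["H", "A"],
    default="D",
)
print(matches)
```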
There are three sections: Part A, Part B and Part C. Each contains multiple league tables that output identical values, but the method used differs.
Any kind of system that involves ranking data is typically going to require an unordered and an ordered table. The former houses the mathematical calculations and determines the ranking of each row, whilst the latter references it to output the data in the correct order. Part A and Part B are based on this principle. Part C, however, contains two variants that are not dependent on an additional table.
The tables in the workbook use these headers:
*Table A2 only
The approaches here are all based on official Excel tables. To tell whether what looks like a table really is a table, check for a small blue triangle in its bottom-right corner, or click on it and see whether the Table Design tab appears in the ribbon.
We start off by creating **Table A1**, which is unordered and forms the base for **Table A2, Table A3 and Table A4** to work off. The P, W, D, L columns use COUNTIFS formulas to count the number of matches a team has played, won, drawn and lost respectively. It’s important to note that a single COUNTIFS formula only allows for AND conditions. That means all criteria must be met for a successful count. As we have home and away matches to consider, we need to use two COUNTIFS statements in the same cell to add the counts together. The same concept applies to the SUMIFS function, which has been used for the columns that involve addition: F and A.
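To make the "two counts added together" idea concrete, here is a rough pandas analogue of the unordered table. It is not the workbook's formula: each home-plus-away sum below plays the role of two COUNTIFS (or SUMIFS) terms in one cell. The team column names and the match data are assumptions for illustration.

```python
import pandas as pd

# Toy results; team column names are assumed, Result uses H/D/A as in the text.
matches = pd.DataFrame({
    "home_team": ["Reds", "Blues", "Greens", "Reds"],
    "away_team": ["Blues", "Greens", "Reds", "Greens"],
    "home_goal": [2, 1, 0, 1],
    "away_goal": [0, 1, 3, 1],
    "Result":    ["H", "D", "A", "D"],
})

def team_row(team: str) -> dict:
    """Build one league-table row by adding home and away counts."""
    home = matches[matches["home_team"] == team]
    away = matches[matches["away_team"] == team]
    return {
        "Team": team,
        "P": len(home) + len(away),
        "W": int((home["Result"] == "H").sum() + (away["Result"] == "A").sum()),
        "D": int((home["Result"] == "D").sum() + (away["Result"] == "D").sum()),
        "L": int((home["Result"] == "A").sum() + (away["Result"] == "H").sum()),
        "F": int(home["home_goal"].sum() + away["away_goal"].sum()),
        "A": int(home["away_goal"].sum() + away["home_goal"].sum()),
    }

teams = sorted(set(matches["home_team"]) | set(matches["away_team"]))
table = pd.DataFrame([team_row(t) for t in teams]).set_index("Team")
print(table)
```

Each condition inside a single filter is an AND, just as in one COUNTIFS; the OR between "played at home" and "played away" is expressed by adding two separate counts.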
Going into the international break after Game-week 4, 38 matches have been played, which is exactly 10% of the 380 matches to be played during the season. The current season has so far been unpredictable, with last year’s top teams dropping points and some mid-table teams and minnows performing remarkably well. In this post, I analyze the performance of the teams and try to predict the results of upcoming fixtures.
Expected Goals (xG) is the major factor used for analysis and prediction. If you are not familiar with xG, it is recommended that you check out this post, where xG is explained, before proceeding further.
Due to the pandemic, matches are currently being played in empty stadiums. Home advantage is more than familiarity with the playing turf; it is the spirit and encouragement of tens of thousands of die-hard fans rooting for the home team.
That’s why even the thought of visiting Anfield or Old Trafford sends shivers down the spine of away teams. Generally, teams perform better in front of their home crowd compared to away fixtures.
So far in the current season, there is no evidence of home advantage. Of the 38 matches played, 19 were won by the away team, 3 were draws, and the home team won only 16, which is around 42% of the total.
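The percentages follow directly from the counts quoted above. A short check, using only those counts plus the fact that a 20-team double round-robin has 20 × 19 = 380 matches:

```python
# Counts quoted in the paragraph above.
home_wins, draws, away_wins = 16, 3, 19
played = home_wins + draws + away_wins

# 20 teams each host the other 19 once: 20 * 19 matches in a season.
total_matches = 20 * 19

for label, n in [("home wins", home_wins), ("draws", draws), ("away wins", away_wins)]:
    print(f"{label}: {n}/{played} = {n / played:.1%}")
print(f"share of season played: {played / total_matches:.0%}")
```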