In this tutorial, we are going to group FIFA 20 players using the DBSCAN algorithm!

Understanding DBSCAN

  • DBSCAN: Density-Based Spatial Clustering of Applications with Noise.
  • Density-based clustering locates regions of high density that are separated from one another by regions of low density.
  • Density: the number of points within a specified radius (a.k.a. Eps or ε).
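To make the notion of density concrete, here is a minimal sketch that counts the points within a radius Eps of a given location. The function name `density` and the toy coordinates are assumptions made for illustration, and the point itself is counted when it lies inside the radius.

```python
import math

def density(points, center, eps):
    """Count the points lying within distance eps of center."""
    return sum(1 for p in points if math.dist(p, center) <= eps)

# Toy 2-D data (hypothetical): three points close together and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]

print(density(points, (0.0, 0.0), eps=1.0))  # 3 (a dense region)
print(density(points, (5.0, 5.0), eps=1.0))  # 1 (only the point itself)
```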

Points

  • Core point: a point that has at least a specified number of points (MinPts) within Eps.
  • Border point: a point that has fewer than MinPts within Eps, but is in the neighborhood of a core point.
  • Noise (outlier): any point that is not a core point or a border point.
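The three point types can be sketched in a few lines of plain Python. This is an illustration rather than a reference implementation: the helper names `region_query` and `classify` and the toy data are made up for the example, and it assumes the convention that a point counts as its own neighbor and is a core point when it has at least MinPts neighbors (the convention scikit-learn uses for `min_samples`).

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def classify(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    neighbors = [region_query(points, i, eps) for i in range(len(points))]
    core = {i for i, n in enumerate(neighbors) if len(n) >= min_pts}
    labels = []
    for i, n in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in n):
            labels.append("border")  # not core, but inside a core point's neighborhood
        else:
            labels.append("noise")
    return labels

# Toy data (hypothetical): a small square, one point hanging off it, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
print(classify(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```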

Core, Border, Outlier Example

How are the clusters formed?

  1. Select a point p.
  2. Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
     • If p is a core point, a cluster is formed.
     • If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point in the database.
  3. Continue the process until all of the points have been processed.
  4. The core points and noise points found do not depend on the order in which the points are processed. Only a border point that lies within Eps of core points from more than one cluster may be assigned differently depending on the order.
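The procedure above can be sketched from scratch in plain Python. This is a minimal, unoptimized illustration (every neighborhood query scans all points), not the implementation used by any particular library. The function name `dbscan`, the `NOISE = -1` label, and the toy data are assumptions for the example, and a point is counted as its own neighbor.

```python
import math
from collections import deque

NOISE = -1  # label for points that end up in no cluster

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ... or NOISE)."""
    n = len(points)
    labels = [None] * n  # None means "not processed yet"

    def region_query(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

# Toy data (hypothetical): two groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (8, 8),
          (10, 10), (10, 11), (11, 10)]
print(dbscan(points, eps=1.0, min_pts=3))
# [0, 0, 0, 0, 0, -1, 1, 1, 1]
```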

ε-Neighborhood Concepts

  • ε-Neighborhood: the objects within a radius of ε from an object.
  • Core object: an object whose ε-Neighborhood contains at least MinPts objects.

p is a core object

Reachability

  • Directly density-reachable: a point q is directly density-reachable from p if q is within the epsilon-neighborhood of p and p is a core point.

q is directly density-reachable from p

  • Density-reachable: a point q is density-reachable from p if there is a chain of points p1 = p, …, pn = q in which each point is directly density-reachable from the one before it. Every point in the chain except possibly q must therefore be a core point.

q is density-reachable from p
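Reachability can be checked with a breadth-first search that only expands core points. The sketch below is illustrative: `neighbors` and `density_reachable` are hypothetical helper names, points are referred to by index, and a point is counted as its own neighbor. It also shows that the relation is not symmetric, since nothing is reachable from a border point.

```python
import math
from collections import deque

def neighbors(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def density_reachable(points, p, q, eps, min_pts):
    """True if the point with index q is density-reachable from index p."""
    seen = {p}
    queue = deque([p])
    while queue:
        i = queue.popleft()
        nbrs = neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # i is not a core point, so nothing is directly reachable from it
        for j in nbrs:
            if j == q:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

# Toy data (hypothetical): index 4 is a border point, index 5 is an outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
print(density_reachable(points, 0, 4, eps=1.0, min_pts=3))  # True
print(density_reachable(points, 4, 0, eps=1.0, min_pts=3))  # False
print(density_reachable(points, 0, 5, eps=1.0, min_pts=3))  # False
```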

Connectivity

  • Density-connectivity: a point p is density-connected to a point q w.r.t. ε and MinPts if there is a point r such that both p and q are density-reachable from r w.r.t. ε and MinPts.
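A brute-force sketch of this definition tries every point as the candidate r and checks whether both p and q are reachable from it. The helper names `reachable_from` and `density_connected` and the toy data are assumptions for illustration, and a point is counted as its own neighbor. The example uses two border points at opposite ends of a chain: neither is reachable from the other, yet they are density-connected through the core points between them.

```python
import math
from collections import deque

def reachable_from(points, r, eps, min_pts):
    """Set of indices density-reachable from index r (empty if r is not a core point)."""
    def nbrs(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    reached = set()
    seen = {r}
    queue = deque([r])
    while queue:
        i = queue.popleft()
        ns = nbrs(i)
        if len(ns) < min_pts:
            continue  # only core points pass reachability on
        for j in ns:
            reached.add(j)
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some point r exists from which both p and q are density-reachable."""
    return any({p, q} <= reachable_from(points, r, eps, min_pts)
               for r in range(len(points)))

# Toy data (hypothetical): a chain of five points plus one outlier.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (9, 9)]
print(density_connected(points, 0, 4, eps=1.0, min_pts=3))  # True
print(density_connected(points, 0, 5, eps=1.0, min_pts=3))  # False
```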

Advantages, Disadvantages & Applications

Grouping FIFA 20 Players using DBSCAN

Data Cleaning/Pre-Processing (Code from part 1 & 2)

import pandas as pd
import numpy as np

df = pd.read_csv("/content/players_20.csv")
df = df[['short_name','age', 'height_cm', 'weight_kg', 'overall', 'potential','value_eur', 'wage_eur', 'international_reputation', 'weak_foot','skill_moves', 'release_clause_eur', 'team_jersey_number','contract_valid_until', 'nation_jersey_number', 'pace', 'shooting','passing', 'dribbling', 'defending', 'physic', 'gk_diving','gk_handling', 'gk_kicking', 'gk_reflexes', 'gk_speed','gk_positioning', 'attacking_crossing','attacking_finishing','attacking_heading_accuracy', 'attacking_short_passing','attacking_volleys', 'skill_dribbling', 'skill_curve','skill_fk_accuracy', 'skill_long_passing','skill_ball_control','movement_acceleration', 'movement_sprint_speed', 'movement_agility','movement_reactions', 'movement_balance', 'power_shot_power','power_jumping', 'power_stamina', 'power_strength', 'power_long_shots','mentality_aggression', 'mentality_interceptions','mentality_positioning', 'mentality_vision', 'mentality_penalties','mentality_composure', 'defending_marking', 'defending_standing_tackle','defending_sliding_tackle', 'goalkeeping_diving','goalkeeping_handling', 'goalkeeping_kicking','goalkeeping_positioning', 'goalkeeping_reflexes']]
df = df[df.overall > 86] # extracting players with overall above 86
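df = df.fillna(df.mean(numeric_only=True)) # fill missing values with column means (numeric columns only, since short_name is text)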
names = df.short_name.tolist() # saving names for later
df = df.drop(['short_name'], axis = 1) # drop the short_name column
df.head()


Grouping Soccer Players with Similar Skillsets in FIFA 20 , Part 3