About 5 months ago, I stumbled upon this article on TheScore. The summary: the traditional 5 positions are no longer enough to describe NBA players. The game has changed, after all. The authors come up with a way to classify players into 9 classes, based on the way they play the game.

In this article, I will take another shot at classifying players into clusters, depending on what they do on the court. However, I will do it using data science, and more precisely K-Means clustering.

I will also take a deeper look at what makes a winning team, i.e. what type of players should be put together for a team to be successful.

Let’s get to it!

I began by scraping data directly from NBA.com. In total, I collected 28 stats for all 529 players who played in the league in 2019–2020.

Along with traditional stats (points per game, assists, rebounds, etc.), I also collected stats describing shot location, type of offensive play (drive, iso, etc.), defensive efficiency and usage rate.

Then, I decided to get rid of players who played less than 12 minutes per game, as I felt that classifying players based on how they play when they barely play was not going to provide accurate results.

```
# Keep only players averaging more than 12 minutes per game
df = df[df.MINUTES > 12]
# Rebuild the index so rows are numbered 0..n-1 again after filtering
df = df.reset_index(drop=True)
```

That leaves us with a total of 412 players.
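To make the idea concrete, here is a minimal sketch of what a K-Means step could look like on a table like this one, using scikit-learn. It is only an illustration under a few assumptions: the data is synthetic, the column names other than `MINUTES` are hypothetical, `cluster_players` is a helper made up for this example, and 3 clusters is an arbitrary choice rather than the setup used in this article. The stats are standardized first so that no single stat dominates the distance computation.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scraped table (assumption: real data has 28 stats;
# the PTS / AST / REB column names are hypothetical).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "MINUTES": rng.uniform(5, 38, size=60),
    "PTS": rng.uniform(2, 30, size=60),
    "AST": rng.uniform(0, 10, size=60),
    "REB": rng.uniform(1, 13, size=60),
})

# Same filter as above: keep players with more than 12 minutes per game.
demo = demo[demo.MINUTES > 12].reset_index(drop=True)


def cluster_players(frame, feature_cols, n_clusters, seed=0):
    """Standardize the chosen stats and return one K-Means label per row."""
    scaled = StandardScaler().fit_transform(frame[feature_cols])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


# Hypothetical choice of features and number of clusters, for illustration only.
demo["CLUSTER"] = cluster_players(demo, ["PTS", "AST", "REB"], n_clusters=3)

# Average stat line per cluster, a quick way to see what each group looks like.
print(demo.groupby("CLUSTER")[["PTS", "AST", "REB"]].mean().round(1))
```

On real data, the per-cluster averages printed at the end are what you would read to give each cluster a name.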
