Build a K-Means Clustering Algorithm from Scratch in Python

In this project, we'll build a k-means clustering algorithm from scratch. Clustering is an unsupervised machine learning technique that groups similar data points together without needing labeled examples. K-means is one of the most popular clustering algorithms.

Project Steps
- Write out pseudocode for the algorithm
- Code the k-means algorithm
- Plot the clusters from the algorithm
- Compare performance to the scikit-learn algorithm

Chapters

00:00 Intro
00:37 k-means overview
02:51 Loading in and cleaning FIFA data
06:11 Scaling the data
10:31 Initialize random centroids
14:20 Finding cluster labels for each data point
19:29 Update centroid values
23:30 Plotting k-means iterations
28:24 Pulling the algorithm together
35:25 Comparing our implementation to scikit-learn
37:56 Conclusion and next steps
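
As a rough illustration of the steps named in the chapters (scaling, random centroids, cluster labels, centroid updates), here is a minimal sketch in Python and pandas. It is not the code from the video: the function names, the choice of min-max scaling to [0, 1], the stopping rule, and the toy "pace"/"strength" columns are all assumptions made for this example, and the data is made up rather than taken from the FIFA dataset.

```python
import numpy as np
import pandas as pd


def scale_min_max(data: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to [0, 1] so no single feature dominates distances.

    Assumes no column is constant (that would divide by zero).
    """
    return (data - data.min()) / (data.max() - data.min())


def random_centroids(data: pd.DataFrame, k: int, rng: np.random.Generator) -> pd.DataFrame:
    """Pick k distinct rows of the data as starting centroids (one per row)."""
    idx = rng.choice(len(data), size=k, replace=False)
    return data.iloc[idx].reset_index(drop=True)


def get_labels(data: pd.DataFrame, centroids: pd.DataFrame) -> pd.Series:
    """Label each point with the index of its nearest centroid (Euclidean)."""
    diffs = data.to_numpy()[:, None, :] - centroids.to_numpy()[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))  # shape: (points, centroids)
    return pd.Series(distances.argmin(axis=1), index=data.index)


def update_centroids(data: pd.DataFrame, labels: pd.Series, old: pd.DataFrame) -> pd.DataFrame:
    """Move each centroid to the mean of its assigned points.

    A cluster that lost all its points keeps its previous position.
    """
    means = data.groupby(labels).mean()
    return means.reindex(old.index).fillna(old)


def kmeans(data: pd.DataFrame, k: int, max_iter: int = 100, seed: int = 0):
    """Alternate the label and update steps until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = random_centroids(data, k, rng)
    for _ in range(max_iter):
        labels = get_labels(data, centroids)
        updated = update_centroids(data, labels, centroids)
        if np.allclose(updated.to_numpy(), centroids.to_numpy()):
            break
        centroids = updated
    return get_labels(data, centroids), centroids


# Toy data with two obvious groups (made-up values, hypothetical column names).
players = pd.DataFrame({
    "pace": [90, 88, 92, 40, 42, 38],
    "strength": [50, 52, 48, 85, 88, 90],
})
scaled = scale_min_max(players)
labels, centroids = kmeans(scaled, k=2)
print(labels.tolist())
print(centroids)
```

Which group gets label 0 and which gets label 1 depends on the random starting centroids, so only the grouping itself is meaningful.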

We'll create our algorithm using Python and pandas, then compare it to the reference implementation from scikit-learn.
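
One way to run that comparison, sketched under assumptions: fit scikit-learn's `KMeans` on the same scaled data and check whether both implementations group the points identically. Cluster numbers are arbitrary, so the labels themselves may differ even when the clusters match. The toy array, the hard-coded `our_labels` list (a stand-in for the from-scratch output), and the `same_partition` helper are all hypothetical and not taken from the video.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up, already-scaled toy data with two obvious groups.
X = np.array([
    [0.96, 0.05], [0.93, 0.10], [1.00, 0.00],
    [0.04, 0.88], [0.07, 0.95], [0.00, 1.00],
])

# Stand-in for the labels a from-scratch implementation might return.
our_labels = [1, 1, 1, 0, 0, 0]

model = KMeans(n_clusters=2, n_init=10, random_state=0)
sk_labels = model.fit_predict(X).tolist()


def same_partition(a, b):
    """True when two labelings group the points identically, ignoring numbering."""
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))


print(same_partition(our_labels, sk_labels))
print(model.cluster_centers_)
```

On real data the two results may not match exactly, since k-means depends on its starting centroids and scikit-learn uses a different initialization by default (k-means++).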

You can find the full project code here - https://github.com/dataquestio/project-walkthroughs/tree/master/kmeans

You can download the data here - https://www.kaggle.com/datasets/stefanoleone992/fifa-22-complete-player-dataset
