Find the Number of Clusters in KMeans. Silhouette Score. Python Code Example

The silhouette score is a metric that helps you find the optimal number of clusters in your data when you cluster it with the KMeans algorithm. Quick reminder: KMeans is an unsupervised learning algorithm in machine learning.

In this video I explain and demonstrate how the silhouette score and the silhouette curve work on real data. If you work with clustering algorithms, you probably know the situation: with the classical Elbow method it is often unclear how many clusters are best for your data science project. The silhouette score is a good replacement for it.
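Here is a minimal sketch of the Elbow vs. silhouette comparison. It assumes synthetic data from scikit-learn's `make_blobs` (4 centers) and a candidate range of k = 2 to 8. Both are illustrative choices, not the dataset used in the video. For each k it records the KMeans inertia, which is what the Elbow method plots, and the mean silhouette score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative synthetic data: 4 blobs (assumption, not the video's data)
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

inertias = {}
scores = {}
for k in range(2, 9):  # the silhouette needs at least 2 clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X)
    inertias[k] = km.inertia_                # Elbow method: within-cluster sum of squares
    scores[k] = silhouette_score(X, labels)  # mean silhouette over all samples

for k in inertias:
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={scores[k]:.3f}")

# Pick the k with the highest mean silhouette
best_k = max(scores, key=scores.get)
print("Best k by silhouette:", best_k)
```

Inertia usually keeps falling as k grows, so the Elbow method asks you to spot a bend by eye. The silhouette score instead tends to peak, which turns the choice into a simple argmax.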

The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. In simple words, the silhouette score helps you identify the number of clusters that fits your data best: you compute the average score for several candidate numbers of clusters and pick the one with the highest value.
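To make cohesion and separation concrete, the standard definition for a sample i is s(i) = (b(i) − a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from i to the other points of its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. The sketch below computes this by hand and checks it against scikit-learn. The helper `manual_silhouette` is a hypothetical name used only for illustration, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_samples, silhouette_score

def manual_silhouette(X, labels):
    """Hypothetical helper: per-sample s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    D = pairwise_distances(X)  # Euclidean distance matrix, zero diagonal
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            s[i] = 0.0  # convention for singleton clusters
            continue
        # a(i): mean distance to the other members of its own cluster (cohesion);
        # D[i, i] is 0, so dividing by n_same - 1 excludes the point itself
        a = D[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance to the members of another cluster (separation)
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

s_manual = manual_silhouette(X, labels)
print(np.allclose(s_manual, silhouette_samples(X, labels)))      # True
print(np.isclose(s_manual.mean(), silhouette_score(X, labels)))  # True
```

The silhouette score reported for a clustering is simply the mean of these per-sample values.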

The silhouette technique for KMeans also provides a silhouette diagram, which lets you quickly see how the silhouette values are distributed within each cluster for different numbers of clusters.
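A silhouette diagram is built from the per-sample values returned by `silhouette_samples`. For each cluster, the values are sorted and drawn as a horizontal band, so the band's height reflects the cluster size. The sketch below computes that underlying data for a few values of k and prints a summary instead of plotting. The synthetic data and the choice of k = 3, 4, 5 are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Illustrative synthetic data (assumption)
X, _ = make_blobs(n_samples=300, centers=4, random_state=1)

summary = {}  # (k, cluster) -> sorted silhouette values of that cluster
for k in (3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    s = silhouette_samples(X, labels)
    print(f"k={k}: mean silhouette = {s.mean():.3f}")
    for c in range(k):
        # Sorted in descending order, the way each band is drawn in the diagram
        s_c = np.sort(s[labels == c])[::-1]
        summary[(k, c)] = s_c
        print(f"  cluster {c}: size={len(s_c):3d}, min={s_c.min():.3f}, mean={s_c.mean():.3f}")
```

To draw the actual diagram, each sorted array can be plotted as a filled band, for example with matplotlib's `fill_betweenx`. Clusters whose band falls mostly below the overall mean, or that contain negative values, suggest a poor choice of k.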

Content of the video:
0:05 - Introduction to the silhouette score.
1:16 - Coding part in Python (KMeans: Elbow method vs. silhouette score, silhouette curve and values).
4:37 - Silhouette diagrams.

In Python, scikit-learn provides the core silhouette functionality out of the box (`sklearn.metrics.silhouette_score` and `sklearn.metrics.silhouette_samples`).
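One practical detail of the scikit-learn API: the exact score needs all pairwise distances, which grows quadratically with the number of samples. `silhouette_score` accepts `sample_size` and `random_state` to estimate the score on a random subsample. The sketch below compares the exact and subsampled scores on synthetic data. The dataset size and the 2,000-point sample are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative synthetic data (assumption)
X, _ = make_blobs(n_samples=5000, centers=5, random_state=7)
labels = KMeans(n_clusters=5, n_init=10, random_state=7).fit_predict(X)

# Exact: uses every pairwise distance (quadratic in n_samples)
exact = silhouette_score(X, labels, metric="euclidean")

# Approximate: computed on a random subsample of 2,000 points
approx = silhouette_score(X, labels, metric="euclidean", sample_size=2000, random_state=7)

print(f"exact={exact:.3f}, approx={approx:.3f}")
```

For large datasets, the subsampled estimate is usually close enough to compare candidate values of k, at a fraction of the cost.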


#python #kmeans
