Optimal clusters for KMeans Algorithm

The K-Means algorithm needs no introduction. It is simple and perhaps the most commonly used algorithm for clustering.

Before we get into the details of finding the optimal number of clusters, let's first look at what the K-Means clustering algorithm is and cover some basics.

What is Clustering?

Clustering is an unsupervised machine learning technique in which we group the data to gain insights from it. It is essential for some business models and problems. It tells us which data points are similar to one another and therefore belong to the same cluster, or group.

Clustering is the process of dividing the entire data into groups (also known as clusters) based on the patterns in the data.

What is the KMeans clustering algorithm?

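K-Means is a clustering algorithm that partitions the data into K clusters, assigning each point to the cluster whose centroid (the mean of its points) is nearest. We will discuss this method with code in the following sections.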
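To make the idea concrete, here is a minimal NumPy sketch of the standard K-Means iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name `kmeans_sketch` and its parameters are hypothetical, chosen for illustration only; this is not how scikit-learn implements `KMeans`, which adds smarter initialization and other refinements.

```python
import numpy as np

def kmeans_sketch(points, k, n_iters=100, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Tiny example: two obvious groups of points in the plane.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans_sketch(points, k=2)
```

In this toy example the first two points end up in one cluster and the last two in the other. Which cluster gets label 0 depends on the random initialization.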

Initial Imports :

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
%matplotlib inline
```

Method :

Now let's discuss how to find the right number of clusters for the K-Means algorithm, that is, how to decide what value of K to pass to it. Suppose we have a data science problem with only two variables: the x and y coordinates of each point.

Now, if we run the K-Means clustering algorithm on this dataset with K predetermined to be three, the result will look something like this.
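A run like this can be reproduced with scikit-learn. The sketch below assumes a synthetic dataset, since the original data is not given: three groups of 50 points scattered around made-up centers. The centers, spread, and random seeds are illustrative choices, not values from this article.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumption: synthetic x, y data with three loose groups (not the article's dataset).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fix K at three, as in the text, and fit the model.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# One centroid per cluster, each with an x and a y coordinate.
print(kmeans.cluster_centers_)
```

To see the clusters, you could plot the points colored by label, for example with `plt.scatter(X[:, 0], X[:, 1], c=labels)`.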

