The Simple, Quick, and Popular Clustering Method

What is Clustering?

Clustering is the idea of separating data into groups, or clusters. In an unsupervised learning problem, where the dataset is unlabeled, we seemingly have no means of splitting the data into classes. That isn’t entirely true, however. With clustering, we can separate the data into groups using a range of different methods, most of which group data points based on their proximity to one another. This can be seen below:

An example of clustering, simplified illustration

As you can see, the dataset can be clearly separated into groups based on proximity. (Of course, this is a simplified example; real datasets appear much more random and complicated.) If you are used to dealing with supervised learning, consider this line of thought:

A decision tree algorithm separates a dataset into distinct classes, and it learns to do so solely by training on labeled data. Clustering provides a similar service, but rather than sorting the data into classes based on labels, it does so based entirely on the proximity of the data points to one another, for example by grouping points that lie within a specified threshold distance.
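
To make the proximity idea concrete, here is a minimal sketch in Python. It is not the KMeans algorithm itself, only an illustration of grouping unlabeled points by a threshold distance: any two points within the threshold of each other, directly or through a chain of neighbours, end up in the same cluster. The function name threshold_clusters, the sample points, and the threshold of 1.0 are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def threshold_clusters(points, threshold):
    """Assign a cluster label to each point based purely on proximity.

    Two points share a label if they are within `threshold` of each other,
    either directly or through a chain of nearby points.
    (Hypothetical helper written for illustration only.)
    """
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # this point already belongs to a cluster
        # Start a new cluster and grow it outward from this point.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and dist(points[i], points[j]) <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Made-up, unlabeled 2D data: two tight groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (9.0, 1.0)]
labels = threshold_clusters(points, threshold=1.0)
print(labels)  # [0, 0, 0, 1, 1, 2]
```

No labels were supplied at any point; the groups come only from the distances between points. The choice of threshold matters a great deal: too small and every point becomes its own cluster, too large and everything merges into one.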
