What is K-Means clustering and how to use it? K-Means is an unsupervised machine learning algorithm. Unsupervised algorithms make inferences from datasets using only input vectors, without referring to known or labelled outcomes.

K-Means is probably the most well-known clustering algorithm. It’s taught in a lot of introductory data science and machine learning classes.

Clustering is used in statistics to group objects together based on a common set of features and to separate them based on dissimilar features. Similarity is a metric that reflects the strength of the relationship between two data objects.


K-Means works as follows:

- The algorithm starts by picking K data points as initial centroids (you set K; it can be 2 or any other number).
- After choosing the centroids (say C1 and C2), each data point (a set of coordinates here) is assigned to one of the clusters (treat centroids and clusters as the same thing for the time being), namely the one whose centroid is closest to it.
- Assume that the algorithm chose OB-1 (1,1,1) and OB-2 (2,2,2) as centroids.
- To measure the distances, you use the following distance measurement function (also termed a similarity measurement function):

$$d = |x_2 - x_1| + |y_2 - y_1| + |z_2 - z_1|$$

This is also known as the **Taxicab distance** or **Manhattan distance**, where d is the distance between two objects, and OB1 (x1, y1, z1) and OB2 (x2, y2, z2) give the X, Y and Z coordinates of the two objects being compared.
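To make the distance and assignment steps concrete, here is a minimal Python sketch. It uses the Manhattan distance described above and the two example centroids OB-1 and OB-2. The function names `manhattan_distance` and `assign_to_nearest`, and the extra point `ob3`, are made up for illustration and are not part of any library.

```python
from typing import Sequence


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(q - p) for p, q in zip(a, b))


def assign_to_nearest(point: Sequence[float],
                      centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `point`.

    Ties go to the centroid with the lower index.
    """
    distances = [manhattan_distance(point, c) for c in centroids]
    return distances.index(min(distances))


# Centroids from the example above.
ob1 = (1, 1, 1)
ob2 = (2, 2, 2)

# |2-1| + |2-1| + |2-1| = 3
print(manhattan_distance(ob1, ob2))

# Hypothetical extra data point, not taken from the article.
ob3 = (3, 3, 3)

# Distances to OB-1 and OB-2 are 6 and 3, so the point goes to index 1 (OB-2).
print(assign_to_nearest(ob3, [ob1, ob2]))
```

This only covers a single assignment pass. A full K-Means run would repeat the assignment after recomputing each centroid from the points in its cluster.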

Practice your Data Science with Python skills by learning and then trying the hands-on, interactive projects that I have posted for you.