In the previous episodes, we discussed a few clustering strategies, such as centroid-based and hierarchy-based clustering (you can find them here, if interested); in this episode, we talk about yet another family of clustering algorithms. Density-based clustering algorithms proceed by finding the areas with a higher concentration of data points and merging regions of similar concentration into a single cluster. Now, I know this might seem a little vague: what is concentration? How can it be similar? And so on. So without further ado, let’s dive into the matter.

The intuition

One of the best ways to tackle a theoretical formulation is to ignite the flames of intuition. To get the train on the tracks, we need to grasp a few concepts:

**Radius based neighborhood:** Say you live in a locality and consider every house less than four doors away to be your neighbor. Looking closely, you find that your neighborhood forms a circle with a radius of three houses. Likewise, for a data point in space, if we draw a circle with the point at the center and a radius of ɛ, it forms an ɛ-neighborhood. In 3d space the neighborhood is a sphere, and in higher dimensions it is an N-sphere.
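To make the ɛ-neighborhood concrete, here is a minimal Python sketch. The function name `eps_neighborhood`, the sample points, and the choice of Euclidean distance with an inclusive boundary (`<=`) are assumptions made for illustration; they are not taken from any particular algorithm.

```python
import math

def eps_neighborhood(points, center, eps):
    """Return every point within Euclidean distance eps of center.

    Assumption: the boundary is inclusive (distance <= eps), and the
    center itself is returned if it is one of the points.
    """
    return [p for p in points if math.dist(p, center) <= eps]

# Hypothetical 2d sample data.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (1.0, 1.0)]

# Neighborhood of the origin with eps = 2: (3, 4) is 5 units away and is left out.
neighbors = eps_neighborhood(points, (0.0, 0.0), eps=2.0)
print(neighbors)
```

Because `math.dist` accepts points of any matching dimension, the same function covers the circle, the sphere, and the N-sphere cases.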

**Density:** Back in our grad school days, we learned density to be the measure of compactness, mathematically expressed as mass over volume. The notion hasn’t changed much. Say you have a point in 2d space and have drawn a circle of radius 3 units around it. The volume would be 3.1415 × 3² = 28.27 (I know that’s the area, but it can be seen as the projection of volume, like when you squash a 3d sphere into a heightless circle). The mass here is the number of data points that fall within this volume. Say a total of 35 points fall within the radius for this point; the density would then be 35/28.27, or about 1.24. The number 1.24 is meaningless all by itself, but if we assign a density value to each point of the dataset, we can then arrange the points into clusters by virtue of similar density measures.
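The density idea above can be sketched in a few lines of Python. This is only an illustration of the intuition, not how any specific density-based algorithm works; the function name `local_density`, the sample points, and the decision to count the center point itself as part of the mass are all assumptions.

```python
import math

def local_density(points, center, radius):
    """Number of points within radius of center, divided by the circle's area.

    Assumptions: 2d data, Euclidean distance, inclusive boundary, and the
    center counts toward the mass if it is one of the points.
    """
    mass = sum(1 for p in points if math.dist(p, center) <= radius)
    area = math.pi * radius ** 2
    return mass / area

# Worked example from the text: 35 points inside a circle of radius 3.
print(round(35 / (math.pi * 3 ** 2), 2))  # 1.24

# Hypothetical data: a tight clump near the origin plus one far-away point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (10.0, 10.0)]
densities = [local_density(points, p, radius=1.0) for p in points]
print(densities)
```

The four clumped points all get the same density, while the isolated point gets a much lower one, which is the kind of signal that grouping by similar density relies on.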

While none of the algorithms work in exactly the way stated above, this should at least give a rough overview of the framework. Here’s a good read about that.

