K-means and Kohonen SOM are two of the most widely applied data clustering algorithms.
Although K-means is a simple vector quantization method and Kohonen SOM is a neural network model, they’re remarkably similar.
In this post, I’ll try to explain, in as plain a language as I can, how each of these unsupervised models works.
K-means clustering dates back to the late 1960s. The goal of the algorithm is to group similar data objects into a number (K) of clusters.
By ‘similar’ we mean data points that are close to each other in the Euclidean sense; each point belongs to the cluster whose center (centroid) it is closest to.
The centroids move after each iteration during training: for each cluster, the algorithm computes the mean (simple average) of all its data points, and that mean becomes the new centroid.
K (the number of clusters) is a tunable hyperparameter. This means it’s not learned and we must set it manually.
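One common heuristic for setting K is the elbow method (my addition, not mentioned in the original post): fit K-means for a range of K values and look for the point where the total within-cluster squared distance stops dropping sharply. A quick sketch using scikit-learn on toy data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # toy data; use your own dataset here

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances of points to their centroids
    print(k, round(km.inertia_, 1))
```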
This is how K-means is trained:

1. Pick K initial centroids, for example K randomly chosen data points.
2. Assign every data point to its nearest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2 and 3 until the centroids stop moving (or until a maximum number of iterations is reached).
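Here's a minimal NumPy sketch of those steps. The function name and parameters are my own illustration, not from any particular library:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: use k randomly chosen data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid (squared Euclidean).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (keeping the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```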
K-means is easier to implement and faster than most other clustering algorithms, but it has some major flaws. Here are a few of them:

- You have to pick K up front; the algorithm can't discover the number of clusters on its own.
- Results depend on the random initial centroids, so different runs can produce different clusterings.
- Because centroids are means, a single extreme outlier can drag a centroid far away from its cluster.
- It assumes clusters are roughly spherical and similar in size, so it struggles with more complex shapes.
_Outliers, like the one shown here, can really mess up the K-means algorithm._
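To see why, here's a tiny numeric illustration with toy numbers of my own: because a centroid is a mean, one extreme value pulls it far from the rest of the cluster.

```python
import numpy as np

cluster = np.array([1.0, 2.0, 3.0, 2.0])
print(cluster.mean())                     # 2.0, a sensible centroid

with_outlier = np.append(cluster, 100.0)  # add a single extreme point
print(with_outlier.mean())                # 21.6, centroid dragged far away
```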
The Kohonen SOM (self-organizing map) is an unsupervised neural network commonly used for clustering high-dimensional data.
Although it’s a neural network, its architecture, unlike that of most modern deep nets, is very simple: just an input layer connected directly to an output layer of competing neurons arranged on a grid.
The distinctive property of SOMs is that they map high-dimensional input vectors onto a lower-dimensional (typically two-dimensional) grid while preserving the topology of the original data: inputs that are close in the input space end up on nearby neurons of the map. Here's how a SOM is trained:
1. We initialize the weight vectors randomly.
2. For each input pattern, every neuron computes a discriminant function, typically the squared Euclidean distance between its weight vector and the input vector. The neuron whose weight vector is closest to the input is declared the winner: the best matching unit (BMU).
3. The BMU's weight vector, and the weight vectors of the neurons near it on the grid, are nudged toward the input vector. How strongly a neuron is pulled depends on the learning rate and on its grid distance from the BMU, and both the learning rate and the neighborhood radius shrink as training progresses.
4. Steps 2 and 3 are repeated for many iterations, until the map stabilizes.
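A minimal NumPy sketch of this loop follows. The grid size, learning-rate schedule, and Gaussian neighborhood are illustrative choices of mine, not prescribed above:

```python
import numpy as np

def train_som(X, grid=(10, 10), n_iters=1000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Step 1: initialize the weight vectors randomly.
    W = rng.random((rows, cols, X.shape[1]))
    # Grid coordinates of each neuron, used for neighborhood distances.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iters):
        x = X[rng.integers(len(X))]  # pick a random input pattern
        # Step 2: the BMU is the neuron with the smallest squared distance.
        d2 = ((W - x) ** 2).sum(axis=2)
        bmu = np.unravel_index(d2.argmin(), d2.shape)
        # Learning rate and neighborhood radius both decay over time.
        lr = lr0 * np.exp(-t / n_iters)
        sigma = sigma0 * np.exp(-t / n_iters)
        # Step 3: pull the BMU and its grid neighbors toward the input,
        # weighted by a Gaussian of their grid distance from the BMU.
        grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-grid_d2 / (2 * sigma**2))
        W += lr * h[:, :, None] * (x - W)
    return W
```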
#machine learning #k-means #som #algorithms