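The essence of the k-means clustering algorithm is that it draws a circle (a hypersphere in higher dimensions) around the center of each cluster, with the radius set by the farthest point assigned to that cluster.
This circle acts as a hard cutoff for the training set: points inside it belong to the cluster, points outside it do not. Moreover, k-means requires these clusters to be circular. The clusters (circles) fitted by a k-means model can therefore be very different from the actual data distribution (which may be elliptical).
As a result, multiple circular clusters get mixed together and overlap each other.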
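To make the circle picture concrete, here is a minimal sketch in plain Python. The five points and the helper name `bounding_circle` are made up for illustration and do not come from any library; the point is only that a circle around an elongated cluster also covers regions where the cluster has no data.

```python
import math

def bounding_circle(points):
    """Return (center, radius): the centroid and the distance to the farthest point."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    radius = max(math.hypot(x - cx, y - cy) for x, y in points)
    return (cx, cy), radius

# Hypothetical elongated cluster: wide along x, very narrow along y.
cluster = [(-4.0, 0.0), (-2.0, 0.2), (0.0, 0.0), (2.0, -0.2), (4.0, 0.0)]
center, radius = bounding_circle(cluster)

# A point far above the narrow band of data still falls inside the circle.
probe = (0.0, 3.0)
inside = math.hypot(probe[0] - center[0], probe[1] - center[1]) <= radius
print(center, radius, inside)
```

The probe point sits well away from every data point in the y direction, yet the circular boundary still claims it, which is the mismatch described above.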
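In general, k-means has two shortcomings that make it unsatisfactory for many data sets (especially low-dimensional ones): the shape of its clusters is not flexible, and it assigns each sample to exactly one cluster without saying how likely that assignment is.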
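One limitation commonly raised for k-means is that it returns only a hard label per sample. The sketch below contrasts that with a soft, probability-style assignment in one dimension. The function names, the two centers, and the assumption of equal weights with a shared `sigma` are all illustrative choices, not part of any library.

```python
import math

def hard_assign(x, centers):
    """k-means style: return only the index of the nearest center."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def soft_assign(x, centers, sigma=1.0):
    """Mixture style: membership probabilities, assuming equal weights and a shared sigma."""
    scores = [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(scores)
    return [s / total for s in scores]

centers = [0.0, 2.0]  # hypothetical 1-D cluster centers
x = 0.9               # a point close to the midpoint between them

label = hard_assign(x, centers)
probs = soft_assign(x, centers)
print(label, probs)
```

The hard label says the point belongs to the first cluster and nothing else, while the soft assignment shows it is only slightly more likely to belong there than to the second one.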
If you are completely new to clustering, its need, types, and applications, then I recommend you go through this article first.
Let’s try to understand what the “Gaussian” in Gaussian Mixture Model means.
The Gaussian distribution, also known as the Normal **distribution**, is a very important probability distribution in many fields and has a significant influence on many aspects of statistics.
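If the random variable X follows a Gaussian distribution with mathematical expectation μ and variance σ² (standard deviation σ), it is written as

$$X \sim N(\mu, \sigma^2)$$

Then its probability density function is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$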
Everything that goes up, comes down, according to Gauss.
When the sample data X is one-dimensional (univariate), the Gaussian distribution has the following probability density function (PDF):

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Here **μ** is the data mean (expectation) and **σ** is the data standard deviation.
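As a quick sanity check on the univariate density, the following sketch evaluates it with the standard library only. The function name `gaussian_pdf` and the grid used for the numerical integral are illustrative choices.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Density at the mean of a standard normal (mu = 0, sigma = 1).
peak = gaussian_pdf(0.0, 0.0, 1.0)

# A crude Riemann sum over [-8, 8]; a valid density should integrate to about 1.
step = 0.001
area = sum(gaussian_pdf(-8.0 + i * step, 0.0, 1.0) for i in range(16001)) * step
print(peak, area)
```

The density is highest at the mean, symmetric around it, and the total area under the curve is 1, which is what makes it usable as a probability model.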
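When the sample data X is multivariate, the Gaussian distribution has the following probability density function:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

Here **μ** is the data mean (expectation), **Σ** is the covariance matrix, and D is the data dimension.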
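The multivariate density can be sketched for the two-dimensional case (D = 2) without any external library. The helper `gaussian_pdf_2d` and the example covariance matrix are assumptions made for illustration; the covariance stretches the distribution along x and squeezes it along y, which gives the elliptical shape that a circle cannot describe.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian; cov is a symmetric positive-definite 2x2 matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T inv(cov) (x - mu).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    norm = (2.0 * math.pi) * math.sqrt(det)  # (2*pi)^(D/2) * |cov|^(1/2) with D = 2
    return math.exp(-0.5 * quad) / norm

mu = [0.0, 0.0]
cov = [[4.0, 0.0], [0.0, 0.25]]  # hypothetical: wide along x, narrow along y

p_mean = gaussian_pdf_2d([0.0, 0.0], mu, cov)
p_along = gaussian_pdf_2d([2.0, 0.0], mu, cov)   # 2 units along the wide axis
p_across = gaussian_pdf_2d([0.0, 2.0], mu, cov)  # 2 units along the narrow axis
print(p_mean, p_along, p_across)
```

Both test points are the same Euclidean distance from the mean, but the density along the wide axis is much higher than across it, so the equal-density contours are ellipses rather than circles.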
The Gaussian mixture model can be regarded as a model composed of K single Gaussian models. Which of these K submodels generated a given sample is the hidden variable of the mixture model.
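A small sketch may help show what "composed of K single Gaussian models" means in practice. It assumes the usual formulation in which each submodel has a mixing weight and the weights sum to 1; the parameter values and the names `mixture_pdf` and `sample` are hypothetical, chosen only for this example with K = 2.

```python
import math
import random

# Hypothetical parameters of a 1-D mixture with K = 2 components.
weights = [0.3, 0.7]   # mixing weights, must sum to 1
means = [-2.0, 3.0]
sigmas = [0.5, 1.0]

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def mixture_pdf(x):
    """Mixture density: weighted sum of the K component densities."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def sample(rng):
    """Draw one sample: first pick the hidden component, then draw from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return k, rng.gauss(means[k], sigmas[k])

rng = random.Random(0)
draws = [sample(rng) for _ in range(10000)]
share_of_first = sum(1 for k, _ in draws if k == 0) / len(draws)
print(share_of_first, mixture_pdf(0.0))
```

In real data only the second element of each draw is observed; the component index `k` is hidden, which is why it is treated as a latent variable.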
The Gaussian mixture model (GMM) can be regarded as an extension of the k-means model. It is not only commonly used in industry but is also a generative model.
The Gaussian mixture model represents the probability distribution of the data as a mixture of several multidimensional Gaussian distributions, and given enough components it can thereby fit data distributions of arbitrary shape.
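How the mixture is fitted is not covered here. A common choice is the expectation-maximization (EM) algorithm, and the following is a minimal one-dimensional sketch of it under simplifying assumptions: two components, a crude initial guess, a fixed number of iterations, and synthetic data. The function `fit_gmm_1d` is written for this example and is not a library API.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def fit_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    mean_all = sum(data) / n
    std_all = math.sqrt(sum((x - mean_all) ** 2 for x in data) / n)
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initial guess for the two means
    sigmas = [std_all, std_all]
    for _ in range(n_iter):
        # E-step: probability that each point came from each component.
        resp = []
        for x in data:
            scores = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate weight, mean, and spread from those probabilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = math.sqrt(max(var, 1e-6))  # floor to avoid a collapsed component
    return weights, means, sigmas

# Synthetic data from two known Gaussians, so the result can be checked by eye.
rng = random.Random(0)
data = ([rng.gauss(-2.0, 0.5) for _ in range(300)]
        + [rng.gauss(3.0, 1.0) for _ in range(700)])
weights, means, sigmas = fit_gmm_1d(data)
print(weights, means, sigmas)
```

On well-separated data like this, the recovered means, spreads, and weights should land close to the values used to generate the samples. Real data usually needs a better initialization and a convergence check instead of a fixed iteration count.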