Have you ever used a clustering method before? What was the most difficult part for you? I usually do clustering in three steps: scaling the input features, reducing the dimensionality, and choosing a clustering algorithm that performs well on the data. These steps are pretty standard, right? But the real problem lies ahead: understanding the clustering result.

Interpreting the clustering result usually takes time. We run statistical analyses and visualisations to compare the clusters. If we change the dimensionality reduction or the clustering method, the clusters change and we have to redo the analysis. Interpreting the clustering result becomes the bottleneck that keeps us from iterating quickly on the whole process.

**My initial interpretation of the clustering result is as simple as calling a function, cluster_report(features, clustering_result).** In the following section, I will give an example of clustering and the result of cluster_report. If you want to skip the example, you can scroll to the bottom of this article to get the code and the Google Colab notebook.

Example: Clustering Wine

Let’s use scikit-learn’s wine dataset as our example. This dataset has 13 numeric features and a label that indicates the type of wine. Below is a sample of the data.

| label | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13.74 | 1.67 | 2.25 | 16.4 | 118.0 | 2.6 | 2.9 | 0.21 | 1.62 | 5.85 | 0.92 | 3.2 | 1060.0 |
| 2 | 12.79 | 2.67 | 2.48 | 22.0 | 112.0 | 1.48 | 1.36 | 0.24 | 1.26 | 10.8 | 0.48 | 1.47 | 480.0 |
| 1 | 12.37 | 1.13 | 2.16 | 19.0 | 87.0 | 3.5 | 3.1 | 0.19 | 1.87 | 4.45 | 1.22 | 2.87 | 420.0 |
| 0 | 13.56 | 1.73 | 2.46 | 20.5 | 116.0 | 2.96 | 2.78 | 0.2 | 2.45 | 6.25 | 0.98 | 3.03 | 1120.0 |
| 1 | 13.05 | 5.8 | 2.13 | 21.5 | 86.0 | 2.62 | 2.65 | 0.3 | 2.01 | 2.6 | 0.73 | 3.1 | 380.0 |
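If you want to follow along, the snippet below is one way to load the dataset into a pandas DataFrame; the variable names (df, and features used later) are my own, not from the original notebook.

```python
# Load scikit-learn's wine dataset into a DataFrame (13 numeric features + label).
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df["label"] = wine.target

print(df.sample(5))  # a few example rows, like the table above
```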

First, we need to standardise the data so that the clustering is not dominated by features with a larger scale. In this case, we use zero-mean, unit-variance standardisation. After that, we use PCA (Principal Component Analysis) to reduce the 13 features to 2 principal components.
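Continuing the snippet above, a minimal sketch of this step with scikit-learn’s StandardScaler and PCA could look like this:

```python
# Standardise to zero mean / unit variance, then project onto 2 principal components.
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

features = df.drop(columns=["label"])       # the 13 original numeric features

scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)      # shape: (n_samples, 2)

print(pca.explained_variance_ratio_)        # variance kept by the 2 components
```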

We use KMeans clustering for this example because most of us are familiar with it. To determine the number of clusters, we use the elbow method, which gives k=3 as the optimal value.
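A sketch of the elbow method, assuming we cluster on the two principal components from the previous step: plot the inertia (within-cluster sum of squares) for a range of k and look for the bend.

```python
# Elbow method: inertia for k = 1..9 on the 2 principal components.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 10)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(components)
    inertias.append(km.inertia_)

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.show()
```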

Using KMeans with k=3 on the two principal components, we get the clustering result below. The left scatter plot shows the original labels; the right scatter plot shows the clustering result.
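A sketch of how such a side-by-side comparison could be produced (the random_state and figure size here are arbitrary choices, not from the original code):

```python
# Fit KMeans with k=3 on the principal components and compare labels vs clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(components)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
ax_left.scatter(components[:, 0], components[:, 1], c=df["label"])
ax_left.set_title("Original label")
ax_right.scatter(components[:, 0], components[:, 1], c=cluster_labels)
ax_right.set_title("KMeans clustering result")
plt.show()
```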

Once we have the clustering result, we need to interpret the clusters. The easiest way to describe the clusters is with a set of rules. We can generate the rules automatically by training a decision tree model on the original features, using the clustering result as the label. I wrote a cluster_report function that wraps the decision tree training and the rule extraction from the tree. **You can simply call cluster_report to describe the clusters.** Easy, right?
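The full cluster_report implementation is in the notebook linked at the end of this article. The sketch below only illustrates the core idea: fit a DecisionTreeClassifier on the original features with the cluster assignment as the target, then print the learned rules. It uses scikit-learn’s export_text for brevity, so the output format is plainer than the report shown here, and the original pruning_level parameter is not reproduced.

```python
# Hedged sketch of the idea behind cluster_report: describe clusters via decision-tree rules.
from sklearn.tree import DecisionTreeClassifier, export_text

def cluster_report_sketch(features, clustering_result, min_samples_leaf=50):
    # Train a tree to predict the cluster id from the original (unscaled) features.
    tree = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf, random_state=42)
    tree.fit(features, clustering_result)
    # Print the if/else rules the tree learned for each cluster.
    print(export_text(tree, feature_names=list(features.columns)))

cluster_report_sketch(features, cluster_labels)
```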

There are two parameters that we can adjust: min_samples_leaf and pruning_level. Both control the complexity of the decision tree. To get more general rules, increase min_samples_leaf or pruning_level; to get more detailed rules, decrease them.
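For example, with the simplified sketch above (which only exposes min_samples_leaf), tuning the rule granularity could look like this:

```python
# Larger leaves -> fewer, more general rules.
cluster_report_sketch(features, cluster_labels, min_samples_leaf=80)

# Smaller leaves -> a larger number of more detailed rules.
cluster_report_sketch(features, cluster_labels, min_samples_leaf=20)
```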

The number in brackets shows the proportion of instances satisfying the rule that belong to the class. For example, **[0.880] (proline > 755.0)** means that, of all instances satisfying the rule (proline > 755.0), 88% are in cluster 1.

