I love good data visualizations. Back when I was doing my PhD in particle physics, I was stunned by the histograms my colleagues built and by how much information a single plot could carry.
It is really challenging to improve existing visualization methods or to adapt methods from other research fields. You have to think about the dimensions in your plot and about ways to add more of them. A good example is the path from a boxplot to a violinplot to a swarmplot: a continuous process of adding dimensions and thus information.
The possibilities for adding information or dimensions to a plot are almost endless. Categories can be added with different marker shapes, color maps like in a heat map can serve as another dimension, and the size of a marker can give insight into further parameters.
When it comes to machine learning, there are many ways to plot the performance of a classifier. There is an overwhelming number of metrics for comparing different estimators, like accuracy, precision, recall or the helpful Matthews correlation coefficient (MCC).
All of the common classification metrics are calculated from _true positive_, _true negative_, _false positive_ and _false negative_ incidents. The most popular plots are definitely the ROC curve, the precision-recall curve (PRC), the CAP curve and the confusion matrix.
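As an illustration, the metrics above can be computed directly from the four incident counts. This is a minimal sketch; the counts are invented, not taken from a real classifier:

```python
# Minimal sketch: common classification metrics from the four
# confusion-matrix counts. The counts below are hypothetical.
from math import sqrt

tp, fp, fn, tn = 50, 10, 5, 35  # invented incident counts

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Matthews correlation coefficient (MCC): balanced even for skewed classes
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(mcc, 2))
```

Unlike accuracy, the MCC stays informative even when the two classes are very unbalanced, which is why it is worth the slightly uglier formula.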
I won’t go into detail on the three curves, but there are many different ways to handle the confusion matrix, like adding a heat map.
_A seaborn heatmap of a confusion matrix._
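A plot like that can be produced in a few lines. This is a minimal sketch, not the article's original code; the labels and predictions are made up for illustration:

```python
# Sketch: rendering a confusion matrix as a seaborn heatmap.
# y_true / y_pred are hypothetical labels, not real model output.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]  # invented ground-truth labels
y_pred = [0, 1, 1, 1, 0, 0]  # invented classifier predictions

cm = confusion_matrix(y_true, y_pred)

# annot=True writes the raw count into each cell; fmt="d" keeps integers
ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
plt.tight_layout()
plt.savefig("confusion_matrix.png")
```

Normalizing the matrix row-wise before plotting (so each row sums to 1) is a common variant when the class counts differ a lot.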
#matplotlib #machine-learning #classification #python #plot