As I wrote in the last article of this series, focal loss is a more focused variant of cross entropy loss. In semantic segmentation problems, focal loss helps the model concentrate on pixels that have not yet been learned well, which makes it more effective and purposeful than plain cross entropy loss. I recommend reading that article if you haven't yet.
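To make the "more focused" claim concrete, here is a minimal sketch of the binary focal loss in NumPy. The function name and the example probabilities are my own illustration, not from the original article; the formula is the standard one, -(1 - p_t)^γ · log(p_t), where p_t is the predicted probability of the true class.

```python
import numpy as np

def binary_focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Per-pixel binary focal loss: -(1 - p_t)**gamma * log(p_t),
    where p_t is the predicted probability of the true class of each pixel."""
    p = np.clip(p, eps, 1 - eps)
    # p_t is p where the label is 1, and (1 - p) where the label is 0.
    p_t = np.where(y == 1, p, 1 - p)
    return -((1 - p_t) ** gamma) * np.log(p_t)

# An already well-classified pixel (p_t = 0.95) contributes far less
# to the loss than a poorly classified one (p_t = 0.3).
easy = binary_focal_loss(np.array([0.95]), np.array([1]))
hard = binary_focal_loss(np.array([0.3]), np.array([1]))
```

The `(1 - p_t)^γ` factor is what down-weights easy pixels: as `p_t` approaches 1, the pixel's contribution vanishes, so training effort shifts to the hard pixels.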
In this article, I'd like to talk about a variant of focal loss that can be used as a distance-aware cross entropy loss in semantic segmentation problems, especially with sparse labels.
Cross entropy loss is the typical choice in semantic segmentation: each pixel of the image is labeled with a category number, and a model is trained to predict that number for every pixel.
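As a sketch of this per-pixel formulation, the following NumPy snippet averages the cross entropy over all pixels of a small image. The function name, the 4×4 image size, and the predicted probabilities are illustrative assumptions, not from the article.

```python
import numpy as np

def pixelwise_cross_entropy(probs, labels, eps=1e-7):
    """Mean cross entropy over all pixels.
    probs:  (H, W, C) predicted class probabilities per pixel
    labels: (H, W) integer class index per pixel
    """
    probs = np.clip(probs, eps, 1.0)
    h, w = labels.shape
    # Pick, for each pixel, the predicted probability of its true class.
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.log(p_true).mean()

# A hypothetical 4x4 binary label image with one foreground pixel,
# and a model that predicts "background" with 0.9 everywhere.
labels = np.zeros((4, 4), dtype=int)
labels[2, 2] = 1
probs = np.full((4, 4, 2), [0.9, 0.1])
loss = pixelwise_cross_entropy(probs, labels)
```

Note that every pixel contributes equally to the mean, which is exactly why a single mislabeled-looking foreground pixel can be drowned out by the background.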
This per-pixel prediction scheme is used in almost all academic papers on semantic segmentation. However, it has an issue that can harm convergence and is seldom discussed: the sparser the labels, the more pronounced the issue becomes.
Let's take a binary classification scenario as an example; the extension to multi-class classification is straightforward. Consider Figure 1, in which one blue pixel is labeled 1 and all other pixels are labeled 0. This label image is extremely sparse and foreground-background imbalanced, because only a single pixel is labeled as the foreground class.
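A label image like the one in Figure 1 is easy to mock up and inspect. The 8×8 size and the chosen pixel position here are hypothetical, just to show how sparse the foreground really is.

```python
import numpy as np

# A hypothetical 8x8 label image mimicking Figure 1: a single foreground
# pixel (class 1), everything else background (class 0).
labels = np.zeros((8, 8), dtype=int)
labels[3, 4] = 1  # the single "blue" pixel

# Fraction of pixels labeled as foreground: 1/64 ~= 0.0156,
# an extremely sparse, imbalanced labeling.
foreground_ratio = labels.mean()
```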
Figure 1: a labeled image for binary classification
Now, imagine Figure 1 is fed to our model, and we get an output image in which the label of every pixel is predicted. As conceptually shown in Figure 2, among all the resulting pixels we take two for discussion. These two pixels are colored green, and the labeled blue pixel is also kept here for ease of explanation.
#optimization #machine-learning #deep-learning #convolutional-network #computer-vision