Once upon a time, I was trying to train a speaker recognition model on the TIMIT dataset. I used AlexNet since I wanted to try this with a smaller model first, with a softmax layer at the end. The inputs were spectrograms of different people's voices and the labels were the speaker IDs. MSELoss from the PyTorch library was used as the loss. I left the model to train for hours, but to no avail. I was wondering why.


I checked the output from the model (the output of the softmax). The elements of the output array were all equal to each other, for every input I tried. This was really annoying. It seemed that the model had not learned anything at all. So I set out to investigate. This article contains some of my findings about the softmax function. First, let's examine the softmax function itself.

softmax(x)_i = exp(x_i) / Σ_j exp(x_j)

The equation above shows the softmax function of a vector x. As we can see, the softmax function contains exponential terms. The result of the exponential function grows very quickly with increasing input, so for sufficiently large inputs an overflow error can occur! We therefore need to make sure that the input (by input I mean the input to the softmax function) does not get large enough to cause this. So I tried to find the point at which the exponential term gives an overflow error. The largest value without overflow turned out to be 709 (at least on my machine).

import numpy as np

value = 709          # largest input that does not overflow np.exp on my machine
sm = np.exp(value)   # ~8.2e307, still representable as a 64-bit float

Note that this value could change from machine to machine and from library to library.
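For completeness, here is a minimal sketch (assuming 64-bit floats and NumPy's default error handling) of what happens one step past that limit: the exponential overflows, NumPy typically emits a RuntimeWarning, and the result becomes inf.

import numpy as np

print(np.exp(709.0))   # ~8.2e307, still finite
print(np.exp(710.0))   # overflow encountered in exp -> inf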

Next, I set out to explore how softmax behaves for large and small inputs. I created input arrays sampled from normal distributions with varying mean values, and then plotted the statistics of the outputs after applying the softmax function. The size of the input was chosen to be 1000.

The code I used to do this is as follows (in Python):

import numpy as np
from numpy.random import normal

means_list = []   # mean of the softmax output for each input mean
max_list = []     # maximum of the softmax output
sd_list = []      # standard deviation of the softmax output
x_axis = []       # mean of the input feature
sm_list = []      # raw softmax outputs

for step in range(0, 40000):
    mean = step / 100                 # input means from 0 to 400 in steps of 0.01
    sd = mean / 10                    # standard deviation scales with the mean
    feature = normal(mean, sd, 1000)  # input vector of size 1000
    sm = np.exp(feature) / np.sum(np.exp(feature))  # softmax
    sm_list.append(sm)
    means_list.append(sm.mean())
    max_list.append(sm.max())
    sd_list.append(sm.std())
    x_axis.append(mean)
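The plotting code itself is not shown above, but a minimal sketch of how the figure below could be produced (I am assuming matplotlib here; any plotting library would do) looks like this:

import matplotlib.pyplot as plt

# Mean of the softmax output against the mean of the input feature
plt.plot(x_axis, means_list)
plt.xlabel("mean of input feature")
plt.ylabel("mean of softmax output")
plt.show()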

The following figure shows the mean value of the softmax output plotted against the mean value of the input feature. As expected, it is constant: the softmax outputs always sum to 1, so their mean over 1000 elements is always 1/1000 = 0.001, regardless of the input. Well, that is good so far.
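A quick sanity check of that claim, as a sketch reusing the imports from the code above (the specific input values are just an example):

feature = normal(10, 1, 1000)                    # any example input of size 1000
sm = np.exp(feature) / np.sum(np.exp(feature))   # softmax
print(sm.sum())    # 1.0 (up to floating-point error)
print(sm.mean())   # 0.001, i.e. 1 / len(sm)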

