Activation functions are crucial in neural networks: they determine which neurons (in ANNs) or kernels (in CNNs) get activated. To know more about where they fit into neural networks, read my previous article.

In this article, I want to focus on how different activation functions perform this selection and how you should choose an appropriate one for your project.

In neural networks, every layer contains many neurons (or kernels in a CNN), and each one computes a different weighted sum of the inputs, so that every input attribute gets taken into account.

Let us consider an example where our network predicts whether a customer will buy a product based on the person’s salary, age, and the category of product. One neuron gives a higher weight to salary while another gives more weight to age. Each one computes its own weighted sum of the given inputs.
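As a quick sketch (the weights and encodings here are made up for illustration), two neurons looking at the same customer can weight the same inputs very differently:

```python
# Two hypothetical neurons scoring the same customer, each with its own weights.
# Inputs: salary (normalized), age (normalized), product category (encoded 0/1).
inputs = [0.8, 0.3, 1.0]      # salary, age, category

neuron_a = [0.9, 0.1, 0.2]    # weighs salary heavily
neuron_b = [0.2, 0.7, 0.3]    # weighs age heavily

sum_a = sum(w * x for w, x in zip(neuron_a, inputs))
sum_b = sum(w * x for w, x in zip(neuron_b, inputs))

print(sum_a)  # 0.8*0.9 + 0.3*0.1 + 1.0*0.2 = 0.95
print(sum_b)  # 0.8*0.2 + 0.3*0.7 + 1.0*0.3 = 0.67
```

Both neurons saw the same customer, but each produced a different score because of its own weights.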

This is where **activation functions** come into the picture. An activation function decides which of these weighted sums actually fire, and the criterion it uses depends on the type of activation function. Different layers in the same network can use different activation functions.

Activation functions can be categorized into 3 types: linear, non-linear, and step functions.

*Linear activation function (image source: GeeksForGeeks)*

A linear activation function outputs a linear multiple of its input (y = a*x, where x is the input, y is the output, and a is some linear multiplier). With this, each layer becomes a linear multiple of the previous one (y1 = a*x, y2 = b*y1, and so on), so the final output is directly related to the input (y2 = a*b*x). If we define a single multiplier combining the previous two (c = a*b), the output can be written directly as a multiple of the input (y = c*x), making the intermediate layers (y1, y2) meaningless. This is like having a network with no hidden layers at all.
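The collapse is easy to verify numerically. This sketch (using arbitrary random matrices as "layer weights") shows that two linearly-activated layers produce exactly the same output as one merged layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)           # an arbitrary input vector

# Two "layers" with linear activation: y1 = A @ x, then y2 = B @ y1
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
y2 = B @ (A @ x)

# A single layer with merged weights C = B @ A gives the identical output,
# so the hidden layer added no representational power.
C = B @ A
print(np.allclose(y2, C @ x))    # True
```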

So linear activations are a no-go for neural networks.

If you are still not convinced about not using these, let me give another reason, which will definitely change your mind… This is a bit of a mathematical explanation, so feel free to skip the next paragraph, if you’re convinced already…

We know that neural networks are trained using backpropagation, which relies on gradient descent. Gradient descent computes the slope of the tangent to find the direction in which it needs to step (towards a minimum), and mathematically, that slope is the derivative of the function. Here lies the problem: the derivative of a linear function y = a*x is just the constant a. The gradient is therefore the same everywhere, carrying no information about the input, so backpropagation cannot tell the hidden layers to adjust differently for different inputs. Training can never shape the network into anything beyond a single linear map.
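A small numerical check makes the point concrete: the derivative of a linear activation is the same constant everywhere, while a non-linear activation such as ReLU has a derivative that depends on the input (the function names and values below are illustrative):

```python
# Derivative of a linear activation f(x) = a*x is the constant a,
# no matter where you evaluate it. Check with a central difference:
a = 3.0
f = lambda x: a * x

def derivative(g, x, h=1e-6):
    return (g(x + h) - g(x - h)) / (2 * h)

print(derivative(f, -5.0))   # ≈ 3.0
print(derivative(f, 0.0))    # ≈ 3.0
print(derivative(f, 42.0))   # ≈ 3.0  — identical everywhere

# Contrast with ReLU, whose derivative varies with the input:
# 0 for negative inputs, 1 for positive inputs.
relu = lambda x: max(0.0, x)
print(derivative(relu, -5.0))  # ≈ 0.0
print(derivative(relu, 5.0))   # ≈ 1.0
```

Because the linear gradient never changes, every backprop update looks the same regardless of the data, which is exactly why non-linear activations are needed.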

#ai #deep-learning
