The current era of machine learning and artificial intelligence is the deep learning era. Deep learning can reach remarkable accuracy, but it also has a huge appetite for data. By employing neural networks, functions of far greater complexity can be fitted to a given set of data points.

But there are a few specific techniques that make working with neural networks far more effective, and weight initialization is one of them.

Xavier Initialization

Let us assume that we are training a very deep neural network. For simplicity, assume the bias term of every layer is zero and the activation function is the identity.

Under these conditions, we can write down the gradient descent updates and express the predicted target variable in terms of the weights of all the layers and the input a[0].
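Under these assumptions each layer simply multiplies its input by its weight matrix, so the prediction unrolls into a chain of matrix products. A sketch of that expression, writing a[l] for the activations of layer l and L for the number of layers:

```latex
a^{[l]} = W^{[l]} a^{[l-1]}, \quad l = 1, \dots, L
\qquad\Longrightarrow\qquad
\hat{y} = a^{[L]} = W^{[L]} W^{[L-1]} \cdots W^{[2]} W^{[1]} a^{[0]}
```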

For ease of understanding, let us consider all the hidden-layer weight matrices to be equal, i.e. W[1] = W[2] = … = W[L-1] = W.

Here the weight matrix of the last layer, W[L], is kept separate because it produces the output value; in binary classification the output activation may be a sigmoid (or, in other settings, a ReLU) function.

When we substitute these equal weight matrices into the expression for the target variable, we obtain a new expression for ŷ, the prediction of the target variable.
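With every hidden-layer weight matrix equal to the same matrix W, the product above telescopes into a matrix power. A sketch of the resulting expression under that assumption:

```latex
\hat{y} = W^{[L]} \, W^{\,L-1} \, a^{[0]}
```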

Let us consider two different situations for the weights.
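The two cases of interest are weight entries slightly larger than 1, which make the activations (and, symmetrically, the gradients) grow exponentially with depth, and entries slightly smaller than 1, which make them shrink toward zero. A minimal numpy sketch, with hypothetical diagonal values 1.5 and 0.5 and an illustrative depth of 50 layers:

```python
import numpy as np

L = 50                     # hypothetical depth: number of hidden layers
a0 = np.ones((2, 1))       # input a[0]

for scale in (1.5, 0.5):   # entries slightly above 1 vs slightly below 1
    W = scale * np.eye(2)  # every hidden-layer weight matrix equals W
    a = a0
    for _ in range(L):
        a = W @ a          # zero bias, identity activation: a[l] = W a[l-1]
    print(f"scale={scale}: ||a[L]|| = {np.linalg.norm(a):.3e}")

# With 50 layers the norm comes out around 1e9 for scale=1.5 (explosion)
# and around 1e-15 for scale=0.5 (vanishing).
```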

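This exploding/vanishing behaviour is exactly what Xavier (Glorot) initialization counters: each weight matrix is drawn with variance inversely proportional to the number of incoming units, so the scale of the activations stays roughly constant from layer to layer. A minimal sketch, assuming a plain numpy setup and the 1/n_in variance choice (layer sizes are illustrative):

```python
import numpy as np

def xavier_init(n_in, n_out, rng):
    # Xavier/Glorot initialization: zero-mean weights with variance 1/n_in,
    # so the variance of W @ a stays close to the variance of a.
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_out, n_in))

rng = np.random.default_rng(0)
layer_sizes = [512] * 50            # illustrative: 50 layers of width 512
a = rng.normal(size=(layer_sizes[0], 1))

for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    a = xavier_init(n_in, n_out, rng) @ a   # zero bias, identity activation

print(np.linalg.norm(a))  # stays on the order of the input's norm,
                          # rather than exploding or vanishing with depth
```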

