
**By the end of this article you’ll know:**

- What Gradient Clipping is, and why the problem it solves occurs
- Types of Clipping techniques
- How to implement it in TensorFlow and PyTorch
- Additional research for you to read

Gradient Clipping solves one of the biggest problems that we have while calculating gradients in Backpropagation for a Neural Network.

You see, in a backward pass we calculate gradients of all weights and biases in order to converge our cost function. **These gradients, and the way they are calculated, are the secret behind the success of Artificial Neural Networks in every domain**.

But every good thing comes with some sort of caveats.

Gradients tend to encapsulate information they collect from the data, which includes long-range dependencies in long text sequences or multidimensional data. So, when training on complex data, things can go south really quickly, and you can blow your next million-dollar model in the process.

Luckily, you can solve it before it occurs (with gradient clipping) – let’s first look at the problem in-depth.

**The [Backpropagation](https://discuss.pytorch.org/t/proper-way-to-do-gradient-clipping/191/16) algorithm is the heart of all modern-day Machine Learning applications**, and it’s ingrained more deeply than you think.

Backpropagation calculates the gradients of the cost function with respect to the weights and biases in the network.

It tells you about all the changes you need to make to your weights to minimize the cost function: stepping along -∇C gives the steepest decrease of the cost function, while +∇C would give the steepest increase.

Pretty cool, because now you get to adjust all the weights and biases according to the training data you have. Neat math, all good.
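
The idea above can be sketched in a few lines of plain Python (a toy example with assumed names, not from any library): one gradient descent step on a single-weight model with cost C(w) = (w·x − y)², stepping in the -∇C direction.

```python
# Toy model: prediction = w * x, cost C(w) = (w*x - y)^2.
def cost(w, x, y):
    return (w * x - y) ** 2

def grad(w, x, y):
    # dC/dw = 2 * (w*x - y) * x, by the chain rule
    return 2 * (w * x - y) * x

x, y = 2.0, 10.0   # one training example
w = 1.0            # initial weight
lr = 0.05          # learning rate

# Step in the -gradient direction: the steepest decrease of C.
w_new = w - lr * grad(w, x, y)

print(cost(w, x, y), cost(w_new, x, y))  # cost drops after the update
```

Backpropagation is exactly this chain-rule computation of dC/dw, repeated layer by layer for every weight and bias in the network.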

What about deeper networks, like Deep Recurrent Networks?

As model complexity grows with more hidden layers, the effect of a change in the cost function (C) on the weights of an early layer, measured by the norm of the gradient, becomes so small that after a certain point it is effectively zero.

This is what we call *Vanishing Gradients*.

This hampers the learning of the model. The weights can no longer contribute to the reduction of the cost function (C), so they go unchanged, affecting the network in the forward pass and eventually stalling the model.
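
A hypothetical back-of-the-envelope illustration of why this happens: by the chain rule, backpropagation multiplies one local derivative per layer on the way back. With sigmoid activations, that local derivative is at most 0.25, so the product shrinks exponentially with depth.

```python
# Assumed 50-layer chain with sigmoid activations; the sigmoid's
# derivative is at most 0.25, so each layer scales the gradient down.
depth = 50
local_derivative = 0.25   # maximum slope of the sigmoid
gradient_norm = 1.0
for _ in range(depth):
    gradient_norm *= local_derivative

print(gradient_norm)  # ~7.9e-31: effectively zero -> vanishing gradient
```

With the gradient at the early layers this close to zero, their weight updates are negligible no matter the learning rate.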

On the other hand, the *Exploding gradients* problem refers to a **large increase in the norm of the gradient during training**.

Such events are caused by an explosion of long-term components, which can grow exponentially more than short-term ones.

This results in an unstable network that at best cannot learn from the training data and at worst overflows to NaN weights, making further gradient descent updates meaningless.
