The Exploding and Vanishing Gradients Problem in Time Series

In this post, we deal with exploding and vanishing gradients in time series, and in particular in Recurrent Neural Networks (RNNs), using Truncated Backpropagation Through Time and Gradient Clipping.


We focus on deep learning techniques for sequential data. All of us are familiar with this kind of data: text is a sequence of words, and video is a sequence of images. More challenging examples come from time series, such as medical measurements (heart rate, blood pressure, etc.) or financial data such as stock prices. The most common _AI_ approach to time-series tasks with deep learning is the Recurrent Neural Network (RNN). The motivation for using an RNN lies in how the solution generalizes with respect to time. Since sequences usually have different lengths, a classical deep learning architecture such as the Multilayer Perceptron (MLP) cannot be applied without modification. Moreover, an MLP that takes a whole sequence as input needs separate weights for every time step, so its number of weights grows with the sequence length. Hence, the RNN is commonly used: its weights are shared across all time steps. A simple RNN architecture is shown below, where V, W, and U are the weight matrices and b is the bias vector.
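To make the weight sharing concrete, here is a minimal sketch of the forward pass of a simple RNN in plain Python. The post only names the matrices, so the roles used here are an assumption following the usual convention: U maps the input to the hidden state, W maps the previous hidden state to the next one, V maps the hidden state to the output, and the update is h_t = tanh(U x_t + W h_{t-1} + b), o_t = V h_t. The helper names (`matvec`, `rnn_forward`) and the toy weights are hypothetical and chosen only for illustration.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def rnn_forward(xs, U, W, V, b):
    """Run a simple RNN over a sequence xs of input vectors.

    Assumed formulation (not spelled out in the post):
        h_t = tanh(U x_t + W h_{t-1} + b),   o_t = V h_t
    """
    h = [0.0] * len(b)  # initial hidden state h_0
    outputs = []
    for x in xs:  # the same U, W, V, b are reused at every time step
        pre = [u + w + b_i
               for u, w, b_i in zip(matvec(U, x), matvec(W, h), b)]
        h = [math.tanh(p) for p in pre]
        outputs.append(matvec(V, h))
    return outputs, h


# Hypothetical toy sizes: 2 input features, 3 hidden units, 1 output.
U = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]                  # input -> hidden
W = [[0.0, 0.1, -0.1], [0.2, 0.0, 0.3], [-0.3, 0.1, 0.0]]   # hidden -> hidden
V = [[0.5, -0.4, 0.1]]                                      # hidden -> output
b = [0.0, 0.1, -0.1]

short_seq = [[1.0, 0.5], [0.2, -0.3]]
long_seq = short_seq + [[0.0, 1.0], [-1.0, 0.4], [0.3, 0.3]]

out_short, h_short = rnn_forward(short_seq, U, W, V, b)
out_long, h_long = rnn_forward(long_seq, U, W, V, b)
print(len(out_short), len(out_long))  # prints: 2 5
```

The same four parameter objects handle a sequence of length 2 and a sequence of length 5, which is exactly what an MLP with a fixed input size cannot do without modification.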

