With convolutional neural networks, we can create deep learning models which are really good at classifying images and can be scaled to process a wide variety of image data. But when processing sequence data, such as text, time series or audio, where the information we are looking for spans time and forms a sequence, CNNs may not be a good choice.

In the case of sequential data, where the next set of data can depend on the previous set, the network needs to maintain some sort of history which can be leveraged for predicting the outcome. For example, if we are predicting the next word in a sequence, it is necessary to relate it to the previous words and make sure that everything fits.

For this kind of data, we need specialised networks which can keep a history and use it to predict the next data point. But why can't we use feed forward networks to model this requirement? The problems in using a feed forward network for this kind of data are -

  • Feed forward networks require a fixed-length input which is set beforehand. With sequence data, the input will not necessarily be of a fixed length (see the short sketch after this list).
  • Long-term dependencies, in other words historical state, cannot be maintained.
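To make the first point concrete, here is a minimal sketch in plain NumPy (the layer size and names are made up for illustration, not taken from any particular library): a feed forward layer is just a weight matrix of a fixed shape, so the input length is baked in when the network is built, and a longer or shorter sequence simply cannot be multiplied through.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size = 4                            # chosen up front, cannot change later
W = rng.standard_normal((3, input_size))  # weight matrix has a fixed shape
b = np.zeros(3)

def feed_forward(x):
    # Works only if x has exactly `input_size` elements.
    return np.tanh(W @ x + b)

print(feed_forward(rng.standard_normal(4)))   # fine: length matches
# feed_forward(rng.standard_normal(7))        # fails: shapes (3, 4) and (7,) do not align
```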

In sequence modelling, the network should be able to -

  • Handle variable-sized inputs, such as variable-length sentences.
  • Track long-term dependencies. In other words, maintain a history.
  • Preserve order. In language modelling, the order of the words must be maintained so that the appropriate prediction can be made.
  • Share parameters across the sequence, so that the same weights are applied at every time step regardless of the sequence length.

In these cases, we turn to recurrent neural networks (RNNs). RNNs are a type of neural network which remembers information from previous time steps and uses it in the calculation of the current output. They have a sort of feedback loop, which allows information to persist in the network and be used when processing the next input in the sequence. Below is an example of an RNN.

[Image: a recurrent neural network cell with a feedback loop]

The following steps describe how an RNN works in simple terms -

  1. Take the input x.
  2. Predict the output y.
  3. Update the internal state h and use it in the next time step.

In the update of the cell state h, the previous state h(t-1) is also used: h(t-1) and the input vector x(t) are each multiplied by their own weight matrix, the results are added, and an activation function is applied to introduce non-linearity, giving the new state h(t). If the above image is unrolled, each time step appears as its own copy of the cell, taking x(t) and h(t-1) and producing y(t) and h(t).
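As a rough illustration of this update, here is a minimal NumPy sketch (the names W_xh, W_hh, W_hy and the toy sizes are assumptions for the example, not from the original article). At each step, the previous state h(t-1) and the input x(t) are multiplied by their own weight matrices, summed, and passed through a tanh non-linearity to give the new state h(t); the output y(t) is read off that state, and the loop at the bottom is the unrolled view, one call per time step.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, output_size = 3, 5, 2   # assumed toy sizes

# Parameters are shared across all time steps.
W_xh = rng.standard_normal((hidden_size, input_size))   # input  -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
W_hy = rng.standard_normal((output_size, hidden_size))  # hidden -> output

def rnn_step(x_t, h_prev):
    # New state combines the previous state and the current input,
    # then the tanh non-linearity is applied.
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return y_t, h_t

# "Unrolling": apply the same step to every element of the sequence,
# feeding each state into the next time step.
sequence = [rng.standard_normal(input_size) for _ in range(4)]
h = np.zeros(hidden_size)
for x_t in sequence:
    y_t, h = rnn_step(x_t, h)
    print(y_t)
```

Because the same three weight matrices are reused at every step, the parameters are shared across the sequence and the same loop works for a sequence of any length.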
