Sequence-to-Sequence Models: From RNN to Transformers

Sequence-to-sequence models are fundamental deep learning techniques that operate on sequence data. They convert a sequence from one domain into a sequence in another domain [1]. These models range from a simple RNN-based encoder-decoder network, to an attention-based encoder-decoder RNN, to state-of-the-art transformer models. Sequence-to-sequence models have many applications, such as machine translation, speech recognition, text summarization, question answering, and demand forecasting.

This article is part 1 of a three-part series on sequence-to-sequence models, in which we focus on building a machine translation system. In this part, we look at the inner workings of an RNN-based encoder-decoder network. To illustrate, we will build a Spanish-to-English translation model.

The focus of this article is on the model architecture, training, and the inference process using TensorFlow 2.0. Therefore, we will leave out the discussion of data preparation. As a reference, you can follow the TensorFlow tutorial [2] for the data preparation part. If you are new to TensorFlow 2.0, you may want to pay special attention to the "Create a tf.data dataset" section [2].
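
For readers who have not used tf.data before, here is a minimal sketch of what that step roughly looks like. The toy tensors, BUFFER_SIZE, and BATCH_SIZE below are assumptions made up for illustration; in the tutorial [2] the tensors come from the tokenized and padded Spanish-English sentence pairs.

```python
import tensorflow as tf

# Hypothetical toy data: integer token IDs, already padded with 0.
# Each row is one sentence; these values are made up for illustration.
input_tensor = tf.constant([[1, 5, 6, 2, 0],
                            [1, 7, 2, 0, 0],
                            [1, 8, 9, 4, 2],
                            [1, 3, 2, 0, 0]], dtype=tf.int32)
target_tensor = tf.constant([[1, 4, 2, 0],
                             [1, 6, 7, 2],
                             [1, 5, 2, 0],
                             [1, 8, 2, 0]], dtype=tf.int32)

BUFFER_SIZE = 4  # assumed: shuffle buffer covering all examples
BATCH_SIZE = 2   # assumed: small batch for the toy data

# Pair each source sentence with its target sentence, then shuffle and batch.
dataset = tf.data.Dataset.from_tensor_slices((input_tensor, target_tensor))
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

# Peek at one batch: shapes are (batch, source_len) and (batch, target_len).
for source_batch, target_batch in dataset.take(1):
    print(source_batch.shape, target_batch.shape)
```

Setting drop_remainder=True keeps every batch at the same size, which is convenient when the model's initial hidden state is created with a fixed batch dimension.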

towardsdatascience.com