PyTorch Transformers for Machine Translation

In this tutorial we build a sequence-to-sequence (Seq2Seq) model with Transformers in PyTorch and apply it to machine translation on a dataset of German-to-English sentence pairs, specifically the Multi30k dataset.

#python #machine-learning

Kennith Kuhic


Machine Translation

Machine Translation plays a vital role in today’s digitized and globalized world. It benefits society by translating one natural language into another. With advances in technology, an enormous amount of information is exchanged between regions that speak different languages, and the demand for Machine Translation has grown rapidly over the last few decades. As a result, Machine Translation has become an active research area. It can be divided into three distinct approaches: rule-based, statistical, and neural. In this article, I will mainly focus on the statistical and neural approaches.


Machine Translation is the task of translating a sentence in one language (the source language) into a sentence in another language (the target language). It is the sub-field of computational linguistics that aims to use computing devices to translate text automatically from one language to another. Machine Translation research began in the early 1950s, during the Cold War, when there was a need to translate Russian documents into English. Because there were few Russian language experts and manual translation was very time-consuming, Machine Translation was pursued as a solution. The systems developed at that time were mostly rule-based, using a bilingual dictionary to map Russian words to their English counterparts. Even though they did not work very well, they gave way to statistical systems in the late 1980s. In the 1990s, statistical word-based and phrase-based approaches that required little to no linguistic information became popular. The core idea of Statistical Machine Translation (SMT) is to learn a probabilistic model from data.

In the 2010s, with the advent of deep neural networks, Neural Machine Translation (NMT) became a major area of research. NMT performs Machine Translation with deep neural networks, using an architecture called sequence-to-sequence (seq2seq). The vanilla seq2seq NMT model involves two Recurrent Neural Networks (RNNs) [1]. NMT research has pioneered many recent innovations in natural language processing (NLP), and researchers have found many improvements to vanilla seq2seq; one major improvement is the attention mechanism [2]. Motivated by the attention mechanism, the paper “Attention Is All You Need” introduced a novel architecture called the Transformer, which became a state-of-the-art architecture for language modeling [3]. This architecture relies entirely on attention, without any RNN, and variants such as BERT have been applied to many NLP tasks with state-of-the-art performance.

Statistical Machine Translation

Statistical Machine Translation (SMT) learns a probabilistic model from data. Suppose we are translating from German to English: we want to find the best English sentence y given the German sentence x. SMT formulates the task as follows:

$$\hat{y} = \arg\max_{y} P(y \mid x)$$

This means that among all possible y, we want the most probable one. Using Bayes' rule, and dropping the denominator P(x) because it does not depend on y, we can rewrite the formula as:

$$\hat{y} = \arg\max_{y} P(x \mid y)\, P(y)$$

P(x|y) is called the translation model. It models how words and phrases should be translated and is learned from parallel data, for example pairs of human-translated German-English sentences. P(x|y) is further broken down into P(x, a|y), where a is the word alignment, i.e., the word-level and phrase-level correspondence between source sentence x and target sentence y.

P(y) is called the language model. It models the probability of generating strings in a language and is learned from monolingual data. A language model is a function that assigns a probability to strings drawn from a vocabulary. Given a string y of length n, the chain rule gives the language model probability P(y) as:

$$P(y) = \prod_{i=1}^{n} P(y_i \mid y_1, \dots, y_{i-1})$$

However, computing the probability of a word given its entire history is inefficient, so we approximate it with an n-gram model. The n-gram model makes the Markov assumption that yᵢ depends only on the preceding n-1 words (here n is the n-gram order, not the sentence length).

Bigram model example
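The bigram case of the n-gram approximation can be sketched in a few lines of Python. This is a minimal illustration rather than a production language model: the tiny corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions made for this example, and the probabilities are plain maximum-likelihood estimates without smoothing.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Count contexts and bigrams over tokenized sentences padded with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, curr in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts


def bigram_probability(tokens, context_counts, bigram_counts):
    """Approximate P(y) as the product of P(y_i | y_{i-1}) (maximum-likelihood estimates)."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, curr in zip(padded, padded[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigram_counts[(prev, curr)] / context_counts[prev]
    return prob


# Toy monolingual corpus (hypothetical data, for illustration only).
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "eats"],
]
context_counts, bigram_counts = train_bigram_lm(corpus)

p_seen = bigram_probability(["the", "cat", "sleeps"], context_counts, bigram_counts)
p_unseen = bigram_probability(["cat", "the", "sleeps"], context_counts, bigram_counts)
print(p_seen, p_unseen)
```

A fluent word order gets a non-zero probability, while an order never seen in the corpus gets zero. Real systems add smoothing so that unseen bigrams are not ruled out entirely.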

To compute the argmax, we could enumerate every possible translation y and calculate its probability, but that is computationally too expensive. Instead, SMT uses decoding, a heuristic search algorithm that looks for the best translation while pruning hypotheses with low probability. This is only a brief overview of how SMT works. The best SMT systems were extremely complex, and many important details are not covered here. SMT is expensive and time-consuming to develop because it needs a lot of feature engineering and human effort to maintain. [4]
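To make the argmax concrete, here is a toy sketch that scores a handful of candidate translations by P(x|y)·P(y) and picks the best one. All probabilities are hand-picked, hypothetical values, and the candidate list is enumerated explicitly. A real SMT decoder cannot enumerate all candidates and relies on heuristic search instead.

```python
import math

# Hypothetical translation-model scores P(x | y) for the German source
# x = "das Haus ist klein" (values invented for illustration).
translation_model = {
    "the house is small": 0.40,
    "the house is little": 0.35,
    "small is the house": 0.40,
}

# Hypothetical language-model scores P(y): how fluent each English candidate is.
language_model = {
    "the house is small": 0.020,
    "the house is little": 0.010,
    "small is the house": 0.001,
}


def noisy_channel_score(candidate):
    """log P(x | y) + log P(y); log space avoids underflow on long sentences."""
    return math.log(translation_model[candidate]) + math.log(language_model[candidate])


def best_translation(candidates):
    """Return the candidate y that maximizes P(x | y) * P(y)."""
    return max(candidates, key=noisy_channel_score)


best = best_translation(list(translation_model))
print(best)
```

Two candidates tie on the translation model here, and the language model breaks the tie in favor of the more fluent word order.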

Neural Machine Translation

Deep neural networks have achieved state-of-the-art performance in various applications. Building on research that used neural networks within SMT, Neural Machine Translation (NMT) became the major area of research. It uses a single end-to-end neural network architecture called sequence-to-sequence (seq2seq), which involves two RNNs: an encoder RNN and a decoder RNN. The encoder RNN summarizes the source sequence in an encoding vector, and the decoder RNN generates the target sentence conditioned on that encoding vector. The seq2seq model is a conditional language model that directly calculates P(y|x), because the decoder RNN predicts the next word of the target sentence y while conditioning on the source sentence x. The seq2seq model can also be used for many other natural language processing tasks, such as summarization and dialogue chatbots.

Sequence-to-Sequence (Seq2Seq)

In vanilla seq2seq, as illustrated in Figure 2, the encoder RNN (blue blocks) reads the input sentence in the source language and encodes it into a history vector called the hidden state vector. The last hidden state, or encoding vector, is passed to the decoder RNN (red blocks) as its initial hidden state. The decoder's initial hidden state, together with the end-of-sentence token of the source sentence as the first decoder input, produces a hidden state that is then passed to the linear layer.

The linear layer followed by Softmax outputs a probability distribution over the whole vocabulary of the target language. From that distribution, the model chooses the token with the highest probability as the first word, and this word is used as the second input to the decoder. The hidden state from the previous step and the first generated word are fed into the second step of the decoder RNN. The same process is repeated until the decoder produces an end-of-sentence token. The sequence of tokens generated by the decoder RNN is the output of the seq2seq model. [1]
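The decoding loop described above (softmax over the vocabulary, pick the most probable token, feed it back in, stop at the end token) can be sketched in plain Python. This is a simplified illustration under stated assumptions: `toy_decoder_step`, the `<START>` and `<END>` symbols, the five-word vocabulary, and the logit table are all hypothetical stand-ins for a trained decoder RNN plus linear layer, and the toy step ignores the hidden state that a real decoder would carry from step to step.

```python
import math

VOCAB = ["<END>", "the", "house", "is", "small"]

# Hypothetical logits that a trained decoder + linear layer might emit,
# keyed by the previously generated token (values invented for illustration).
TOY_LOGITS = {
    "<START>": [0.0, 3.0, 0.5, 0.1, 0.2],
    "the": [0.0, 0.1, 3.0, 0.2, 0.5],
    "house": [0.1, 0.0, 0.2, 3.0, 0.3],
    "is": [0.2, 0.1, 0.0, 0.1, 3.0],
    "small": [3.0, 0.1, 0.0, 0.2, 0.1],
}


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def toy_decoder_step(previous_token):
    """Stand-in for one decoder step; a real RNN would also use its hidden state."""
    return TOY_LOGITS[previous_token]


def greedy_decode(step_fn, start_token="<START>", end_token="<END>", max_len=10):
    """Repeatedly pick the most probable next token until the end token appears."""
    tokens = []
    previous = start_token
    for _ in range(max_len):
        probs = softmax(step_fn(previous))
        best_index = max(range(len(probs)), key=probs.__getitem__)
        previous = VOCAB[best_index]
        if previous == end_token:
            break
        tokens.append(previous)
    return tokens


translation = greedy_decode(toy_decoder_step)
print(translation)
```

Greedy decoding commits to the single best token at every step. The `max_len` cap is a safety net in case the end token is never produced.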

The advantages of NMT over SMT are better performance and much less human effort. However, NMT is less interpretable, harder to debug, and more difficult to control.

#machine-learning #transformers #deep-learning #machine-translation #machine translation

Deep Dive in Datasets for Machine translation in NLP Using TensorFlow and PyTorch

With the advancement of machine translation, there has been a recent shift towards large-scale empirical techniques that have led to substantial improvements in translation quality. Machine Translation is the technique of automatically converting one natural language into another while preserving the meaning of the input text.

Ongoing research on image description presents a considerable challenge at the intersection of natural language processing and computer vision. To address it, multimodal machine translation incorporates information from other modalities, mostly static images, to improve translation quality. In such a setup, the model translates from one language to another by taking both text and image into account.

Here, we will cover some of the most widely used datasets in machine translation. We will then load these datasets with the TensorFlow and PyTorch libraries.

#developers corner #machine translation #machine translation dataset #multi 30k #natural language processing #pytorch #wmt 14 dataset

Example Of Machine Translation In Python And TensorFlow

A practical example of how to build several Machine Translation models in Python and TensorFlow

We will build a deep neural network that functions as part of an end-to-end machine translation pipeline. The completed pipeline will accept English text as input and return the French translation. For our model, we will use a sample of paired English and French sentences. We will load the following libraries:

import collections

import helper  # assumed to be the tutorial's own helper.py, not a PyPI package
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Model
# Embedding is exported from keras.layers; the keras.layers.embeddings path is gone in newer Keras releases
from keras.layers import GRU, Input, Dense, TimeDistributed, Activation, RepeatVector, Bidirectional, Embedding
from keras.optimizers import Adam
from keras.losses import sparse_categorical_crossentropy

#tensorflow #machine-learning #artificial-intelligence #python #machine-translation #machine translation in python and tensorflow

sophia tondon


5 Latest Technology Trends of Machine Learning for 2021

Check out the 5 latest machine learning technology trends that can boost business growth in 2021, along with the digital development tools that support them. Now is the right time to improve the user experience with these advances.

#machinelearningapps #machinelearningdevelopers #machinelearningexpert #machinelearningexperts #expertmachinelearningservices #topmachinelearningcompanies #machinelearningdevelopmentcompany


#machine learning companies #top machine learning companies #machine learning development company #expert machine learning services #machine learning experts #machine learning expert

Virgil Hagenes


How to translate text with Python


In this tutorial, we will explore different ways to translate a text or word using Python. In my experience, this is very helpful if you want to automate the translation of many paragraphs, sentences, or words.

Furthermore, you can run a backend worker that constantly receives new data and either returns a response with the translation or stores the translations in a database (this is very useful in NLP tasks).

One of the reasons to choose Python, apart from its clear syntax and extensive standard library, is the great community that works on the development of the language itself and extends its functionality with third-party modules.

One module that makes it straightforward to translate text is deep_translator, which supports multiple well-known translators.
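As a rough sketch of how such automation might look, the helper below translates a list of texts with any object that exposes a `translate(text)` method, which is the interface that `GoogleTranslator` from deep_translator provides. The names `translate_many`, `make_google_translator`, and `UppercaseTranslator` are hypothetical and introduced only for this example. The offline stand-in lets the helper run without the package or a network connection.

```python
def translate_many(texts, translator):
    """Translate each distinct, non-empty text with any object exposing .translate(text)."""
    results = {}
    for text in texts:
        cleaned = text.strip()
        if cleaned and cleaned not in results:
            results[cleaned] = translator.translate(cleaned)
    return results


def make_google_translator(source="auto", target="de"):
    """Build a deep_translator GoogleTranslator (needs the package and network access)."""
    # Imported lazily so translate_many can be used and tested without deep_translator.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source=source, target=target)


class UppercaseTranslator:
    """Offline stand-in (not a real translator) used only to exercise translate_many."""

    def translate(self, text):
        return text.upper()


demo = translate_many(["hello", " hello ", "", "good morning"], UppercaseTranslator())
print(demo)
```

With deep_translator installed and a network connection available, the same helper could be called as `translate_many(sentences, make_google_translator(target="de"))`. Skipping duplicates and empty strings saves requests when a worker handles many repeated inputs.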

#python #google-translate #translation #translators #translate