I’ve been itching to try the T5 (Text-To-Text Transfer Transformer) ever since it came out way, way back in October 2019 (it’s been a long couple of months). I messed around with the open-sourced code from Google a couple of times, but I never managed to get it to work properly. Some of it went a little over my head (TensorFlow 😫), so I figured I’d wait for Hugging Face to ride to the rescue! As always, the Transformers implementation is much easier to work with, and I adapted it for use with Simple Transformers.

Before we get to the good stuff, a quick word on what the T5 model is and why it’s so exciting. According to the article on T5 in the Google AI Blog, the model is the result of a large-scale study (paper link) of transfer learning techniques to see which ones work best. The T5 model was pre-trained on C4 (the Colossal Clean Crawled Corpus), a new, absolutely massive dataset released along with the model.

Pre-training is the first step of transfer learning, in which a model is trained on a self-supervised task using huge amounts of unlabeled text data. After this, the model is fine-tuned (trained) on smaller, task-specific labeled datasets, yielding far superior performance compared to simply training on those small, labeled datasets without pre-training. Further information on pre-training language models can be found in my post below.

Understanding ELECTRA and Training an ELECTRA Language Model (towardsdatascience.com)

A key difference in the T5 model is that all NLP tasks are presented in a text-to-text format. In contrast, BERT-like models take a text sequence as an input and output a single class label or a span of text from the input. A BERT model is retrofitted for a particular task by adding a relevant output layer on top of the transformer model. For example, a simple linear classification layer is added for classification tasks. T5, however, eschews this approach and instead reframes any NLP task such that both the input and the output are text sequences. This means that the same T5 model can be used for any NLP task, without any aftermarket changes to the architecture. The task to be performed can be specified via a simple prefix (again, a text sequence) prepended to the input, as demonstrated below.
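To make this concrete, here is a minimal sketch of how the prefix idea maps onto Simple Transformers, whose T5Model expects training data as a DataFrame with prefix, input_text, and target_text columns. The placeholder rows and hyperparameters are purely illustrative, and the exact constructor arguments can vary between library versions.

import pandas as pd
from simpletransformers.t5 import T5Model

# Every task is just (prefix, input_text, target_text) -- all three are plain text.
train_df = pd.DataFrame(
    [
        ["translate english to german", "That is good.", "Das ist gut."],
        ["summarize", "<a long news article>", "<a one-sentence summary>"],
        ["ask_question", "<a product description>", "<a question about the product>"],
    ],
    columns=["prefix", "input_text", "target_text"],
)

# The same pre-trained checkpoint handles all of these tasks; only the data changes.
model = T5Model("t5", "t5-base", args={"num_train_epochs": 1, "overwrite_output_dir": True})
model.train_model(train_df)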


The T5 paper explores many of the recent developments in NLP transfer learning. It is well worth a read!

However, the focus of this article is on adapting the T5 model to perform new NLP tasks. Thanks to the unified text-to-text approach, this turns out to be (surprisingly) easy. So, let’s get to the aforementioned good stuff!

The Task

The T5 model is trained on a wide variety of NLP tasks including text classification, question answering, machine translation, and abstractive summarization. The task we will be teaching our T5 model is question generation.

Specifically, the model will be tasked with asking relevant questions when given a context.
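As a rough sketch of where we are headed, a fine-tuned model could be asked to generate a question from a product description like this. The "ask_question" prefix and the "outputs/" model path are illustrative placeholders, not necessarily what the final scripts use.

from simpletransformers.t5 import T5Model

# Hypothetical: load a T5 model that has already been fine-tuned for question generation.
model = T5Model("t5", "outputs/")

description = (
    "Stainless steel French press with a double-walled carafe that keeps "
    "coffee hot for up to an hour."
)

# The prefix tells the model which task to perform; predict() returns generated text.
generated = model.predict([f"ask_question: {description}"])
print(generated)  # e.g. ["Does the carafe keep coffee hot?"]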

You can find all the scripts used in this guide in the examples directory of the Simple Transformers repo.

The Dataset

We will be using the Amazon Review Data (2018) dataset, which contains (among other things) descriptions of the various products on Amazon and question-answer pairs related to those products.
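Once both parts are downloaded (as described in the next paragraph), the end goal is to combine them into the prefix/input_text/target_text format shown earlier. Here is a rough sketch of what that could look like; the file names, field names, and parsing details are assumptions based on the typical layout of this dataset (gzipped files with one record per line), and the actual scripts in the examples directory are the authoritative version.

import ast
import gzip
import json

import pandas as pd

# Hypothetical file names -- substitute the files you actually downloaded.
DESCRIPTIONS_FILE = "data/meta_Electronics.json.gz"
QA_FILE = "data/qa_Electronics.json.gz"


def read_records(path):
    """Yield one record per line from a gzipped file of JSON (or Python-literal) objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Some of the Amazon files use Python-literal formatting rather than strict JSON.
                yield ast.literal_eval(line)


# Map each product id (asin) to its description text.
descriptions = {}
for item in read_records(DESCRIPTIONS_FILE):
    desc = item.get("description", "")
    if isinstance(desc, list):
        desc = " ".join(desc)
    if desc:
        descriptions[item["asin"]] = desc

# Pair every question with the description of the product it was asked about.
rows = []
for qa in read_records(QA_FILE):
    desc = descriptions.get(qa.get("asin"))
    if desc and qa.get("question"):
        rows.append(["ask_question", desc, qa["question"]])

train_df = pd.DataFrame(rows, columns=["prefix", "input_text", "target_text"])
print(train_df.shape)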

The descriptions and the question-answer pairs must be downloaded separately. You can either download the data manually by following the instructions in the Descriptions and Question-Answer Pairs sections below, or you can use the provided shell script. The list of categories used in this study is given below.

