Teaching BART to Rap: Fine-tuning Hugging Face’s BART Model

Transfer learning has been an enormous boon to artificial intelligence over the past few years. It first made waves in computer vision and more recently in NLP, where researchers discovered that a model trained on a language modelling task can be adapted to other tasks quickly and cheaply. From a practitioner’s perspective, aside from the deluge of new discoveries (easily accessible on arXiv), Hugging Face have developed remarkably easy-to-use APIs that allow anyone to access these latest developments with a few lines of code.

In spite of the ease with which one can use the Hugging Face APIs, both for on-the-fly inference and for fine-tuning through command-line-style arguments, I got a little stuck trying to fine-tune the BART model. I’m aiming to use it for my Master’s thesis, and it took me an inordinate amount of time to write working fine-tuning code. However, once I got past this, I was amazed at the power of the model.

TL;DR: Check out the fine-tuning code here and the noising code here.

This article gives a brief overview of how to fine-tune the BART model, with code rather liberally borrowed from Hugging Face’s finetuning.py script. Writing the training code yourself, however, gives you a bit more control over how you experiment with the model. I’ve used PyTorch Lightning to handle the training, and if you’re new to it, I encourage you to get familiar with it. The implementation is straightforward and may streamline some of your projects going forward. Although I’ve taught BART to rap here, it’s really just a convenient (and fun!) seq2seq example of how to fine-tune the model.

First, a quick overview of where I got stuck in the training process. The loss on my model was declining rapidly with each batch, yet the model was learning to generate blank sentences. For a long time, I couldn’t figure out why this was happening. It turns out that you need to manually shift the target tokens one position to the right before you feed them to the decoder, but you must pass the unshifted tokens to the loss function.
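To make the shift concrete, here is a minimal sketch in plain Python rather than the code from the fine-tuning script. The helper name `shift_tokens_right`, its signature, and the token ids `START`, `EOS` and `PAD` are assumptions chosen for illustration; the helper that ships with the library may behave differently depending on the version. The idea is that at each position the decoder sees the previous target token and is scored against the current one. If the decoder were fed the unshifted tokens instead, it would see the very token it is asked to predict, which is one plausible reason for a loss that falls quickly without the model learning to generate.

```python
from typing import List, Tuple

def shift_tokens_right(labels: List[List[int]], decoder_start_token_id: int) -> List[List[int]]:
    """Prepend the start token to each sequence and drop its last token.

    Hypothetical helper for illustration; sequence length is preserved.
    """
    return [[decoder_start_token_id] + seq[:-1] for seq in labels]

# Illustrative token ids (assumptions, not BART's real vocabulary ids).
START, EOS, PAD = 2, 1, 0

# Unshifted target sequences: these are what the loss function receives.
labels = [
    [11, 12, 13, EOS],
    [21, 22, EOS, PAD],
]

# Shifted copies: these are what the decoder receives as input.
decoder_input_ids = shift_tokens_right(labels, START)

# Pair what the decoder sees at each step with the token it should predict,
# skipping padded targets (a real loss would ignore those positions too).
training_pairs: List[List[Tuple[int, int]]] = [
    [(seen, target) for seen, target in zip(inp, lab) if target != PAD]
    for inp, lab in zip(decoder_input_ids, labels)
]

for pairs in training_pairs:
    print(pairs)
```

In a PyTorch training step, the same pairing would correspond to passing the shifted ids as the decoder input and computing cross-entropy between the model’s logits and the unshifted labels, with padded positions excluded from the loss.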

So, without further ado, this is how to teach BART to rap.

#machine-learning #artificial-intelligence #nlp

towardsdatascience.com
