Fine-tune a non-English (German) GPT-2 model with Huggingface on German recipes, using the Trainer class and Pipeline objects

[Cover image: photo by Peter Dawn on Unsplash]

Originally published at https://www.philschmid.de on September 6, 2020.

Introduction

Unless you’re living under a rock, you have probably heard about OpenAI’s GPT-3 language model. You might also have seen all the crazy demos, where the model writes JSX and HTML code, or its capabilities in the area of zero-shot / few-shot learning. Simon O’Regan wrote an article with excellent demos and projects built on top of GPT-3.

A downside of GPT-3 is its 175 billion parameters, which result in a model size of around 350GB. For comparison, the biggest version of GPT-2 has 1.5 billion parameters, less than 1/116 of GPT-3’s size.

In fact, with close to 175B trainable parameters, GPT-3 is far larger than any other model out there. In the comparison of parameter counts of recent popular NLP models below, GPT-3 clearly stands out.

[Chart: parameter counts of recent popular NLP models, created by author]

This is all magnificent, but you do not need 175 billion parameters to get good results in text generation.

There are already tutorials on how to fine-tune GPT-2, but many of them are outdated. In this tutorial, we are going to use the transformers library by Huggingface in its newest version (3.1.0). We will use the new Trainer class and fine-tune our GPT-2 model on German recipes from chefkoch.de.
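To give you a feel for what we will build, here is a minimal sketch of the fine-tuning setup, assuming transformers 3.1.0. The checkpoint name anonymous-german-nlp/german-gpt2, the file path recipes_train.txt, and the output directory are placeholders for the German GPT-2 model and the recipe dataset we prepare later in the tutorial.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
    pipeline,
)

# Placeholder checkpoint for a German GPT-2 model (assumption, not prescribed here).
model_name = "anonymous-german-nlp/german-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Plain-text file with the German recipes (placeholder path).
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="recipes_train.txt",
    block_size=128,
)

# GPT-2 is a causal language model, so we disable the masked-LM objective.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./german-gpt2-recipes",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

trainer.train()
trainer.save_model()

# Generate a recipe with the fine-tuned model via a Pipeline object.
chef = pipeline("text-generation", model="./german-gpt2-recipes", tokenizer=tokenizer)
print(chef("Zuerst Tomaten", max_length=100)[0]["generated_text"])
```

The hyperparameters above (epochs, batch size, block size) are illustrative defaults; we will go through the actual dataset preparation and training configuration step by step below.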

You can find everything we are doing in this colab notebook.

