In the first part of this tutorial series, we discussed language models in general: the statistical nature of language, and why common statistical models cannot capture enough of the information in language to process it reliably.
Some basic knowledge from part 1 is necessary to follow this session, so I’d recommend checking it out if you need the primer; those concepts are the primary building blocks we rely on in this article.
In this session, we’ll be moving from the transformer as a sequence-to-sequence model to a word-predictive model (GPT-2).
What are word-predictive models? In part one, we also discussed how to build a model that predicts the next word, but we approached the idea from the neural machine translation (NMT) perspective.
The best way to illustrate a word-predictive model is your smartphone keyboard, which suggests the next word as you type.
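That keyboard intuition can be captured with a toy bigram model in plain Python: for each word, count which words tend to follow it, then suggest the most frequent follower. This is a deliberately simple sketch of the idea of next-word prediction, not how GPT-2 actually works (GPT-2 uses a transformer over subword tokens), and the function names and corpus here are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_predictor(corpus):
    """Count, for each word, which words follow it in the corpus."""
    followers = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, like a keyboard suggestion."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

# Toy training data: "to" follows "want" twice, "a" follows it once.
corpus = "i want to eat i want to sleep i want a nap"
model = train_bigram_predictor(corpus)
print(predict_next(model, "want"))  # prints "to"
```

A real word-predictive model like GPT-2 replaces these raw counts with a neural network that conditions on the entire preceding context, which is exactly the gap this tutorial goes on to explore.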
#gpt-2 #nlp #transformers #machine-learning #heartbeat #deep-learning