For the life of me, I couldn’t understand how BERT or GPT-2 worked.

I read articles; followed diagrams; squinted at equations; watched recorded classes; read code documentation; and still struggled to make sense of it all.

It wasn’t the math that made it hard.

It was more that the big-picture part you’d expect to precede the nitty-gritty was somehow missing.

This article bridges the gap, explaining in simple terms how these models are built. It’s the article I wish I could have read first; many of the details would have then slotted right into place.

With the generous help of ABBA, we’ll introduce three different ideas. I guarantee there won’t be a single mathematical equation in any of them:

1. Thinking about words and meaning — Attention

2. Pancake stacking small components — Deep Learning

3. Thinking about words and signals — Embeddings

In the fourth and final section, we’ll see how these ideas tie together into a neat bow.

1. Thinking about words and meaning — Attention

Let’s look at the following sentence:

[Image: the example sentence about summer rain]

Suppose we asked three friends to read this sentence. While they’d probably agree that the sentence’s topic is the rain, they might disagree about which words matter most, relative to ‘rain’, in conveying the sentence’s true meaning.

Benny might say ‘tapped’ and ‘window’ are the most important, because the sentence is about the noise the rain makes.

Frida might say ‘softly’ and ‘summer’ are the most important, because this is a sentence about what summer rain is like.

Bjorn might take a different approach altogether, and focus on the ‘-ed’ ending and ‘that’ to suggest this sentence is about the memory of past rain.

While all three readers are taking in the same sentence, each is paying attention to different parts. Each assigns a different weight to some words in relation to ‘rain’, while discounting the importance of others.
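If it helps to see that idea in code, here is a toy sketch of it: each reader gives a raw weight to the words they care about in relation to ‘rain’, and the weights are rescaled so they sum to 1. The sentence wording and every number below are made up for illustration; this is only the “different weighting” intuition, not how BERT or GPT-2 actually computes attention.

```python
# Toy illustration of "paying attention": each reader weights the same words
# differently in relation to 'rain'. Words and numbers are invented examples.

sentence = ["the", "rain", "tapped", "softly", "on", "the", "window", "that", "summer"]

readers = {
    "Benny": {"tapped": 0.5, "window": 0.4},   # hears the noise the rain makes
    "Frida": {"softly": 0.5, "summer": 0.4},   # pictures what summer rain is like
    "Bjorn": {"that": 0.5, "summer": 0.3},     # reads it as a memory of past rain
}

def normalize(raw_weights, words):
    """Give every word a weight (a tiny default if unmentioned), rescaled to sum to 1."""
    raw = [raw_weights.get(word, 0.01) for word in words]
    total = sum(raw)
    return [(word, round(weight / total, 2)) for word, weight in zip(words, raw)]

for reader, raw_weights in readers.items():
    print(reader, normalize(raw_weights, sentence))
```

Running it prints, for each reader, the same words with a different spread of weights: the sentence doesn’t change, only where the attention goes.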

Their different ways of understanding this sentence also set their expectations of what comes next. When asked to continue with a new sentence starting with “It …”, each would likely continue it differently.

