# The ABBA explainer to BERT and GPT-2

For the life of me, I couldn’t understand how BERT or GPT-2 worked. I read articles; followed diagrams; squinted at equations; watched recorded classes; read code documentation; and still struggled to make sense of it all. It wasn’t the math that made it hard.

It was more that the big picture you'd expect to precede the nitty-gritty was somehow missing.

This article bridges the gap, explaining in simple terms how these models are built. It’s the article I wish I could have read first; many of the details would have then slotted right into place.

With the generous help of ABBA, we’ll introduce three different ideas. I guarantee there won’t be a single mathematical equation in any of them:

1. Thinking about words and meaning — Attention

2. Pancake stacking small components — Deep Learning

3. Thinking about words and signals — Embeddings

In the fourth and final section, we’ll see how these ideas tie neatly into a bow.

## 1. Thinking about words and meaning — Attention

Let’s look at the following sentence:

Suppose we asked three friends to read this sentence. While they'd probably agree the sentence's topic is the rain, they might disagree on which words matter most, in relation to 'rain', for conveying the sentence's true meaning.

Benny might say 'tapped' and 'window' are the most important, because the sentence is about the noise the rain makes.

Frida might say ‘softly’ and ‘summer’ are the most important, because this is a sentence about what summer rain is like.

Bjorn might take a different approach altogether, and focus on ‘ed’ and ‘that’ to suggest this sentence is about the memory of past rain.

While all three readers are taking in the same sentence, each is paying attention to different parts. Each is assigning different weights to some words in relation to 'rain', while discounting the importance of others.
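For readers who prefer code to prose, the idea of three readers weighing the same words differently can be sketched in a few lines of Python. This is only an illustration: the scores are invented, the function names are hypothetical, and the normalization used here (a softmax) is one common way of turning scores into weights, not a description of how BERT or GPT-2 compute theirs.

```python
import math

# Words the three readers singled out in the text above.
WORDS = ["tapped", "window", "softly", "summer", "ed", "that"]

# Hypothetical raw scores: how much each reader cares about each word
# in relation to 'rain'. The numbers are invented for illustration.
RAW_SCORES = {
    "Benny": [3.0, 3.0, 0.5, 0.5, 0.0, 0.0],
    "Frida": [0.5, 0.5, 3.0, 3.0, 0.0, 0.0],
    "Bjorn": [0.0, 0.0, 0.5, 0.5, 3.0, 3.0],
}


def attention_weights(scores):
    """Turn raw scores into positive weights that sum to 1 (a softmax)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_words(reader, k=2):
    """Return the k words this reader weighs most heavily."""
    weights = attention_weights(RAW_SCORES[reader])
    ranked = sorted(zip(WORDS, weights), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    for reader in RAW_SCORES:
        print(reader, top_words(reader))
```

Each reader sees the same list of words; only the weights differ, and so does the pair of words that ends up on top.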

Their different ways of understanding this sentence also set their expectations of what comes next. When asked to continue the following sentence starting with: “It …”,
