The ABBA explainer to BERT and GPT-2

For the life of me, I couldn’t understand how BERT or GPT-2 worked. I read articles, followed diagrams, squinted at equations, watched recorded classes, and read code documentation, and still struggled to make sense of it all. It wasn’t the math that made it hard.

It was more that the big picture you’d expect to precede the nitty-gritty was somehow missing.

This article bridges the gap, explaining in simple terms how these models are built. It’s the article I wish I could have read first; many of the details would have then slotted right into place.

With the generous help of ABBA, we’ll introduce three different ideas. I guarantee there won’t be a single mathematical equation in any of them:

1. Thinking about words and meaning — Attention

2. Pancake stacking small components — Deep Learning

3. Thinking about words and signals — Embeddings

In the fourth and final section, we’ll see how these three ideas tie together.

1. Thinking about words and meaning — Attention

Let’s look at the following sentence:

[Image: a sentence about rain, containing the words ‘softly’, ‘tapped’, ‘summer’, ‘window’, and ‘that’]

Suppose we asked three friends to read this sentence. While they’d probably agree the sentence’s topic is the rain, they might diverge on which words matter most in relation to ‘rain’ for conveying the sentence’s true meaning.

Benny might say ‘tapped’ and ‘window’ were the most important, because the sentence is about the noise the rain makes.

Frida might say ‘softly’ and ‘summer’ are the most important, because this is a sentence about what summer rain is like.

Bjorn might take a different approach altogether, and focus on ‘-ed’ and ‘that’ to suggest this sentence is about the memory of past rain.

While all three readers are taking in the same sentence, each is paying attention to different parts. Each is assigning a different weight to some words in relation to ‘rain’, while discounting the importance of others.
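This weighting idea can be sketched in a few lines of Python. To be clear, this is a toy illustration of the intuition, not the real attention mechanism; the words and scores below are invented for the example:

```python
# Toy illustration of attention-as-weighting: each "reader" assigns a
# raw score to every word, reflecting how important that word seems in
# relation to 'rain'. Scores are then normalized so each reader's
# weights sum to 1, like attention weights do.

def normalize(scores):
    """Scale raw scores so they sum to 1."""
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}

# Benny cares about the noise the rain makes.
benny = normalize({"softly": 1, "tapped": 5, "summer": 1, "window": 5, "that": 1})

# Frida cares about what summer rain is like.
frida = normalize({"softly": 5, "tapped": 1, "summer": 5, "window": 1, "that": 1})

print(benny["tapped"])  # high weight for Benny
print(frida["tapped"])  # low weight for Frida
```

Both readers distribute the same total amount of attention, but they spend it on different words, which is all “paying attention to different parts” means here.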

Their different ways of understanding this sentence also set their expectations of what comes next. When asked to continue the following sentence starting with: “It …”,
