BERT, everyone’s favorite transformer, cost Google an estimated ~$7K to train (and who knows how much in R&D costs). From there, we can write a couple of lines of code to use the same model, all for free.
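As a quick sketch, assuming the HuggingFace transformers library (one common way to get those pretrained weights, not the only one), those couple of lines look something like this:

```python
from transformers import BertTokenizer, BertModel

# download the pretrained weights Google already paid to train
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
```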
BERT has enjoyed unparalleled success in NLP thanks to two unique training approaches: masked-language modeling (MLM) and next sentence prediction (NSP).
MLM consists of giving BERT a sentence and optimizing the weights inside BERT to output the same sentence on the other side.
So we input a sentence and ask BERT to output the same sentence.
However, before we actually give BERT that input sentence, we mask a few tokens.
So we’re actually inputting an incomplete sentence and asking BERT to complete it for us.
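We can see this in action with a minimal sketch, again assuming the HuggingFace transformers library: mask a token ourselves and let a pretrained BERT fill in the blank.

```python
from transformers import pipeline

# the fill-mask pipeline wraps BERT's masked-language modeling head
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# give BERT an incomplete sentence and ask it to complete it
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Each prediction is one candidate token for the [MASK] position, scored by how confident the model is; during training, BERT's weights are optimized so the correct original token comes out on top.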