BERT, everyone’s favorite transformer, cost Google an estimated ~$7K to train (and who knows how much in R&D costs). From there, we can write a couple of lines of code to use the same model, all for free.
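As a quick sketch, assuming the HuggingFace transformers library (one common way to get those pretrained weights, not the only one), those couple of lines look something like this:

```python
from transformers import BertTokenizer, BertModel

# download the pretrained weights Google already paid to train
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
```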
BERT has enjoyed unparalleled success in NLP thanks to two unique training approaches: masked-language modeling (MLM) and next sentence prediction (NSP).
MLM consists of giving BERT a sentence and optimizing the weights inside BERT to output the same sentence on the other side.
So we input a sentence and ask BERT to output the same sentence.
However, before we actually give BERT that input sentence, we mask a few tokens.
So we’re actually inputting an incomplete sentence and asking BERT to complete it for us.
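We can see this in action with a minimal sketch, again assuming the HuggingFace transformers library: mask a token ourselves and let a pretrained BERT fill in the blank.

```python
from transformers import pipeline

# the fill-mask pipeline wraps BERT's masked-language modeling head
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# give BERT an incomplete sentence and ask it to complete it
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Each prediction is one candidate token for the [MASK] position, scored by how confident the model is; during training, BERT's weights are optimized so the correct original token comes out on top.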