Given two sentences, how likely is the second to follow the first? In this article, we are going to discuss this type of prediction, especially when it has to happen on a mobile device. For example, suppose you are writing a poem in your favorite mobile app and the app provides a next sentence prediction feature; you can let the app suggest the sentences that follow.

The problem of prediction using machine learning falls under the realm of natural language processing (NLP). There are many NLP algorithms that can implement this kind of prediction, but here we are going to use a model called BERT.

BERT stands for _Bidirectional Encoder Representations from Transformers_. As the name suggests, BERT is a bidirectional model architecture: the network learns from both the left and the right context of a word in a sentence. BERT is based on the Transformer, a neural network architecture built on self-attention mechanisms.
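
To see what bidirectionality buys you, here is a minimal sketch using the Hugging Face `fill-mask` pipeline with a standard BERT checkpoint (the library and checkpoint name are assumptions, not something prescribed by this article). The left context of the masked word is identical in both sentences; only the words to the *right* of the mask differ, and the prediction changes accordingly:

```python
from transformers import pipeline

# Assumed setup: Hugging Face transformers with the bert-base-uncased checkpoint.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Same left context ("The"), different right context; BERT uses both sides.
print(fill_mask("The [MASK] was parked outside the house.")[0]["token_str"])
print(fill_mask("The [MASK] barked at the mail carrier all morning.")[0]["token_str"])
```

A purely left-to-right model would have to guess the masked word from "The" alone; BERT can attend to the words that come after the mask.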

MobileBERT is a variant of BERT designed to fit on mobile devices. It is a compressed and accelerated version of the popular BERT model, and it can be applied to various downstream NLP tasks with some fine-tuning. It is important to know that training a MobileBERT model from scratch requires first training a teacher model and then performing a knowledge transfer (distillation) from the teacher to MobileBERT.
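
In practice, you rarely run that distillation yourself: the released checkpoint has already gone through the teacher training and knowledge transfer, and you fine-tune it on your task. As a hedged sketch (assuming the Hugging Face transformers library and the `google/mobilebert-uncased` checkpoint, neither of which the fine-tuning step above names explicitly), loading MobileBERT for a downstream classification task looks like this:

```python
from transformers import MobileBertTokenizer, MobileBertForSequenceClassification

# Assumed checkpoint: the publicly released, already-distilled MobileBERT weights.
tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertForSequenceClassification.from_pretrained(
    "google/mobilebert-uncased",
    num_labels=2,  # illustrative: a binary classification task
)
# From here, fine-tune with a standard PyTorch training loop or the Trainer API.
```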

MobileBERT is similar to BERT and thus relies on masked language modeling. It is therefore efficient at predicting masked tokens and at natural language understanding, but it may not be optimal for text generation. Text generation is directly tied to left-to-right language modeling, and because of its bidirectionality, BERT cannot be used as a conventional language model.
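
The masked-prediction strength carries over directly to MobileBERT. A minimal sketch, again assuming the `google/mobilebert-uncased` checkpoint (which ships with a masked language modeling head):

```python
from transformers import pipeline

# MobileBERT was pretrained with masked language modeling, so fill-mask works out of the box.
fill_mask = pipeline("fill-mask", model="google/mobilebert-uncased")

for prediction in fill_mask("Paris is the capital of [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))
```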

In this article, we’re going to discuss one of the MobileBERT implementations, called MobileBertForNextSentencePrediction.

MobileBertForNextSentencePrediction is a MobileBERT model with a next sentence prediction head on top. In other words, it is a linear layer over the pooled output, followed by a softmax. It is a PyTorch torch.nn.Module subclass: a MobileBertModel with that linear layer on top of it, used for the prediction.
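
Putting it together, here is a minimal sketch of next sentence prediction with this class. The checkpoint name and the example sentences are assumptions for illustration; the class itself is the one discussed above:

```python
import torch
from transformers import MobileBertTokenizer, MobileBertForNextSentencePrediction

# Assumed checkpoint: the released MobileBERT weights, which include the NSP head.
tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertForNextSentencePrediction.from_pretrained("google/mobilebert-uncased")

first = "I took my dog to the park this morning."
second = "He chased the ball until he was exhausted."

# The tokenizer builds the [CLS] first [SEP] second [SEP] pair with token type ids.
encoding = tokenizer(first, second, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits

# Index 0 = "second sentence follows the first", index 1 = "second is a random sentence".
probs = torch.softmax(logits, dim=1)
print(f"P(is next sentence) = {probs[0, 0].item():.4f}")
```

Swapping `second` for an unrelated sentence (say, about the weather on Mars) should push the probability toward index 1, which is exactly the behavior a next-sentence-suggestion feature can build on.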
