An Introduction to Retrieval-Augmented Language Model Pre-Training

Photo by Edward Ma on Unsplash

Since 2018, transformer-based language models have achieved strong performance on many NLP downstream tasks such as Open-Domain Question Answering (Open-QA). To get better results, models tend to increase the number of parameters (e.g. more attention heads, larger hidden dimensions) in order to store more world knowledge inside the neural network.

Guu et al. (2020) from Google Research released a state-of-the-art model, Retrieval-Augmented Language Model Pre-Training (aka REALM), which leverages a knowledge retriever to augment the input with documents from a large external corpus such as Wikipedia. This extra signal helps the model deliver better results. In this story, we will go through how the model achieves its state-of-the-art results.
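Paraphrasing the paper's formulation, the retrieved document z is treated as a latent variable: to predict an output y for an input x, the model scores documents from the knowledge corpus Z and marginalizes over them,

$$
p(y \mid x) = \sum_{z \in \mathcal{Z}} p(y \mid z, x)\, p(z \mid x)
$$

where p(z | x) comes from the retriever and p(y | z, x) from the encoder described below.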

REALM Overview

The overall idea is to leverage an extra document to provide more signal to the model so that it can predict the masked token accurately. The authors call this a retrieve-then-predict approach. The following steps and diagram show the pre-training workflow (a minimal code sketch follows the diagram).

  1. Start with a masked sentence (e.g. "The [MASK] at the top of the pyramid").
  2. Feed the masked sentence to the Neural Knowledge Retriever. It returns a document (not necessarily a whole article) that relates to the input.
  3. Pass both the original sentence and the retrieved document to the Knowledge-Augmented Encoder, which predicts the masked token (pyramidion).

(Figure: REALM's retrieve-then-predict pre-training workflow)
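To make the three steps above concrete, here is a minimal Python sketch of the retrieve-then-predict loop. The toy corpus, the embedding function, and predict_masked_token are hypothetical placeholders for illustration only; REALM itself uses BERT-style encoders over millions of Wikipedia passages and Maximum Inner Product Search rather than this brute-force scoring.

```python
# A minimal, illustrative sketch of REALM's retrieve-then-predict idea.
# All names here (toy_embed, retrieve, predict_masked_token, the tiny corpus)
# are hypothetical stand-ins, not REALM's actual implementation.
import numpy as np

EMBED_DIM = 8

# Toy "knowledge corpus": in REALM this is a collection of Wikipedia passages.
corpus = [
    "The pyramidion is the capstone at the top of a pyramid.",
    "The Nile is the longest river in Africa.",
    "Giza is home to the Great Pyramid of Egypt.",
]

def toy_embed(text: str, salt: str) -> np.ndarray:
    """Deterministic dummy embedder; REALM uses separate BERT-style
    embedders for the query and for each document."""
    seed = abs(hash(salt + text)) % (2 ** 32)
    return np.random.default_rng(seed).standard_normal(EMBED_DIM)

def retrieve(masked_sentence: str) -> np.ndarray:
    """Neural Knowledge Retriever: score every document by inner product
    with the query and return a softmax distribution p(z | x)."""
    query = toy_embed(masked_sentence, salt="query")
    scores = np.array([toy_embed(doc, salt="doc") @ query for doc in corpus])
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

def predict_masked_token(masked_sentence: str, document: str) -> str:
    """Knowledge-Augmented Encoder: in REALM a Transformer reads the
    concatenation [masked sentence; retrieved document] and fills in the
    [MASK]. A fixed string keeps this sketch runnable."""
    return "pyramidion"

masked_sentence = "The [MASK] at the top of the pyramid."
p_z = retrieve(masked_sentence)
best_doc = corpus[int(np.argmax(p_z))]
print("retrieved document:", best_doc)
print("predicted token:", predict_masked_token(masked_sentence, best_doc))
```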

For the fine-tuning stage, the model takes an unmasked sentence (e.g. an Open-QA question) instead of a sentence containing a masked token.
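As a hypothetical illustration of that difference (the names below are placeholders, not REALM's actual code), only the query side changes between the two stages; the retrieved document is consumed the same way:

```python
# Hypothetical illustration of the input difference between the two stages.
retrieved_doc = "The pyramidion is the capstone at the top of a pyramid."

# Pre-training: the query contains a [MASK] token and the encoder fills it in.
pretrain_input = ("The [MASK] at the top of the pyramid.", retrieved_doc)

# Fine-tuning (Open-QA): the query is an unmasked question and the encoder
# instead predicts an answer span (here, "pyramidion") inside the document.
finetune_input = ("What sits at the top of a pyramid?", retrieved_doc)
```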

#artificial-intelligence #questions-answers #machine-learning #nlp #data-science #deep-learning

Reference: Guu et al. (2020). REALM: Retrieval-Augmented Language Model Pre-Training.