The ability to summarize is a strong test of one’s understanding of a given piece of text, or of a language.

Perhaps the best test of a man’s intelligence is his capacity for making a summary

— Lytton Strachey

Hence, summarization is a fairly significant task in NLP. I have already covered summarization as a whole, as well as abstractive summarization and its implementation using Transformers, in this post. Consider giving it a read if you want a brief background on the task, since the PEGASUS model is built on the Transformer architecture.

In this article, we will discuss a recent paper, “PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization”, proposed by Google AI and accepted to appear at ICML 2020.

PEGASUS: Pre-training with Extracted Gap-Sentences for Abstractive Summarization

Like other sequence transduction models, PEGASUS implements a seq2seq (encoder-decoder) architecture. The novelty of this work, however, lies in its self-supervised pre-training objective.
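
To make the seq2seq setup concrete, here is a minimal sketch of running an already fine-tuned PEGASUS checkpoint through the Hugging Face transformers library; the google/pegasus-xsum checkpoint and the generation parameters are my own assumptions for illustration, not details from the paper.

```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load a publicly released PEGASUS checkpoint (assumed here: the XSum fine-tuned one).
model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

document = (
    "PEGASUS is pre-trained by removing important sentences from a document "
    "and training the model to generate them from the remaining text."
)

# Seq2seq inference: the encoder reads the document, the decoder generates a summary.
inputs = tokenizer(document, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=60, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```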

Self-supervised learning is the new cool in deep learning. It removes the dependency on labeled samples and makes the huge amount of unexplored, unlabelled data accessible for training.

The combination of Transformer-based models with self-supervised pre-training (e.g., BERT, GPT-2, RoBERTa, XLNet, ALBERT, T5, ELECTRA) has proven immensely effective across language modeling tasks.

Gap Sentences Generation (GSG): Self-Supervised Objective for Summarization

Figure: Self-Supervised Pre-training in PEGASUS

The main idea behind this objective is the hypothesis that the closer the self-supervised pre-training objective is to the final downstream task, the better the fine-tuning performance.

Thus, in PEGASUS, complete sentences are removed from a document (i.e. they are ‘masked’), and the model is trained to predict these sentences from the remaining text, as shown in the figure. The authors admit that this task seems nearly impossible, even for humans. But such training forces the model to build a deeper understanding of the document in order to generate sentences that could plausibly belong to it, which supports their hypothesis. This task is coined Gap Sentences Generation (GSG).
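
To make the masking scheme concrete, here is a tiny sketch (my own illustration, not the authors’ code) of how a GSG training pair could be built: each chosen gap sentence is replaced with a [MASK1] token in the input, and the removed sentences are concatenated to form the target.

```python
import re

MASK_TOKEN = "[MASK1]"

def make_gsg_example(document, gap_indices):
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    input_sentences, target_sentences = [], []
    for i, sent in enumerate(sentences):
        if i in gap_indices:
            input_sentences.append(MASK_TOKEN)   # hide the sentence in the input
            target_sentences.append(sent)        # the model must generate it
        else:
            input_sentences.append(sent)
    return " ".join(input_sentences), " ".join(target_sentences)

doc = ("Pegasus is a mythical winged horse. It was sired by Poseidon. "
       "The horse is often depicted as pure white.")
model_input, model_target = make_gsg_example(doc, gap_indices={1})
print(model_input)   # Pegasus is a mythical winged horse. [MASK1] The horse is often depicted as pure white.
print(model_target)  # It was sired by Poseidon.
```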

In addition, the authors assert that masking the most important sentences in the document works best. Importance is determined by how similar a sentence is to the rest of the document according to ROUGE (a metric usually used to evaluate summary quality in summarization tasks).
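
As a rough sketch of this selection step (an approximation on my part, using a simple unigram-overlap ROUGE-1 F1 rather than the paper’s exact setup), one could score each sentence against the rest of the document and mask the top scorers; the returned indices could then be passed to the make_gsg_example helper from the previous sketch.

```python
import re
from collections import Counter

def rouge1_f1(candidate, reference):
    # Unigram-overlap approximation of ROUGE-1 F1.
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def select_gap_sentences(document, num_gaps=1):
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    scores = []
    for i, sent in enumerate(sentences):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scores.append((rouge1_f1(sent, rest), i))
    # Indices of the sentences most similar to the rest of the document.
    return {i for _, i in sorted(scores, reverse=True)[:num_gaps]}
```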
