AI Training Method Exceeds GPT-3 Performance with 99.9% Fewer Parameters

A team of scientists at LMU Munich has developed Pattern-Exploiting Training (PET), a deep-learning training technique for natural language processing (NLP) models. Using PET, the team trained a Transformer NLP model with 223M parameters that outperformed the 175B-parameter GPT-3 by over 3 percentage points on the SuperGLUE benchmark.

PhD student Timo Schick and professor Hinrich Schütze of the university’s Center for Information and Language Processing described their method and experimental results in a paper published on arXiv. PET is a technique for fine-tuning a pre-trained language model that works by generating additional “soft-labeled” training data from unlabeled examples. This helps the model improve performance in “few-shot” scenarios, such as NLP benchmarks that provide very few labeled examples for fine-tuning. Using PET, the researchers fine-tuned an ALBERT Transformer model and achieved an average score of 76.8 on the SuperGLUE benchmark, compared to GPT-3’s 71.8.
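
To make the idea concrete, the Python sketch below illustrates the core mechanism described in the paper rather than the authors’ actual implementation: a pattern rewrites an input as a cloze question, a verbalizer maps each label to a word, the masked language model’s logits at the mask position become a soft label distribution, and a final classifier is trained against those soft labels. The function names, the toy two-word vocabulary, and the entailment-style labels are assumptions made for illustration only.

```python
# Illustrative PET-style sketch (not the authors' code): pattern + verbalizer
# turn classification into a cloze task, the MLM's mask logits yield soft
# labels, and a final classifier is trained against those soft labels.
import torch
import torch.nn.functional as F

LABELS = ["entailment", "not_entailment"]            # hypothetical RTE-style labels
VERBALIZER = {"entailment": "Yes", "not_entailment": "No"}

def pattern(premise: str, hypothesis: str) -> str:
    # Reformulate the input pair as a cloze question with a single mask slot.
    return f'"{hypothesis}"? [MASK]. "{premise}"'

def score_labels(mask_logits: torch.Tensor, verbalizer_ids: dict) -> torch.Tensor:
    # mask_logits: vocabulary logits an MLM assigns at the [MASK] position.
    # Keep only the verbalizer words' logits and normalize over the labels.
    label_logits = torch.stack([mask_logits[verbalizer_ids[l]] for l in LABELS])
    return F.softmax(label_logits, dim=0)            # soft label distribution

def distillation_loss(student_logits: torch.Tensor, soft_labels: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against the soft labels produced by the pattern model(s),
    # i.e. the final classifier is distilled from the soft-labeled data.
    return -(soft_labels * F.log_softmax(student_logits, dim=-1)).sum(-1).mean()

# Toy usage with made-up numbers: pretend the MLM put most mass on "Yes".
vocab = {"Yes": 0, "No": 1}
verbalizer_ids = {l: vocab[VERBALIZER[l]] for l in LABELS}
mask_logits = torch.tensor([2.3, 0.4])               # logits for "Yes" and "No"
soft = score_labels(mask_logits, verbalizer_ids)     # e.g. ~[0.87, 0.13]
student_logits = torch.randn(1, len(LABELS))
loss = distillation_loss(student_logits, soft.unsqueeze(0))
print(pattern("It rained all day.", "The ground is wet"), soft.tolist(), loss.item())
```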

Supervised machine learning often requires large datasets to perform well on tasks such as computer vision or NLP. However, labeling these large datasets can be time-consuming and expensive, as it requires human workers to manually identify objects in images or rate a sentence’s sentiment. For NLP tasks, many researchers have turned to transfer learning, where a large model is pre-trained via self-supervised learning on a large unlabeled dataset, such as the contents of Wikipedia. Once a model is pre-trained, it can be “fine-tuned” for a specific task, such as sentiment analysis, using supervised learning on a much smaller labeled dataset. Most state-of-the-art NLP results are achieved by fine-tuning a pre-trained Transformer model.
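
As a rough illustration of that fine-tuning step, the sketch below uses the Hugging Face transformers library to attach a classification head to a pre-trained ALBERT checkpoint and run a few supervised gradient steps on a toy sentiment dataset. This is generic transfer-learning fine-tuning, not PET itself; the checkpoint name, example sentences, and hyperparameters are placeholders rather than details from the paper.

```python
# Minimal fine-tuning sketch with Hugging Face transformers, assuming an
# ALBERT checkpoint and a tiny labeled sentiment dataset for illustration.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "albert-base-v2"                        # pre-trained checkpoint (placeholder)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A handful of labeled examples stands in for the small fine-tuning dataset.
texts = ["A wonderful, heartfelt film.", "Dull and far too long."]
labels = torch.tensor([1, 0])                        # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                                   # a few gradient steps for illustration
    outputs = model(**batch, labels=labels)          # forward pass returns the classification loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(float(outputs.loss))
```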

