GPT-3, a giant step for Deep Learning and NLP?

Recently, OpenAI announced GPT-3, the successor to their GPT-2 language model and the largest model trained to date, with 175 billion parameters. Training a language model this large has its merits and limitations, so this article covers some of its most interesting and important aspects.

What’s in a Language Model?

"The diversity of tasks the model is able to perform in a zero-shot setting suggests that high-capacity models trained to maximize the likelihood of a sufficiently varied text corpus begin to learn how to perform a surprising amount of tasks without the need for explicit supervision."

This is an excerpt from the paper accompanying GPT-2. GPT-3 takes another step down this avenue.

More specifically, the authors pinpoint the drawbacks of fine-tuning on task-specific datasets:

  • Getting these datasets is difficult.
  • Fine-tuning allows the model to exploit spurious correlations, which lead to bad out-of-distribution performance.
  • A brief directive in natural language is usually enough for humans to understand a given task. This adaptability is a desired property of NLP systems.

The route the authors chose to take is "in-context learning": feeding the model a task specification (prompt) or a few demonstrations of the task as a prefix, priming it towards a subspace in the latent space that adheres to the given task. Translation, for instance, would look like "Q: What is the {language} translation of {sentence} A: {translation}".
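
To make that format concrete, here is a minimal Python sketch of how such a prefix could be assembled. The `build_prompt` helper and the demonstration pairs are illustrative assumptions, not part of the paper or of any OpenAI API:

```python
def build_prompt(demos, language, sentence):
    """Concatenate a few demonstrations, then the query, into one text prefix."""
    lines = [f"Q: What is the {language} translation of {src} A: {tgt}"
             for src, tgt in demos]
    lines.append(f"Q: What is the {language} translation of {sentence} A:")
    return "\n".join(lines)

# Hypothetical demonstration pairs, purely for illustration.
demos = [("cheese", "fromage"), ("good morning", "bonjour")]
print(build_prompt(demos, "French", "sea otter"))
# The model is expected to continue the text with the translation,
# without any gradient updates or fine-tuning.
```

The key point is that the task is communicated entirely through the text prefix; the model's weights stay frozen.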

This is based on the assumption that the model develops a broad set of skills and pattern recognition abilities at training time, and then uses those abilities at inference time to rapidly adapt to or recognize the desired task.

It’s common wisdom that low perplexity correlates with performance on downstream tasks, so one can hope that bigger models will yield better in-context capabilities. And indeed, this holds true, as the paper's figure shows for a simple task that requires the model to remove random symbols from a word.
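
As a rough illustration of that task, here is a minimal sketch; the prompt wording and the corruption scheme are assumptions for illustration, not the paper's exact setup:

```python
import random

def corrupt(word: str, n_symbols: int = 3) -> str:
    """Insert random symbols into a word; the model must recover the original."""
    chars = list(word)
    for _ in range(n_symbols):
        pos = random.randint(0, len(chars))
        chars.insert(pos, random.choice("!@#$%^&*"))
    return "".join(chars)

prompt = f"Please remove the symbols from this word: {corrupt('succession')}\nAnswer:"
print(prompt)  # a capable model should complete the prompt with "succession"
```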

The number of in-context examples varies between 10 and 100, because this is typically what the model's context size of 2048 tokens permits. The prompt (task specification) plays a significant role when the number of examples is low.
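
A quick back-of-the-envelope sketch shows why the context window caps the number of demonstrations. The tokens-per-word ratio below is a crude assumption; real counts come from the model's byte-pair-encoding tokenizer:

```python
CONTEXT_SIZE = 2048  # GPT-3's context length, in tokens

def approx_tokens(text: str) -> int:
    # Very rough heuristic: ~1.3 BPE tokens per whitespace-separated word.
    return int(len(text.split()) * 1.3)

task_spec = "Translate English to French."
demo = "Q: What is the French translation of good morning A: bonjour"

budget = CONTEXT_SIZE - approx_tokens(task_spec)
print(budget // approx_tokens(demo), "short demonstrations fit in the window")
# Realistic demonstrations (full sentences, passages, answers) are much longer,
# which is why most benchmarks fit only on the order of 10 to 100 examples.
```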

The authors evaluated the model on many well-known benchmarks, but first, let's inspect the model specification.

Heavy Weight Lifting

GPT-3 uses a Transformer-based architecture similar to GPT-2's, including the modified initialization, pre-normalization, and reversible tokenization described therein, with the exception that it uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer.
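
The sketch below contrasts the two attention mask types at a toy scale. The window size and the exact layer alternation are assumptions for illustration; the real pattern follows the Sparse Transformer paper:

```python
import numpy as np

def dense_causal_mask(n: int) -> np.ndarray:
    """Every position attends to all earlier positions (standard GPT attention)."""
    return np.tril(np.ones((n, n), dtype=bool))

def banded_causal_mask(n: int, window: int) -> np.ndarray:
    """Every position attends only to the `window` most recent positions."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - window + 1): i + 1] = True
    return mask

print(dense_causal_mask(6).astype(int))
print(banded_causal_mask(6, window=3).astype(int))
# Alternating layers between patterns like these reduces the cost of attention
# on long sequences compared with using dense attention in every layer.
```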

The authors trained several model sizes, ranging from 125 million to 175 billion parameters, in order to measure the correlation between model size and benchmark performance.
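
As a rough sanity check of those figures, the usual decoder-only transformer approximation of about 12 · n_layers · d_model² non-embedding parameters reproduces the reported sizes; the layer counts and widths below are the ones listed in the GPT-3 paper:

```python
def approx_params(n_layers: int, d_model: int) -> float:
    # ~12 * n_layers * d_model^2 counts the attention and MLP weights per block,
    # ignoring embeddings, biases, and layer norms.
    return 12 * n_layers * d_model ** 2

for name, n_layers, d_model in [("GPT-3 Small", 12, 768), ("GPT-3 175B", 96, 12288)]:
    print(f"{name}: ~{approx_params(n_layers, d_model) / 1e9:.1f}B parameters")
# GPT-3 Small: ~0.1B (token embeddings add ~0.04B, giving the reported 125M)
# GPT-3 175B:  ~173.9B, in line with the reported 175 billion
```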
