I’ve recently seen a massive number of articles about GPT-3, on Medium and elsewhere. I even wrote one. The language model is a significant development in AI, so it’s only natural that writers want to share their excitement with the world.

Here’s the problem: the ability of GPT-3 — namely the quality of its writing — is often exaggerated by published samples. In fact, there are not one, but two filters keeping the AI’s worst results from wide dissemination.

Selection bias wouldn’t be a problem if any interested reader could access the GPT-3 API and make their own observations of its ability. However, access is currently severely limited. (AI Dungeon is often used to test GPT-3 by those of us without the full version, but its creator has recently outlined the steps being taken to prevent backdoor access to GPT-3.)

When reporting — and I use that term in its broadest possible sense, meaning any writing about GPT-3 — is the only source of public information, selection biases ought to factor into how we judge the model. Here, I outline the obvious bias, and a less obvious one that exacerbates the issue.


1. Writing samples are selected for quality

Say I’m writing an informative piece on GPT-3. I want to demonstrate that it can put together coherent strings of sentences, so I give it a prompt and examine the output.

If I don’t like what I see, I’m likely to try again with a slightly different (perhaps longer) prompt. Even if I’m not actively selecting particular sentences that suit the purpose of my article, massaging the output creates a biased sample of writing that is not representative of GPT-3’s overall quality.

In the context of creating a narrative about the AI, it makes sense to showcase its best work rather than a fair cross-section of its output, limitations and all. This is the first problem.
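To put a rough number on how much this massaging can skew impressions, here is a minimal sketch in Python. It uses made-up uniform quality scores and a hypothetical best-of-five retry habit; nothing here touches the real API, it only illustrates the statistics of keeping the best sample.

import random

random.seed(0)

def sample_quality():
    # Stand-in for the quality of one GPT-3 completion, scored 0-1.
    # (Purely simulated; real outputs aren't scored this way.)
    return random.random()

def first_try():
    # A reader with API access sees whatever comes out first.
    return sample_quality()

def keep_best(tries=5):
    # A writer regenerates with tweaked prompts and keeps the best result.
    return max(sample_quality() for _ in range(tries))

n = 100_000
avg_first = sum(first_try() for _ in range(n)) / n
avg_published = sum(keep_best() for _ in range(n)) / n

print(f"typical output quality:   {avg_first:.2f}")      # ~0.50
print(f"published output quality: {avg_published:.2f}")  # ~0.83

With only five retries per “published” sample, the average score jumps from about 0.50 to about 0.83 even though the underlying model never changed; the only thing that changed is which samples we got to see.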

