Enter GPT-3: it's like GPT-2, but much bigger

June 16th
GPT-2 is green, GPT-3 is blue. (OpenAI)

GPT-2, OpenAI's natural language processing model, created convincingly real deepfake text when it was released last year and spurred a flurry of hobbyists who used it to write poetry and music, make up beer names, and even play chess. Now it's getting a significant upgrade.

"Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting," OpenAI wrote.

GPT-3, like its little brother GPT-2, creates genuinely convincing text from simple prompts. See samples here -- there are full descriptions of alternate event histories that, while they sound credible, aren't based on anything but good grammar and word choice. To call them fake news would be wrong. They're exercises in mimicry.

But, unlike its predecessor, GPT-3 can do arithmetic. GPT-2, at 1.5 billion parameters, could not solve basic math problems. According to Slate Star Codex, the model's math skills begin improving drastically above 13 billion parameters. And then, at GPT-3's 175 billion parameters, "it gets an A+."
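To give a sense of what those math tests look like, here's a sketch of a few-shot arithmetic prompt in the style the paper describes; the wording is illustrative, not copied from the paper or from Slate Star Codex.

```python
# Sketch of the kind of few-shot arithmetic prompt the GPT-3 paper evaluates
# (the exact wording here is made up for illustration). Smaller models tend to
# complete it with plausible-looking but wrong digits; accuracy reportedly
# jumps sharply at the largest model sizes.
arithmetic_prompt = (
    "Q: What is 48 plus 76?\n"
    "A: 124\n"
    "Q: What is 97 minus 45?\n"
    "A: 52\n"
    "Q: What is 63 plus 29?\n"
    "A:"
)

# Feed this to any autoregressive language model (e.g. the `generator` pipeline
# sketched above) and read off the next few tokens as the model's answer.
```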

It may also be very, very expensive.

From Gwern, a lodestar in the GPT-2 community:

GPT-3 is an extraordinarily expensive model by the standards of machine learning: it is estimated that training it may require the annual cost of more machine learning researchers than you can count on one hand (~$5m), up to $30 of hard drive space to store the model (500–800GB), and multiple pennies of electricity per 100 pages of output (0.4 kWh). Researchers are concerned about the prospects for scaling: can ML afford to run projects which cost more than 0.1 milli-Manhattan-Projects? Would it be worthwhile, even if it represented another large leap in AI capabilities, to spend up to 10 milli-Manhattan-Projects to scale GPT-3 100✕ to achieve human-like performance in some domains? Many researchers feel that such a suggestion is absurd and refutes the entire idea of scaling machine learning research further, and that the field would be more productive if it instead focused on research which can be conducted by an impoverished goatherder on an old laptop running off solar panels. Nonetheless, I think we can expect further scaling.

That's really what GPT-3 is: a massive scaling of its little brother's successes. It's worth noting that, early last year, when OpenAI said it wouldn't release the full GPT-2 model because it was "too good," it ultimately did release it, arguing that the full model wasn't significantly better than the half-sized version it had previously published. They seem to be taking a bet that 100x is better than 2x. But this time they're releasing it without any of the dramatic fanfare.

And if they're on to something, they might really be on to it: