Do you know how much the GPT-3 API will cost?

A rough calculation suggests it can serve at most about 790 requests/$.

GPT-3 is huge **(175B parameters ≈ 700 GB)**, and you know how costly GPU inference can be. Even if we find a use case for it, we still need to justify the ROI. There are many blogs on its potential applications, but I haven't found anything on its pricing.
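As a sanity check on that 700 GB figure, here is the back-of-the-envelope arithmetic, assuming the weights are stored in fp32 (4 bytes per parameter; fp16 would halve it):

```python
# Rough size of GPT-3's weights in fp32 (4 bytes per parameter).
params = 175e9
bytes_per_param = 4
size_gb = params * bytes_per_param / 1e9
print(f"{size_gb:.0f} GB")  # 700 GB
```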

Let’s try to guess it with the fundamentals of cloud pricing.

Note: You can use this methodology to calculate the API cost for any model. People also like to use the AWS TCO (Total Cost of Ownership) calculator, but I enjoy doing it manually.


STEP 0 — Use case

Self-attention in transformers is quadratic in sequence length, so it's crucial to decide on the use case first, because the use case determines the sequence length.

The best use case for GPT-3 is text generation given a prompt.

The prompt can be of any length, but 128 tokens is a sensible guess. People also generate recursively, appending the previously generated text to the prompt to produce more.

GPT-3 can take a seq_length of up to 1024, but due to the quadratic nature of attention, that makes inference even costlier.

Let's fix the sequence length at 128, then use scaling to extrapolate to 1024.
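To see why the quadratic term matters, here is the attention-cost multiplier for going from 128 to 1024 tokens. This covers only the attention term; the feed-forward layers scale linearly, so treat it as an upper bound on the slowdown:

```python
# Self-attention cost grows as the square of the sequence length.
base_len, long_len = 128, 1024
attention_scale = (long_len / base_len) ** 2
print(attention_scale)  # 64.0
```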

STEP 1 — Getting GPT-2 inferences per hour

Assumptions

  • Seq length — 128
  • GPU + XLA inference on Tensorflow
  • V100 GPU instance
  • 12 vCPUs, 40GB of RAM
  • Batch size — 8

From the HuggingFace experiment sheet, GPT-2 has an inference time of 0.02s for a batch size of 8 on TensorFlow with GPU + XLA.

Hence it can serve 8 × 3600 / 0.02 = 1,440,000 inferences/hour.
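The arithmetic behind that number, spelled out:

```python
# Throughput from the benchmark: one batch of 8 every 0.02 s,
# sustained for an hour (3600 s).
batch_size = 8
batch_latency_s = 0.02
inferences_per_hour = batch_size * 3600 / batch_latency_s
print(round(inferences_per_hour))  # 1440000
```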

STEP 2 — Getting GPT-3 inferences per hour

GPT-2 — 1.5B parameters

GPT-3 — 175B parameters

Since GPT-3 cannot fit on one GPU, it's split across many. For simplicity, let's assume we can extrapolate the inference time linearly, although multi-GPU inference can be slower due to activations being passed from one GPU to another.

Equivalent GPT-3 inferences/hour/GPU

= 1,440,000 × 1.5 / 175

≈ 12,343
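The same extrapolation in code. Note the assumption baked in: throughput is taken to scale perfectly linearly with parameter count, and inter-GPU communication overhead is ignored, so the real per-GPU number would be lower:

```python
# Scale GPT-2's measured throughput down by the parameter ratio.
gpt2_inferences_per_hour = 1_440_000
gpt2_params = 1.5e9
gpt3_params = 175e9
gpt3_per_gpu = gpt2_inferences_per_hour * gpt2_params / gpt3_params
print(round(gpt3_per_gpu))  # 12343
```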


Estimating GPT3 API Cost