How OpenAI prices inputs and outputs for GPT-4
See the update: "Pricing thought: OpenAI will price reasoning tokens in o1"
Many companies are looking at how to use OpenAI's GPT-4 to inject AI into existing solutions. Others are looking to build new applications, or even whole new software categories, on the back of Large Language Models (LLMs). To make this work, we will need a good understanding of how access to and use of these models will be priced. Let's start with the current pricing for GPT-4. This pricing may shape that of other LLM vendors, at least those that provide their models to many other companies, or pricing may diverge and drive differentiation. Either way, it will channel the overall direction of innovation.
As I write, in April 2023, GPT-4 is available in two packages and has two pricing metrics.
In addition to GPT-4, OpenAI's pricing page lists pricing for Chat, InstructGPT, model fine-tuning, image models, and audio models. Many real-world applications will combine more than one model, so the interactions between the different pricing models will be important, but let's start with the basics.
Before diving into the pricing, one needs to understand the metric it is based on: tokens. Tokens show up all over the place in LLMs, and 'tokenization' is the first step in building and using these models. Here is a good introduction.
A token is a part of a word. Short, simple words are often one token; longer and more complex words can be two or three tokens.
It can get more complex, but that is the basic idea. OpenAI bases its price on tokens, which makes sense at this point: it allows them to price consistently across some very different applications. Other companies could learn from this. One place to start thinking about pricing metrics is at the atomic level of the application, whether that is a token, an event, a variable, or an object and its instantiations.
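To make tokens concrete, here is a minimal sketch using OpenAI's open-source tiktoken library to count the tokens GPT-4 would bill for a piece of text (the sample sentence is purely illustrative):

```python
import tiktoken  # OpenAI's open-source tokenizer library

# Load the tokenizer that GPT-4 uses.
enc = tiktoken.encoding_for_model("gpt-4")

text = "Tokenization splits text into subword pieces."
tokens = enc.encode(text)

print(len(tokens))                        # number of tokens this text is billed as
print([enc.decode([t]) for t in tokens])  # the individual token pieces
```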
OpenAI has two packages, based on 'context': an 8K context and a 32K context. Generally, more sophisticated applications solving harder problems will need more context.
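The context window is the total token budget for a single call, prompt plus completion. A rough sketch of checking fit, assuming the published 8,192- and 32,768-token limits (the helper name here is my own):

```python
import tiktoken

# Published context window sizes: total tokens per call, prompt plus completion.
CONTEXT_LIMITS = {"8K": 8_192, "32K": 32_768}

def fits_in_context(prompt: str, reserved_output_tokens: int, context: str = "8K") -> bool:
    """Check that a prompt, plus room reserved for the answer, fits the window."""
    enc = tiktoken.encoding_for_model("gpt-4")
    return len(enc.encode(prompt)) + reserved_output_tokens <= CONTEXT_LIMITS[context]

print(fits_in_context("Summarize our Q1 sales report...", 1_000))  # True
```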
The pricing metrics are the Input Tokens (in the prompts) and the Output Tokens (the content or answer generated).
For the 8K and 32K contexts, the price is as follows:
Input: $0.03 (8K) or $0.06 (32K) per thousand tokens
Output: $0.06 (8K) or $0.12 (32K) per thousand tokens
Note that outputs are priced at twice the rate of inputs. This may encourage more and larger inputs (prompts), and more elaborate prompts often produce better outputs.
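A worked example helps here. Below is a minimal cost sketch using the published April 2023 prices (the function name is my own, not part of any OpenAI API):

```python
# Published GPT-4 prices, April 2023, in dollars per 1,000 tokens.
PRICES = {
    "8K":  {"input": 0.03, "output": 0.06},
    "32K": {"input": 0.06, "output": 0.12},
}

def request_cost(input_tokens: int, output_tokens: int, context: str = "8K") -> float:
    """Cost of one call: input and output tokens are metered separately."""
    p = PRICES[context]
    return input_tokens / 1000 * p["input"] + output_tokens / 1000 * p["output"]

# A 1,500-token prompt that yields a 500-token answer:
# 8K:  1.5 * $0.03 + 0.5 * $0.06 = $0.075
# 32K: 1.5 * $0.06 + 0.5 * $0.12 = $0.15
print(request_cost(1_500, 500, "8K"))   # 0.075
print(request_cost(1_500, 500, "32K"))  # 0.15
```

Because prompts are usually much larger than answers, the cheaper input rate tends to dominate the bill for prompt-heavy workloads.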
Things to think about: this may be an effective pricing model. One would need to build a value model and a cost model, and process a lot of data, to really know, but it seems like a good place to start. I assume OpenAI will be doing a lot of this analysis over the next few months, and I expect the pricing to change. They are likely using AI to support this analysis, and I hope they share their approach.
One reason this model works is that it uses two pricing metrics. Using two pricing metrics, also known as hybrid pricing, is key to flexible pricing that works at different scales and in different scenarios. Most SaaS companies should be considering some form of hybrid pricing.