Ibbaka

From tokens to tasks - how will OpenAI price o3?

Steven Forth is CEO of Ibbaka.

Most of us are just getting up to speed on OpenAI o1. It is now available through Perplexity and You.com, and for those with a real need there is the OpenAI Pro Subscription at US$200 per month.

Who will pay $200 per month for unfettered access to a reasoning model?

Quite a few people, I think, as this may be one of the cheaper options for accessing the power of o1.

If you are just using AI as a fancy way to do search, you may not need o1. The same is true if you have already engineered a compelling prompt orchestration system that reaches out to different types of models for different steps (many systems use Wolfram Alpha as their computational intelligence). That still leaves a lot of people who need to work on complex scientific, financial or business problems and who can use the added power of a reasoning model. For these people, having a reasoning model as an aid to thought (something much more than a co-pilot) will be valuable. The concern will be costs.

At present the best way to access o1 (o3 is not yet available) is through the OpenAI Pro Subscription at US$200 per month. It will be interesting to see how this price is accepted by the market and whether it sets a new anchor for high-end subscriptions.

The other way to pay for o1 is per token.

o1 Models

  • o1-preview: $15 per million input tokens and $60 per million output tokens

  • o1-mini: $3 per million input tokens and $12 per million output tokens

GPT-4o Models

  • GPT-4o: $2.50 per million input tokens and $10 per million output tokens

  • GPT-4o mini: $0.15 per million input tokens and $0.60 per million output tokens

That is not all there is to it though. OpenAI also charges for reasoning tokens, and bills them at the higher price it charges for output tokens. Reasoning tokens are generated by o1 itself as part of its reasoning process. How many reasoning tokens are generated? That is a bit opaque and is highly dependent on the task. Some tests by Artificial Analysis are shown below.

 “Comparison of Models: Quality, Performance & Price Analysis” by Artificial Analysis

The number of reasoning tokens exceeds the number of output tokens and the more powerful the model the greater the difference!
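To make the effect of reasoning tokens concrete, here is a minimal sketch of the per-request arithmetic, assuming the list prices quoted above and that reasoning tokens are billed at the output rate. The function name and the token counts in the example are hypothetical, chosen only for illustration; they are not measurements from Artificial Analysis or OpenAI.

```python
# Prices in US$ per million tokens, as quoted in this article.
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "o1-mini": {"input": 3.00, "output": 12.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def request_cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    """Estimate the cost in US$ of a single request.

    Reasoning tokens are added to the visible output tokens and
    billed at the output price (the assumption stated in the text).
    """
    price = PRICES[model]
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * price["input"] + billed_output * price["output"]
    return total / 1_000_000


# Hypothetical request: 2,000 input tokens, 1,000 visible output tokens
# and 5,000 reasoning tokens (illustrative numbers only).
visible_only = request_cost("o1-preview", 2_000, 1_000)
with_reasoning = request_cost("o1-preview", 2_000, 1_000, reasoning_tokens=5_000)

print(f"Visible tokens only: ${visible_only:.2f}")
print(f"Including reasoning: ${with_reasoning:.2f}")
```

In this made-up case the reasoning tokens account for most of the bill, and the buyer cannot know their number in advance. That is the predictability problem in miniature.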

o3 is much more powerful than o1 (some people at OpenAI are calling it a superintelligence), and it outperforms humans on some sophisticated coding and math tasks. How will o3 be priced? This is important, as any agent that is carrying out a complex task may benefit from leveraging a reasoning model. Just as there is competition between model vendors as to who can provide the best models, application vendors and vertical AI companies will compete to see who can make the best use of them. Best use means creating the most value for their customers. In the end it all comes back to value.

Is o3 an opportunity for OpenAI to move from tokens to tasks?

From Arcprize

Take a close look at the above chart. The price for o3 High is, well, high, at much more than $1,000 for the task. The x-axis is on a log scale, so we are talking about $3,000. Using a rule of thumb that the value capture ratio (price divided by the value delivered) needs to be around 5% for innovations, this ‘task’ needs to be valued at about $60,000. That is a high hurdle.
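The hurdle follows from dividing the price by the value capture ratio. A minimal sketch of that arithmetic, using the approximate $3,000 price and the 5% rule of thumb from the paragraph above (the function name is hypothetical):

```python
def required_task_value(price, value_capture_ratio=0.05):
    """Value a task must deliver for `price` to equal the given
    share (value capture ratio) of that value."""
    if not 0 < value_capture_ratio <= 1:
        raise ValueError("value_capture_ratio must be in (0, 1]")
    return price / value_capture_ratio


# About $3,000 per task at a 5% value capture ratio.
hurdle = required_task_value(3_000)
print(f"Required task value: about ${hurdle:,.0f}")
```

The same function shows how sensitive the hurdle is to the rule of thumb: at a 10% value capture ratio, the required task value would drop to about $30,000.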

We did a quick poll on January 6, 2025 to gauge initial pricing response to reasoning models. The poll collected responses from Steven Forth, the Professional Pricing Society LinkedIn Group, the Design Thinking Group, the Artificial Intelligence Exchange and the Software as a Service - SaaS Group. N = 55.

Ibbaka LinkedIn Poll from January 6, 2025 on price acceptance of a reasoning co-pilot.

Companies that want to take reasoning models or applications built on reasoning models to market (that would include Ibbaka) have a lot of work to do on value communication and documentation before worrying about pricing.

Buyers may struggle to bridge from tokens, especially ‘reasoning tokens’, to value, and this could be a barrier to the adoption of o1, o3 and other reasoning models. (For an interesting alternative to OpenAI’s approach see Self-Discover: Large Language Models Self Compose Reasoning Structures by Pei Zhou and other researchers at Google DeepMind.) Is there a better framing that could be used to bridge from tokens to value?

Two areas worth exploring are tasks and problems. These map to two different use cases for reasoning models like o1 and o3.

One design space for AI applications (proposed by Ibbaka)

Pricing Task Completion for Reasoning Models

One use case for reasoning AIs is to have them complete complex tasks or provide assistance in completing these tasks. This is a kind of co-pilot style packaging. OpenAI or other vendors building on these models can create high-value assistants to help with recurring, complex tasks. If depth of reasoning becomes one of the dimensions of the AI design space, there will be a niche for this type of application. To win the pricing levels that vendors are likely to want, the pricing metric will need to be task completion and not just access.

Pricing Problem Solution for Reasoning Models

A different way of packaging reasoning models will be as problem solvers. In this case users will spend more time building up context (How will RAG or Retrieval Augmented Generation work with reasoning models?) and exploring different approaches to solving complex problems. The ideal pricing metric here will be outcome-based, but outcome-based pricing only works when three conditions are met:

  1. The definition of the outcome is clear (‘Has the problem been solved?’)

  2. Attribution can be agreed on (‘What contribution did the reasoning model make?’)

  3. The cost can be predicted (This is why pricing using reasoning tokens will be problematic for many buyers)

Anyone whose business includes solving complex problems needs to be exploring the use of reasoning models, whether these are OpenAI o1 and o3 or other approaches like Self-Discover. But we have a long way to go to understand the best way to package and price these applications. First we need to create, deliver and document value. That requires a value model.