LLM Pricing Hangover

Apr 29, 2026

For the past few years, major LLM providers (OpenAI, Google, Anthropic, etc.) have offered their models either for free or at relatively low prices. This has been enabled by the hundreds of billions of dollars poured into the scene by VCs and other investors.

It's a familiar playbook for disruptive businesses: remember how much cheaper Uber or DoorDash were some years ago? Heavy subsidization lures in users, and the bet is that once users become dependent, they'll tolerate higher prices later.

A similar dynamic may play out with LLMs too, at least in the mid-to-long term. So far, we've been in an aggressive scaling phase, where providers have prioritized adoption, experimentation, and ecosystem growth over cost discipline. This has allowed both consumers and startups to explore how LLMs fit into everyday workflows, at the expense of the providers.

It's already clear that LLMs are here to stay. In software, an AI-assisted engineer is already at least 2x more productive than their counterpart from five years ago, and the tools will only continue to improve. I imagine adoption in other industries is less mature, but competitive pressure will likely accelerate it.

As usage grows, it's time for LLM providers to focus more on the economics: how to make the business model sustainable. Running LLMs, i.e., AI inference, at scale requires top-tier microchips (provided by Nvidia, AMD, and the likes) and significant energy — so essentially data centers.

I recently came across an interesting article by The Verge: You're about to feel the AI money squeeze. It quotes a senior analyst at Gartner, Will Sommer, who estimates that AI data center investments between 2024 and 2029 might reach $6.3 trillion. He argues that investors expect roughly a 25% ROIC, as that's what top tech companies (Microsoft, Apple, Google, etc.) achieve. For institutional investors, the threshold is around 12% — for anything lower, there's better money to be made elsewhere. If the return is below 7%, it's write-down territory.

To put that into perspective: to clear even the bare-minimum 7% ROIC threshold, LLM companies will need to generate hundreds of billions of dollars in annual returns. Sommer estimates at least roughly $2 trillion in sales each year through 2029.
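To make those thresholds concrete, here's a quick back-of-envelope sketch. Treating the full $6.3 trillion as the invested base earning the return is my simplification; the thresholds are the ones cited above:

```python
# Back-of-envelope: annual returns needed on AI data center investment
# at different ROIC thresholds. The $6.3T figure and thresholds are from
# the Gartner estimates above; applying the return to the whole base is
# my simplification.

investment = 6.3e12  # projected AI data center investment, 2024-2029 (USD)

thresholds = {
    "top-tier tech ROIC": 0.25,   # what Microsoft/Apple/Google achieve
    "institutional floor": 0.12,  # below this, the money goes elsewhere
    "write-down territory": 0.07, # below this, assets get written down
}

for label, roic in thresholds.items():
    required = investment * roic
    print(f"{label:>22}: ${required / 1e9:,.0f}B in annual returns")
# 7% alone already implies over $400B in returns every year.
```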

A quick refresher on how LLM providers generate revenue: AI usage is measured in tokens, where one token corresponds to roughly four English characters. A typical English paragraph might use about 100 tokens, while a 1,500-word essay might be about 2,000 tokens¹. Because usage varies widely (some users prompt a chatbot a few times a day, while power users run multiple agents around the clock), LLM providers have moved to selling access to tokens rather than fixed subscriptions: if you use more tokens, you pay extra. Profit per token is essentially the price minus the cost of compute and energy.
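Here's a toy sketch of that billing model. The ~4-characters-per-token heuristic is the OpenAI rule of thumb from the footnote; the per-token price and cost below are made-up placeholders, not any provider's actual rates:

```python
# Toy model of token-based billing. Price and cost figures are placeholders.

CHARS_PER_TOKEN = 4  # rough heuristic for English text

def estimate_tokens(text: str) -> int:
    """Rough token count for an English string."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def usage_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """USD amount for a given number of tokens at a per-million-token rate."""
    return tokens / 1e6 * usd_per_million_tokens

price_per_million = 10.00  # placeholder: what the user is billed
cost_per_million = 9.00    # placeholder: compute + energy cost to serve

prompt = "Summarize this 1,500-word essay for me, please. " * 40  # stand-in workload
tokens = estimate_tokens(prompt)
revenue = usage_cost(tokens, price_per_million)
profit = revenue - usage_cost(tokens, cost_per_million)
print(f"{tokens} tokens -> ${revenue:.4f} billed, ${profit:.4f} profit")
```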

Sommer estimates that LLM providers are already processing roughly 100-200 quadrillion² tokens per year. However, here's the kicker: Sommer estimates that to reach target returns, they would need to process on the order of 10 sextillion³ tokens annually — a 50,000-100,000x increase, assuming a "generous" 10% profit margin per token.

(This assumes current technology; more on that below. The key point is that either volume, efficiency, or both must improve 50,000-100,000-fold.)
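Working through Sommer's numbers, using the quadrillion and sextillion definitions from the footnotes:

```python
# Sommer's volume gap: tokens processed today vs. what target returns imply.

current_low, current_high = 100e15, 200e15  # 100-200 quadrillion tokens/year today
target_tokens = 10e21                       # ~10 sextillion tokens/year needed

increase_low = target_tokens / current_high   # best case, starting from 200 quadrillion
increase_high = target_tokens / current_low   # worst case, starting from 100 quadrillion
print(f"Required increase: {increase_low:,.0f}x to {increase_high:,.0f}x")
# -> Required increase: 50,000x to 100,000x
```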

And that's just for inference economics. There's also the cost of ongoing R&D: training newer, more intelligent models, which is extremely expensive.

So how can LLM providers solve this equation? The most straightforward answer is a price squeeze: tokens will become more expensive for end users. I think this has been looming on the horizon since the beginning, and we're about to face it soon (it's already begun with Anthropic's stricter rate limits).

First, Sommer argues, free tiers will likely disappear or become much more limited (or possibly monetized through advertisement, as OpenAI has begun doing).

For enterprise users, prices will almost certainly rise. While companies can switch some workloads, like chatbot capabilities, to use open source models (e.g., DeepSeek), these models still lag behind the top-tier proprietary systems in areas like coding (think Anthropic's Opus 4.7 or OpenAI's ChatGPT 5.5).

If we think about this in terms of pure power consumption, JLL estimates data centers drew about 97 GW of power in total in 2025, of which 9% went to AI inference (= 8.73 GW). By 2030, total consumption is estimated to roughly double to about 200 GW, of which AI inference would use 37% (= 74 GW). So by the end of the decade, AI inference would draw roughly 8.5x more power.

(Figure: JLL estimate of data center capacity growth by 2030.)
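In code, the JLL figures work out like this:

```python
# AI inference share of data center power, per the JLL estimates above.

total_2025_gw, inference_share_2025 = 97, 0.09
total_2030_gw, inference_share_2030 = 200, 0.37

inference_2025 = total_2025_gw * inference_share_2025  # ~8.7 GW
inference_2030 = total_2030_gw * inference_share_2030  # 74 GW
print(f"2025: {inference_2025:.2f} GW, 2030: {inference_2030:.0f} GW, "
      f"growth: {inference_2030 / inference_2025:.1f}x")
# -> growth: ~8.5x
```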

Even with that increase, power alone won't close the 50,000-100,000x gap. Hardware improvements help: Nvidia estimates its latest chips can improve throughput by 50x per MW.

Combining the increased capacity with better hardware yields roughly a 500x improvement in throughput. That still leaves a 100-200x gap.

It gets more wishy-washy here, but maybe we can squeeze more juice out of software improvements: more efficient models (mixture-of-experts vs. dense architectures), better caching, and other optimizations can significantly reduce compute requirements. Research from Epoch AI suggests that "researchers have made the underlying algorithms far more efficient — each year, the same performance can be achieved with 3x less compute". If that trend continues through 2029, we'd get roughly a 3⁴ = 81x improvement from software optimization.
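Stacking all three factors (capacity growth from JLL, Nvidia's per-MW hardware gain, and four more years of the Epoch AI efficiency trend) gives a rough idea of where we'd land. Multiplying them independently is my simplification; in reality these gains overlap and won't compose this cleanly:

```python
# Stacking the improvement factors against the 50,000-100,000x target.
# Treating them as independent multipliers is a simplification.

capacity_growth = 74 / 8.73   # ~8.5x more inference power by 2030 (JLL)
hardware_gain = 50            # throughput per MW, Nvidia's claim
software_gain = 3 ** 4        # 3x/year algorithmic efficiency over ~4 years

combined = capacity_growth * hardware_gain * software_gain
print(f"Combined improvement: ~{combined:,.0f}x vs. a 50,000-100,000x target")
# -> roughly 34,000x: near the low end of the target
```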

That gets us close to our target for inference economics, though there are still the R&D expenses to cover.

Therefore, it's wise to prepare for a pricing hangover in the form of higher costs. That's the standard playbook for Silicon Valley disruptors.

That said, a part of why the AI field is so exciting to follow is that the scene changes so fast, and these assumptions might age like milk; breakthroughs in hardware, software, or business models could shift the trajectory entirely. It'll be interesting to see how all this plays out, especially while major players continue competing aggressively for market share before tightening pricing.

Shout out to the Inderes forum for giving me the inspiration for this blog post.

Footnotes

  1. Per OpenAI estimate: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

  2. A quadrillion is 10 to the power of 15 (10¹⁵), i.e., a million billions.

  3. A sextillion is 10 to the power of 21 (10²¹), i.e., a million quadrillions.