The era of “unlimited” AI access is ending. In late March, heavy users of Anthropic’s Claude began reporting that they were exhausting their monthly usage caps in as little as 20 minutes. Complaints flooded social media platforms like Reddit and X, with users finding their sessions abruptly terminated during peak hours.
Anthropic responded by blocking certain third-party tools from accessing its flat-rate subscriptions and by adjusting default settings to reduce the model’s “thinking” depth. Nor is this an isolated incident: OpenAI has begun restricting access to its video-generation platform, Sora, even as its coding assistant, Codex, surged to four million weekly developers.
These restrictions signal a broader industry shift: demand for artificial intelligence is outpacing the physical infrastructure required to support it. This phenomenon, increasingly referred to as the “compute crunch,” reveals that the digital promise of infinite scalability has collided with the hard realities of physics, manufacturing, and energy.
The End of the Flat-Rate Model
For decades, the internet operated on a flat-rate subscription model. Whether you sent one email or a thousand, your monthly fee remained the same because the marginal cost of serving additional users was negligible. AI breaks this economic model.
According to Lennart Heim, an AI policy expert who formerly led compute research at RAND, AI inference (the process of running a model to answer a user’s query) is extraordinarily resource-intensive. Unlike loading a simple webpage, generating each token of an AI response demands significant processing power.
“If 10 times more people use AI 10 times more heavily, you need close to 100 times more compute.”
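Spelled out, the arithmetic behind that estimate is multiplicative: total inference demand is roughly the number of users times how heavily each one uses the system (the symbols below are simply shorthand for those two quantities).

```latex
C_{\text{total}} \;\propto\; N_{\text{users}} \times u_{\text{per user}}
\qquad\Rightarrow\qquad (10N) \times (10u) = 100\,Nu
```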
When a user pays a flat monthly fee but consumes resources worth significantly more, the provider loses money. Consequently, companies are forced to implement rate limits or switch users to smaller, less powerful models by default. This explains why features like OpenAI’s “Auto” mode or Anthropic’s default to the lighter Claude Sonnet model have become standard—they are cost-control mechanisms disguised as user experience features.
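To make the unit economics concrete, here is a minimal back-of-envelope sketch in Python; the subscription fee and per-token cost are illustrative assumptions, not any provider’s actual pricing.

```python
# Back-of-envelope unit economics of a flat-rate AI subscription.
# All figures are illustrative assumptions, not any provider's real pricing or costs.

FLAT_FEE = 20.00            # hypothetical monthly subscription price, in dollars
COST_PER_1K_TOKENS = 0.01   # hypothetical provider-side inference cost, in dollars

def monthly_margin(tokens_used: int) -> float:
    """Provider margin on one subscriber: flat revenue minus usage-based inference cost."""
    inference_cost = tokens_used / 1_000 * COST_PER_1K_TOKENS
    return FLAT_FEE - inference_cost

print(monthly_margin(500_000))    # light user:  $20 - $5  ->  $15 profit
print(monthly_margin(5_000_000))  # heavy user:  $20 - $50 -> -$30 loss
```

At these made-up numbers, a light user is comfortably profitable while a heavy user puts the account deep underwater, which is precisely the gap that rate limits, lighter default models, and usage-based pricing are designed to close.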
The Physical Bottleneck
The core issue is that software can no longer scale independently of physical constraints. Historically, Silicon Valley could grow services exponentially without building new factories. Today, scaling AI requires:
- Advanced Chips: Taiwan Semiconductor Manufacturing Company (TSMC), which produces most of the world’s advanced AI chips, announced plans to spend up to $56 billion this year on capacity expansion. Yet, demand remains insatiable.
- Specialized Memory: AI chips require vast amounts of high-bandwidth memory. As this memory is diverted to data centers, prices rise globally, potentially increasing the cost of consumer electronics like smartphones.
- Energy Infrastructure: The International Energy Agency projects that global data center electricity use will double by 2030. Anthropic estimates that the U.S. AI sector alone will need 50 gigawatts of electric capacity by 2028—equivalent to the output of 50 large nuclear reactors.
Manufacturers of critical hardware, such as gas turbines for power generation, are ill-prepared for this sudden demand. An industry that has seen flat growth for a decade cannot simply scale up production overnight to meet the needs of tech giants.
The R&D vs. Serving Trade-off
A secondary tension exists within AI companies: the competition between training (building new, smarter models) and inference (serving current users).
Recent reports suggest that up to 60% of compute resources are currently dedicated to research and development (R&D). This leaves companies in a dilemma: they must devote compute to maintaining their competitive edge in state-of-the-art models while simultaneously monetizing existing technology through user-facing services. As Heim notes, this is not just a technical issue but a question of how to allocate scarce physical assets strategically.
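As a rough illustration of that allocation problem, consider a toy model of a fixed accelerator fleet split between training and serving; the fleet size and serving density below are hypothetical, and only the 60% R&D share is drawn from the reports above.

```python
# Toy model of splitting a fixed GPU fleet between R&D (training) and inference (serving).
# Fleet size and users-per-GPU are hypothetical; only the 60% R&D share comes from the text.

def users_served(total_gpus: int, rd_share: float, users_per_gpu: float) -> float:
    """Users that can be served after reserving a share of the fleet for R&D."""
    serving_gpus = total_gpus * (1 - rd_share)
    return serving_gpus * users_per_gpu

FLEET = 100_000   # hypothetical number of accelerators
DENSITY = 50      # hypothetical concurrent users served per accelerator

for rd_share in (0.0, 0.4, 0.6):
    print(f"R&D share {rd_share:.0%}: ~{users_served(FLEET, rd_share, DENSITY):,.0f} users")
```

Every percentage point of the fleet reserved for R&D is capacity that cannot serve paying users, which is why the split is a strategic business decision rather than a purely technical one.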
What This Means for the Future
The compute crunch is more than a temporary inconvenience for developers; it represents a structural shift in the AI economy. As AI becomes integral to coding, medicine, and business operations, access to compute becomes synonymous with access to economic speed and capability.
While companies like OpenAI currently have the financial cushion to absorb some of this strain, the long-term solution will likely involve higher prices rather than unlimited access. The market is correcting: the illusion of infinite, cheap intelligence is fading, replaced by a reality where compute is a finite, expensive commodity.
In short, the AI boom has outgrown its physical foundation. Until supply chains for chips, memory, and energy can catch up with digital demand, users should expect stricter limits, higher costs, and a more measured approach to AI consumption.
