The token bill comes due: Inside the industry scramble to manage AI’s runaway costs


Across the industry, companies are starting to balk at the price of AI. Uber exhausted its entire 2026 AI coding budget in April. Microsoft has withdrawn developers’ Claude Code licenses months after enabling them. A Priceline employee told TechCrunch that a routine Cursor contract renewal ended up being 4-5x more expensive.

Although prices per token have fallen, the push for more AI adoption and increasingly autonomous agents has driven token consumption higher and higher. Companies that gorged themselves on unlimited plans in early 2025 are now trying to understand where their money is going, pull back on spending, and figure out if they can salvage some ROI from the mess that is their budgets.

In the meantime, a market is forming to meet them there. Startups, established vendors and a new standards body are all in the race to give companies the tools and language to track what they spend.

“Six months ago I had a conversation with a customer and it was all about ‘What can it do? Is it good enough?'” Alexander Embricos, OpenAI’s head of enterprise, told TechCrunch this week at an event in New York City. “Our conversations now are never about that. Now the conversations are about, ‘Hey, we’re spending this much. What visibility do you have? What auditability do you have? What token controls do you have? What is the efficiency of your models?'”

It is against this backdrop that the Linux Foundation this week unveiled plans for the Tokenomics Foundation, a new standards organization that aims to bring the same cost discipline to AI tokens that FinOps brought to cloud spending.

“In April and May, I started hearing from companies, ‘Oh my god, we’re three times over our entire 2026 token budget and it’s only April,’” JR Storment, executive director of the FinOps Foundation, a project under the Linux Foundation, told TechCrunch. “We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and ‘go fast’ to ‘we need guardrails, how do we get this under control?’”

The cries heard across the tech world followed fervent demands from CEOs pushing their teams to adopt the best models and move quickly, costs be damned. New models released in November, such as Claude Opus 4.5 from Anthropic, GPT-5.1 from OpenAI and Gemini 3 Pro from Google, brought significant improvements to agentic tools, increasing their consumption. It’s how one company reportedly found itself with a $500 million Claude bill after forgetting to set usage limits for employees.

“It’s like the crack cocaine epidemic,” said Chris Reed, senior director of IT finance at Priceline, noting that the company has started imposing token limits on certain groups. “They made you try it to get you hooked, and now you’re kind of obligated to it.”

Vitaly Gordon, CEO of engineering operations platform Faros AI, said he recently spoke with a CTO who told him, “One of my engineers spent $40,000 on tokens last month, and I really don’t know whether to stop him or start telling everyone to be like him.”

A March survey from Faros found that among 20,000 developers, output increased, but so did the number of bugs and rewrites. Jellyfish, an engineering management platform, similarly found that engineers who used the most tokens were about twice as productive as those who used less AI, but they spent ten times as many tokens to get there.

Nicholas Arcolano, head of research at Jellyfish, told TechCrunch via email that AI spending is exploding in large part due to agentic features, with consumption per developer increasing by about 18.6x in nine months. Taken together, these figures make the productivity question murkier than the spending suggests.

“Whether extreme spending is worth it depends on the ultimate business value of the code shipped (e.g., revenue), which most companies still can’t measure,” Arcolano said.

At least part of that measurement problem is the sheer scale at which AI is being used today.

“Cloud cost tracking is a data problem with hundreds of millions of rows per month,” Storment said. “Tracking token costs is a data problem with trillions of rows per month. You can’t just put that in a spreadsheet or even a basic tool. You have to fundamentally rethink your tools, your specifications and your accounting systems to do that.”

At Priceline, Reed already sees discrepancies: he has noticed mismatches between a vendor’s reported usage and Priceline’s internal data.

“I started my career in telecom spend management and I see all the same parallels, from telecom to cloud and AI,” he said. “Anytime you introduce something new, it’s ripe for billing errors and audit and optimization opportunities.”

A market is starting to form around this problem. There are pure-play companies, such as Pay-i, that track, measure and optimize the cost and performance of GenAI investments. Paid allows developers to track costs, measure usage, and bill users based on actual value instead of subscription fees.

Then there are companies like Jellyfish, Waydev, and Faros AI, all of which offer AI agent monitoring to prove the ROI of developer tools. Storment says most of the 180 vendors within the FinOps Foundation are leaning toward this space.

Companies with existing distribution are also adding new features to take advantage of this new market. Ramp recently moved into AI spend management; Datadog and New Relic have focused on services such as cloud cost management, token-level observability and GPU monitoring. At the FinOps X conference next week, AWS is expected to introduce new financial management features aimed at enterprise AI spend.

Tiffany Luck, a partner at NEA, thinks token efficiency and observability will likely be added to the “harness or app layer.” She pointed to Factory, a startup that makes AI agents for enterprises, which this week launched a model router that automatically chooses the right model for every task.

Gordon expects that frontier labs and other model providers will adopt OpenRouter-style optimization to route queries to the cheapest models, a trend already visible in Claude’s enterprise accounts.

“The financial report for how much you spend on Anthropic, even if you call it the Opus model, some of the spending is going to go to Sonnet or Haiku because they’re smart enough to do it,” Gordon said. “I think this is going to become more and more of a thing.”

But all of these tools are built without a common language or shared definitions of how much a token costs, what it delivers, and how to compare spending from different vendors. That’s where the Tokenomics Foundation hopes to prove its worth.

The Foundation is building a canonical definition and framework for ‘tokenomics’; open standards, specifications and metrics for AI token usage and billing; as well as new metrics for AI economics, such as cost per intelligence or tokens per watt. It also plans to define metrics for token factory effectiveness and consumption efficiency. The group is planning a formal launch in July and is poised to announce more members at the FinOps X conference next week.

“Token economics is fundamentally more abstract and opaque than anything we have managed before at this scale,” Nishant Gupta, Chief Availability Officer at Salesforce, said in a statement. “It requires a different operating force than the one the industry built for the cloud.”

That said, Goldman Sachs projects that global token usage will increase 24-fold by 2030. The companies that are already over budget need solutions now, and the foundation’s first results are still months away.

“Maybe we’ve created a steam engine, but we still haven’t figured out the assembly line,” Gordon said.

According to Arcolano, the smart move is broad, moderate adoption.

“The best ROI comes from moving the broad mid-range from low to moderate use, not pushing heavy users up,” he said.

Russell Brandom and Tim Fernholz contributed to this reporting.

