
Anthropic’s Claude Opus 4.5 is here: Cheaper AI, infinite chats, and coding skills that beat humans

Anthropic released its most capable artificial intelligence model yet on Monday, slashing prices by about two-thirds while claiming state-of-the-art performance in software engineering — a strategic move that intensifies the AI startup’s competition with entrenched rivals OpenAI and Google.

The new model, Claude Opus 4.5, scored higher on Anthropic’s most challenging internal technical assessment than any human candidate in the company’s history, according to materials reviewed by VentureBeat. The result underscores both the rapidly advancing capabilities of AI systems and growing questions about how the technology will reshape white-collar professions.

The Amazon-backed company is touting Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens – a dramatic reduction from the $15 and $75 rates of its predecessor, Claude Opus 4.1, released earlier this year. The move opens frontier AI capabilities to a broader group of developers and enterprises while pressuring competitors to match both performance and price.
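At those rates, the savings compound quickly for token-heavy workloads. A minimal sketch of the arithmetic, using only the per-million-token prices quoted above (the example workload is invented for illustration):

```python
# Per-million-token rates reported for the two models
# (Opus 4.5: $5 in / $25 out; Opus 4.1: $15 in / $75 out).
PRICES = {
    "opus-4.5": {"input": 5.00, "output": 25.00},
    "opus-4.1": {"input": 15.00, "output": 75.00},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one job at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical workload of 2M input and 0.5M output tokens:
old = job_cost("opus-4.1", 2_000_000, 500_000)  # $30.00 + $37.50 = $67.50
new = job_cost("opus-4.5", 2_000_000, 500_000)  # $10.00 + $12.50 = $22.50
print(f"${old:.2f} -> ${new:.2f}")
```

Because both rates dropped by the same factor, the new bill is exactly one third of the old one regardless of the input/output mix.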

“We want to make sure this really works for people who want to work with these models,” said Alex Albert, Anthropic’s head of developer relations, in an exclusive interview with VentureBeat. “That’s really our focus: How can we make Claude better able to help you do the things that you don’t necessarily want to do in your job?”

The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model called Codex Max that can work autonomously for extended periods. Google unveiled Gemini 3 just last week, and even inside OpenAI there are concerns about the search giant’s progress, according to a recent report from The Information.

Opus 4.5 demonstrates improved judgment on real-world tasks, developers say

Anthropic’s internal testing showed what the company describes as a qualitative leap in Claude Opus 4.5’s reasoning ability. The model achieved 80.9% accuracy on SWE-bench Verified, a benchmark that measures real-world software engineering tasks, outperforming its own Claude Sonnet 4.5 (77.2%) and Google’s Gemini 3 Pro (76.2%), according to the company’s data.

But the technical benchmarks only tell part of the story. Albert said employee testers consistently reported that the model shows better judgment and intuition across a variety of tasks – a shift he described as the model developing a sense of what’s important in the real world.

“The model just gets it,” Albert said. “It’s just developed this kind of intuition and judgment about a lot of real-world things that qualitatively feels like a big leap forward over previous models.”

He mentioned his own workflow as an example. Previously, Albert said he asked AI models to gather information, but was hesitant to rely on their synthesis or prioritization. With Opus 4.5, he delegates more complete tasks and links them to Slack and internal documents to produce coherent summaries that align with his priorities.


Opus 4.5 outperforms all human candidates on the company’s toughest technical test

The model’s performance on Anthropic’s internal technical assessment marks a notable milestone. The take-home exam, given to prospective performance engineering candidates, is designed to evaluate technical competence and judgment under a prescribed two-hour time limit.

Using a technique called parallel test-time compute – which merges multiple attempts from the model and selects the best result – Opus 4.5 scored higher than any human candidate who took the test, according to the company. With no time limit, the model matched the performance of the best human candidate ever when used in Claude Code, Anthropic’s coding environment.
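Anthropic has not published the mechanics of its setup, but the idea behind parallel test-time compute is straightforward: sample several independent attempts and keep the one that scores best under some selection rule. A hypothetical sketch (the `solve` and `score` stand-ins below are invented for illustration; in practice the scorer might be a test suite or a reward model):

```python
import random

def solve(problem: str, seed: int) -> str:
    """Stand-in for one independent model attempt (hypothetical)."""
    random.seed(seed)
    return f"candidate-{random.randint(0, 9)}"

def score(problem: str, attempt: str) -> float:
    """Stand-in scorer, e.g. a test-suite pass rate or reward model."""
    return float(attempt.split("-")[1])

def best_of_n(problem: str, n: int = 8) -> str:
    """Parallel test-time compute in miniature: sample n independent
    attempts and keep the highest-scoring one."""
    attempts = [solve(problem, seed=i) for i in range(n)]
    return max(attempts, key=lambda a: score(problem, a))
```

The extra attempts run in parallel, so wall-clock latency stays close to a single attempt while accuracy rises with n.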

The company acknowledged that the test does not measure other crucial professional skills, such as collaboration, communication or the instincts that develop through years of experience. Still, Anthropic says the result “raises questions about how AI will change engineering as a profession.”

Albert emphasized the importance of the finding. “I think this is perhaps a sign of how useful these models can actually be in a work context and for our jobs,” he said. “Of course this was a technical task, and I would say models are relatively furthest along in technical domains compared to other fields, but I think it is a very important signal to pay attention to.”

Dramatic efficiency improvements have reduced token usage by up to 76% on key benchmarks

Beyond raw performance, Anthropic is betting that efficiency improvements will set Claude Opus 4.5 apart in the market. The company says the model uses dramatically fewer tokens – the text units that AI systems process – to achieve similar or better results than its predecessors.

At a medium effort level, Opus 4.5 matches the previous Sonnet 4.5 model’s best score on SWE-bench Verified while using 76% fewer output tokens, according to Anthropic. At the highest effort level, Opus 4.5 exceeds Sonnet 4.5’s performance by 4.3 percentage points while still using 48% fewer tokens.

To give developers more control, Anthropic introduced an “effort parameter” that allows users to adjust how much computation the model applies to each task – balancing performance against latency and cost.
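Anthropic has not detailed the parameter’s exact name or range here, so the sketch below is hypothetical: `EFFORT_BUDGETS` and `max_thinking_tokens` are invented stand-ins showing how an effort dial might map onto a reasoning-token budget attached to each request.

```python
# Hypothetical effort-to-budget mapping; the real API's parameter name
# and semantics may differ. Higher effort buys more reasoning tokens,
# trading latency and cost for quality.
EFFORT_BUDGETS = {"low": 1_024, "medium": 8_192, "high": 32_768}

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a chat request whose reasoning budget tracks the effort level."""
    return {
        "model": "claude-opus-4.5",                    # illustrative model id
        "max_thinking_tokens": EFFORT_BUDGETS[effort], # invented field name
        "messages": [{"role": "user", "content": prompt}],
    }

# A latency-sensitive autocomplete call might use "low"; a hard
# refactoring task might use "high":
quick = build_request("Complete this line of code", effort="low")
deep = build_request("Refactor this module", effort="high")
```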

Enterprise customers provided early validation of the efficiency claims. “Opus 4.5 beats Sonnet 4.5 and the competition on our internal benchmarks, using fewer tokens to solve the same problems,” said Michele Catasta, president of Replit, a cloud-based coding platform, in a statement to VentureBeat. “At scale, that efficiency increases.”


GitHub’s Chief Product Officer, Mario Rodriguez, said early testing shows Opus 4.5 “exceeds internal coding benchmarks while reducing token usage by half, and is especially suitable for tasks like code migration and code refactoring.”

Early customers report AI agents that learn from experience and refine their own skills

One of the most notable capabilities demonstrated by early customers involves what Anthropic calls “self-improving agents”: AI systems that can fine-tune their own performance through iterative learning.

Rakuten, the Japanese e-commerce and internet company, tested Claude Opus 4.5 for automating office tasks. “Our agents were able to autonomously refine their own capabilities and achieved peak performance in four iterations, while other models couldn’t match that quality after ten iterations,” said Yusuke Kaji, Rakuten’s general manager of AI for Business.

Albert explained that the model does not update its own weights – the fundamental parameters that determine an AI system’s behavior – but rather iteratively improves the tools and approaches it uses to solve problems. “It was iteratively refining a skill for a task and seeing it try to optimize the skill to get better performance so it could accomplish this task,” he said.

The gains extend beyond coding. Albert said Anthropic has seen significant improvements in creating professional documents, spreadsheets and presentations. “They say this is the biggest jump they’ve seen between model generations,” Albert said. “So even if you go from Sonnet 4.5 to Opus 4.5, the jump is bigger than between any two models in the past.”

Fundamental Research Labs, a financial modeling firm, reported that “the accuracy of our internal evaluations improved by 20%, efficiency increased by 15%, and complex tasks that once seemed out of reach became achievable,” according to co-founder Nico Christie.

New features target Excel users and Chrome workflows and eliminate chat length limits

In addition to the model release, Anthropic has rolled out a series of product updates for business users. Claude for Excel became generally available to Max, Team, and Enterprise users with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to all Max users.

Perhaps most importantly, Anthropic introduced “endless chats” – a feature that removes context-window limits by automatically summarizing earlier parts of a conversation as it grows longer. “Within Claude AI, within the product itself, you effectively get these kind of infinite context windows because of the compaction, plus some of the memory stuff that we do,” Albert explained.
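The compaction idea can be sketched in a few lines. This is a simplified illustration, not Anthropic’s implementation: `summarize` stands in for a model-generated summary, and the thresholds are arbitrary.

```python
def summarize(turns: list[dict]) -> dict:
    """Stand-in for a model-generated summary of older turns."""
    return {"role": "system",
            "content": f"[summary of {len(turns)} earlier messages]"}

def compact(history: list[dict], keep_recent: int = 4,
            max_messages: int = 8) -> list[dict]:
    """Compaction in miniature: once the transcript exceeds the budget,
    replace everything but the most recent turns with one summary."""
    if len(history) <= max_messages:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent
```

Applied repeatedly, the transcript never exceeds the budget, so the conversation can continue indefinitely at the cost of some fidelity in the summarized portion.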


For developers, Anthropic has released programmatic tool calling, which allows Claude to write and run code that calls functions directly. Claude Code received an updated “Plan Mode” and became available on the desktop in research preview, allowing developers to run multiple AI agent sessions in parallel.
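Programmatic tool calling inverts the usual one-round-trip-per-tool-call loop: instead of the host relaying each call, the model writes a short script that invokes several tools directly and the host executes it once. A toy sketch under those assumptions – the real feature’s wire format and sandboxing differ, and `get_weather` and the script here are invented:

```python
def get_weather(city: str) -> str:
    """Invented example tool the model is allowed to call."""
    return {"Tokyo": "rainy", "Paris": "sunny"}.get(city, "unknown")

def run_model_script(script: str) -> dict:
    """Execute model-authored code with only approved tools in scope.
    (A real sandbox would need far stronger isolation than this.)"""
    scope = {"get_weather": get_weather, "results": {}}
    exec(script, {"__builtins__": {}}, scope)
    return scope["results"]

# Instead of two separate tool-call round trips, the model emits one loop:
script = """
for city in ["Tokyo", "Paris"]:
    results[city] = get_weather(city)
"""
```

One execution replaces N round trips, which is where the latency and token savings come from when many tool calls are needed.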

Anthropic’s revenue surges as OpenAI and Google race to match performance and pricing

Anthropic reached $2 billion in annualized revenue during the first quarter of 2025, more than doubling from $1 billion in the previous period. The number of customers spending more than $100,000 annually has grown eightfold in the past year.

The rapid release of Opus 4.5 – just weeks after Haiku 4.5 in October and Sonnet 4.5 in September – reflects broader sector dynamics. OpenAI released multiple GPT-5 variants in 2025, including the specialized Codex Max model in November, which can work autonomously for up to 24 hours. Google released Gemini 3 in mid-November after months of development.

Albert attributed Anthropic’s accelerated pace in part to using Claude to speed up its own development. “We see a lot of help and acceleration from Claude itself, whether it’s the actual product development or the model research,” he said.

The price reduction for Opus 4.5 could put pressure on margins while expanding the addressable market. “I expect that many startups will incorporate this much more into their products and promote it prominently,” Albert said.

Yet profitability remains elusive for leading AI labs as they invest heavily in computing infrastructure and research talent. The AI market is expected to reach more than $1 trillion in revenue within a decade, but no vendor has achieved a dominant market position – even as models reach a threshold where they can meaningfully automate complex knowledge work.

Michael Truell, CEO of Cursor, an AI-powered code editor, called Opus 4.5 “a notable improvement over the previous Claude models within Cursor, with improved pricing and intelligence for difficult coding tasks.” Scott Wu, CEO of AI coding startup Cognition, said the model “delivers stronger results on our toughest assessments and consistent performance through 30-minute autonomous coding sessions.”

For enterprises and developers, competition translates into rapidly improving capabilities at falling prices. But as AI’s performance on technical tasks approaches (and sometimes even exceeds) human expert levels, the technology’s impact on professional work becomes less theoretical.

When asked about the engineering exam results and what they indicate about the trajectory of AI, Albert was direct: “I think this is a very important signal to pay attention to.”
