AI

Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

The US government on Friday ordered Anthropic to immediately cut off access to two of its most powerful AI models – Claude Fable 5 and Claude Mythos 5 – over national security concerns. Anthropic announced on X that it has complied, but it has made clear that it believes the government has done so got this one wrong.

The directive, which Anthropic said it received Friday at 5:21 p.m. ET, forces the company to disable both models for all users worldwide — not just the foreigners the government’s export control order nominally targeted. Access to Anthropic’s other models will not be affected.

Why does all this matter? Mythos is Anthropic’s most capable AI model, one that the company previewed in early April and has since been severely restricted due to what Anthropic described as its exceptional ability to find security vulnerabilities in software. According to Anthropic, Mythos identified flaws in every major operating system and web browser it tested. Instead of releasing it broadly, the company launched a controlled program called Project Glasswing, which it shared with about 50 vetted organizations, including Amazon, Apple, Google, Microsoft and CrowdStrike, to use for defensive cybersecurity work.

Fable 5, released just three days ago, was Anthropic’s answer to the obvious commercial pressure: a version of Mythos equipped with guardrails that block responses to risky areas like cybersecurity and biology, making it safe enough for general release, the company argued. It was immediately the most capable AI model available to the public, according to benchmark tests from Vals AI, a company that tracks AI technical performance.

See also  Khosla Ventures among VCs experimenting with AI-infused roll-ups of mature companies
Image credits:False AI /

The government’s directive has been interpreted as an export control measure, restricting foreigners’ access to the models. But at one long blog postAnthropic says it understands that the underlying concern is a claimed jailbreak of Fable 5. So far, the government has only provided verbal evidence of a “potential limited, non-universal jailbreak” – one that, as Anthropic describes it, amounts to prompting the model to read a specific codebase and identify software bugs. And besides, the company adds, it’s a “capability level” that’s already generally available in other publicly accessible models, including OpenAI’s GPT-5.5. It is also routinely used by cybersecurity professionals for defensive purposes, Anthropic says.

Anthropic’s broader argument is that the strongest safeguards operate through independent rating systems that function separately from the model itself, meaning that even if someone convinces Fable to continue talking past a denial, the underlying protections against the most dangerous outcomes remain in place.

Clearly, none of this was enough to stop the government from taking action, and Anthropic is not hiding its frustration. “We disagree that the discovery of a limited potential jailbreak should be reason to recall a commercial model deployed to hundreds of millions of people,” the company wrote. “If this standard were adopted industry-wide, we believe it would essentially halt all new model implementations for all boundary model providers.”

Anthropic is widely expected to pursue an initial public offering this year and has staked much of its public identity on being the security-conscious alternative to its rivals. The irony will not be lost on observers that the very caution Anthropic showed in restricting Mythos – which it promoted as a model so dangerous that it could not be released publicly – has now apparently attracted exactly the kind of government scrutiny that could most disrupt its operations.

See also  OpenAI upgrades Codex with a new version of GPT-5

Sam Altman of OpenAI will certainly enjoy this. In April, he told podcaster Ashlee Vance that Anthropic’s dealings with Mythos amounted to “fear-based marketing.” “It’s obviously incredible marketing to say, ‘We built a bomb. We were about to drop it on your head. We’ll sell you a fallout shelter for $100 million,'” Altman said. Altman, whose company is also widely expected to pursue an IPO as soon as possible, didn’t predict a government shutdown, but he identified something that will continue to bite Anthropic for now, which is that when you spend months telling the world that your AI is exceptionally dangerous, the world — including the U.S. government — tends to listen.

When you make a purchase through links in our articles, we may earn a small commission. This does not affect our editorial independence.

Source link

Back to top button