Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Anthropic released its latest model Fable on Tuesday, billing it as a public and limited version of its powerful and much-hyped cybersecurity model Mythos.

But not everyone is happy with the restrictions, and a number of cybersecurity researchers and professionals have aired their complaints online.

“[Fable] rejects any request that could be tangentially cyber-related. Even innocuous tasks like reading a blog post,” said Valentina “Chompie” Palmiotti, a well-known security researcher working at IBM X-Force.

When a prompt activates the guardrails, Fable pauses the chat and says that its “security measures have flagged this message for cybersecurity or biology topics.”

The guardrails are in place to limit the risk that Fable can be used to develop malware or compromise software – a long-standing concern within Anthropic. The limitations on biology arise from a similar concern about the development of biological weapons.

When the AI giant released Mythos in April, it restricted the model to a limited number of companies and organizations in what it called Project Glasswing, an effort to deploy the model to secure critical software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries.

But despite the good intentions, many cybersecurity experts are still put off by the haphazard nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch that “if you ask it to write secure code, it assumes it’s cybersecurity-related work rather than software engineering best practices, and you get demoted.” Fable is programmed to fall back to Claude Opus 4.8 if it hits a guardrail. “It seems to be keyword-based, so anything in the lexical area of ‘cybersecurity’ activates the guardrails.”

Contact us

Do you have more information about how hackers use AI? Or how cybersecurity companies use AI? We look forward to hearing from you. From a non-work device and network, you can securely contact Lorenzo Franceschi-Bicchierai on Signal at +1 917 257 1382, or via Telegram and Keybase @lorenzofb, or email.

“But it is understandable because we are still in the early days and they are still adjusting their guardrails. I am sure they will evolve over time as Anthropic and other frontier model companies will collaborate more with today’s new generation of cybersecurity companies,” said Suiche, a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do a release like this and relax the guardrails over time.”

Another researcher griped on X that “even asking for a code review” triggers Fable’s guardrails.

Anthropic did not immediately respond to a request for comment.

Aside from guardrails within its models, Anthropic requires cybersecurity professionals to sign up for its Cyber Verification Program. If approved, applicants will have fewer restrictions on using Claude for cybersecurity work. OpenAI has a similar program called Trusted Access for Cyber.
