Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as "rare, extreme cases of persistently harmful or abusive user interactions." Strikingly, Anthropic says it is doing this not to protect the human user, but rather the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future."
However, the announcement points to a recently created program to study what it calls "model welfare," and says Anthropic is essentially taking a just-in-case approach, "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible."
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in "extreme edge cases," such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror."
While these types of requests could create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to its users' delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a "strong preference" against responding to these requests.
Regarding these new conversation-ending capabilities, the company says: "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat."
Anthropic also says Claude has been "directed not to use this ability in cases where users might be at risk of harming themselves or others."
When Claude ends a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
"We are treating this feature as an ongoing experiment and will continue refining our approach," the company says.




