Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as "rare, extreme cases of persistently harmful or abusive user interactions." Strikingly, Anthropic says it is doing this not to protect the human user, but rather the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future."
However, the announcement points to a recently created program to study what it calls "model welfare," and says Anthropic is essentially taking a just-in-case approach, "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible."
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in "extreme edge cases," such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror."
While these types of requests could create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to its users' delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a "strong preference" against responding to these requests.
Regarding these new conversation-ending capabilities, the company says: "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat."
Anthropic also says Claude has been "directed not to use this ability in cases where users might be at risk of harming themselves or others."
When Claude ends a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
"We are treating this feature as an ongoing experiment and will continue refining our approach," the company says.




