A Meta AI security researcher said an OpenClaw agent ran amok on her inbox

The now-viral X-post by Meta AI security researcher Summer Yue initially reads as satire. She told her OpenClaw AI agent to check her overflowing email inbox and suggest what to delete or archive.
The agent then ran amok. It started deleting all her email in a “speed run” while ignoring the stop commands she sent from her phone.
“I had to RUN to my Mac mini like I was defusing a bomb,” she wrote, posting images of the ignored stop commands as receipts.
The Mac mini, an affordable Apple computer that sits flat on a desk and fits in the palm of your hand, has become the device of choice for running OpenClaw these days. (The Mini is selling “like hotcakes,” a “confused” Apple employee reportedly told famous AI researcher Andrej Karpathy when he bought one to run an OpenClaw alternative called NanoClaw.)
OpenClaw, of course, is the open source AI agent that rose to prominence through Moltbook, an AI-only social network. OpenClaw agents were at the center of that now largely debunked episode on Moltbook that made it seem like the AIs were plotting against humans.
But OpenClaw’s mission, according to its GitHub page, is not focused on social networks. It aims to be a personal AI assistant that runs on your own devices.
Silicon Valley’s in-crowd has fallen so in love with OpenClaw that “claw” and “claws” have become the buzzwords of choice for agents running on personal hardware. Other such tools include ZeroClaw, Ironclaw, and PicoClaw. The Y Combinator podcast team even appeared on their most recent episode dressed in lobster costumes.
But Yue’s post serves as a warning. As others on X noted, if an AI security researcher were to encounter this problem, what hope do mere mortals have?
“Did you deliberately test the guardrails or did you make a rookie mistake?” a software developer asked her on X.
“Rookie mistake tbh,” she replied. She had tested her agent with a smaller “toy” inbox, as she called it, and it worked well on less important email. It had gained her confidence, so she thought she would unleash it on the real thing.
Yue believes the large amount of data in her real inbox “triggered compaction,” she wrote. Compaction happens when the context window (the running record of everything the AI has been told and has done during a session) grows too large, causing the agent to start summarizing and compressing the conversation to make room.
When that happens, the AI can drop instructions that humans consider critical.
In this case, the agent may have dropped her latest prompt – in which she told it to stop – and reverted to the instructions from the “toy” inbox.
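To make the failure mode concrete, here is a deliberately simplified, hypothetical sketch of compaction. None of these names or details come from OpenClaw; the point is only that a naive summarization step can erase a late instruction while the original task survives verbatim.

```python
# Hypothetical sketch: how naive context compaction can lose a late instruction.
# Names and logic are illustrative, not OpenClaw's actual implementation.

def approx_tokens(msg: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(msg.split())

def compact(history: list[str], limit: int) -> list[str]:
    """Naive compaction: keep the original task, squash everything else
    into a one-line summary once the context grows past `limit`."""
    if sum(approx_tokens(m) for m in history) <= limit:
        return history
    task, *rest = history
    summary = f"[summary of {len(rest)} messages]"  # detail (incl. "STOP") is lost
    return [task, summary]

history = [
    "user: clean up my toy inbox, delete spam",   # original instruction
    *[f"tool: processed email {i}" for i in range(50)],
    "user: STOP, do not delete anything else",    # the late guardrail
]

compacted = compact(history, limit=40)
# The stop command no longer appears verbatim; the only concrete
# instruction left is the original delete task.
print(any("STOP" in m for m in compacted))  # False
```

A real agent summarizes rather than discards, but the effect can be similar: if the summary underweights the most recent command, the model falls back on whatever instructions remain intact.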
As several others noted on X, prompts cannot be trusted to act as guardrails. Models can misinterpret or ignore them.
Several people came up with suggestions that ranged from the exact syntax Yue should have used to stop the agent, to different methods to ensure better compliance with the guardrails, such as writing instructions in special files or using other open source tools.
In the interest of full transparency, TechCrunch could not independently verify what happened to Yue’s inbox. (She did not respond to our request for comment, although she did respond to many questions and comments sent to her via X.)
But it doesn’t really matter.
The bottom line is that agents targeting knowledge workers are risky at their current stage of development. People who say they use them successfully have devised methods to protect themselves.
One day, perhaps soon (by 2027? 2028?), they might be ready for widespread use. God knows many of us would love help with email, grocery orders, and scheduling dental appointments. But that day has not yet come.