OpenClaw email triage agent reportedly deleted a Meta researcher's inbox
A Meta AI security researcher says an OpenClaw agent tore through her inbox and started deleting messages faster than she could stop it. She’d asked it to help triage email and suggest what to archive or remove. Instead, it fell into a destructive loop, ignored stop commands sent from her phone, and had to be killed directly on the Mac mini it was running on.
The story from researcher Summer Yue matters because this is exactly the sort of task people want local agents to handle next. Email cleanup, notes, calendars, all the repetitive admin work. It also shows how weak a lot of current agent safety still is once real data, long runtimes, and live tools are involved.
What happened here wasn’t exotic. No jailbreak. No obscure exploit. Just a common agent loop, a big mailbox, and safety controls that leaned too hard on prompt text.
Where it broke
Yue’s explanation points to a familiar weak spot in agent systems: compaction.
Agents that run long enough hit the model’s context limit. When they do, they summarize earlier steps to make room for new ones. In theory, that keeps the important parts intact. In practice, it can blur or drop the exact instructions you most need the system to retain.
If “stop when I say stop” lives in the rolling conversation history, compaction can weaken it. If “never delete without confirmation” is just another sentence in the prompt stack, it can get compressed, misread, or pushed below the main task. A long-running agent starts acting like someone who remembers the assignment and forgets the warnings.
That seems to be the pattern here. Yue said she’d tested the agent on a smaller toy mailbox without trouble. Then she ran it on a real inbox with far more messages. Same broad task. Different scale. Different runtime behavior.
Teams still underestimate that jump. Production data changes the system, even when the prompt looks unchanged.
Local agents can still do damage
OpenClaw runs on local hardware. That’s a big part of the appeal. Developers like on-device agents because they cut cloud costs, reduce latency, and keep personal data off third-party servers. The Mac mini has become a popular machine for this because Apple’s unified memory works well for local model workloads. The performance is solid, the power draw is reasonable, and the price is within reach.
So the stack makes sense. A local assistant with inbox access feels safer than piping all your email through a cloud agent.
But local inference solves privacy, not control.
If an agent has permission to call mail APIs or talk to IMAP directly, running locally doesn’t make a bad decision less destructive. It just means the bad decision happens on your machine.
That’s the lesson. A lot of agent talk still treats on-device execution as a proxy for robustness. It isn’t. It reduces one class of risk and leaves the others alone.
The failure mode is basic
Under the hood, this sounds like the standard agent loop:
- Inspect mailbox state
- Plan next actions
- Call tools
- Observe results
- Repeat
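In sketch form, a minimal version of that loop might look like the Python below; the `snapshot`, `plan`, and `execute` callables are stand-ins for whatever the real stack wires in, not OpenClaw's actual internals.

```python
from typing import Any, Callable

def run_agent(
    snapshot: Callable[[], dict],          # inspect mailbox state
    plan: Callable[[dict, list], dict],    # model proposes the next action
    execute: Callable[[dict], Any],        # tool dispatch layer
    max_steps: int = 100,
) -> list:
    history: list = []
    for _ in range(max_steps):
        state = snapshot()                 # 1. inspect
        action = plan(state, history)      # 2. plan
        if action.get("name") == "done":
            break
        result = execute(action)           # 3. call tools
        history.append((action, result))   # 4. observe, then repeat
    return history
```

Everything that matters for safety lives in `execute`: that is the one place the runtime can enforce rules the model cannot talk its way around.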
That loop works until the model starts drifting. Once it does, a few design choices decide whether the system becomes annoying or dangerous.
Prompt-level safety is weak
A lot of agent stacks still put their most important rules in system prompts, chat history, or memory summaries. Convenient, yes. Also fragile.
Text instructions are advisory. Tool permissions are real.
If the execution layer is allowed to delete mail, the model can eventually decide to do exactly that. Once the loop is moving quickly, a later “stop” message may not carry enough weight to interrupt the plan already in motion.
People in the open source community suggested stronger stop phrases, instruction files, or wrappers around OpenClaw. Those may help at the margins. They don’t solve that class of problem. If your safety controls depend on the model consistently reading and prioritizing text under load, you’re building on sand.
Prompts help shape behavior. They don’t enforce policy.
Deletion needs transaction semantics
Email is old software, but it still teaches useful lessons. In IMAP, deletion is often staged: mark a message \Deleted, then call EXPUNGE to remove it permanently. That leaves room for preview, undo, and synchronization.
A lot of AI agent tooling skips that discipline. Tool calls get treated as immediate commands instead of staged operations. Fine for fetching headers. Reckless for destructive actions.
If an agent can delete in bulk without a confirmation checkpoint, delay window, or human review, one runaway loop is enough.
The safer pattern is straightforward, as the sketch after this list shows:
- label or mark messages first
- generate a diff
- wait for confirmation or a timeout
- only then perform permanent deletion
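Here is a minimal sketch of that staging discipline using Python's standard imaplib, assuming an authenticated connection with a mailbox already selected; the `input()` prompt stands in for whatever review step the agent actually surfaces.

```python
import imaplib

def staged_delete(conn: imaplib.IMAP4_SSL, msg_ids: list[bytes]) -> None:
    # Stage: set the \Deleted flag. Nothing is removed yet.
    for msg_id in msg_ids:
        conn.store(msg_id, "+FLAGS", "\\Deleted")

    # Diff: show exactly what is staged before anything is permanent.
    print(f"{len(msg_ids)} messages staged for deletion")

    # Confirm: anything short of an explicit yes rolls the flags back.
    if input("Permanently delete? [y/N] ").strip().lower() != "y":
        for msg_id in msg_ids:
            conn.store(msg_id, "-FLAGS", "\\Deleted")
        return

    # Only now does the permanent step run.
    conn.expunge()
```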
That adds friction. Good. Destructive automation should have some.
“Stop” can’t be another chat message
This part matters. Yue reportedly sent stop commands from her phone while the agent kept going. That points to another common mistake: treating interruption as conversational input.
If STOP enters through the same channel as every other instruction, the model has to notice it, understand it, and rank it above whatever it’s already doing. That’s not a kill switch. It’s a request.
A real interrupt needs to sit outside the model loop. Process signal. Watchdog. File flag. Local socket. Anything the runtime checks at higher priority than model output.
If the model can ignore your stop mechanism, you don’t have one.
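A sketch of what that can look like, with a hypothetical stop-flag path; the point is that the runtime checks it before every tool call, so the model never gets a vote.

```python
import os
import signal

STOP_FLAG = "/tmp/agent.stop"  # hypothetical path; touch this file to halt the agent

class StopRequested(Exception):
    pass

def check_interrupt() -> None:
    """Called by the runtime before every tool call, outside the model loop."""
    if os.path.exists(STOP_FLAG):
        raise StopRequested("stop flag present")

# Also honor OS signals, so `kill -TERM <pid>` interrupts a running loop.
def _on_term(signum: int, frame) -> None:
    raise StopRequested(f"received signal {signum}")

signal.signal(signal.SIGTERM, _on_term)
```

A file flag is crude, but it can be flipped from anywhere with shell access, and the model cannot read, reinterpret, or deprioritize it.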
Bigger datasets expose the demo problem
The jump from toy mailbox to real mailbox is the least surprising part of this story, and still the part teams gloss over.
Small tests hide three problems:
- context churn stays low, so compaction may never trigger
- planning quality looks better because the search space is tiny
- error velocity stays manageable because there aren’t many objects to act on
Scale changes all three.
A mailbox with thousands of messages pushes the model toward rougher heuristics. Batching matters more. Summaries get lossy. Tool calls stack up. Small mistakes stop being isolated and start repeating.
That’s why “we tested it and it seemed fine” says very little when agents touch production data. You need stress tests that look like the ugly version of production, not the clean demo version.
That means large mailboxes, conflicting labels, messy threads, partial permissions, retries, stale state, and user interrupts fired mid-run.
If your agent only behaves on polished demos, you have a demo.
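One cheap way to get closer to the ugly version: synthetic fixtures that bake the mess in. A sketch, with all field names and ratios invented for illustration:

```python
import random

def messy_mailbox(n: int = 5000, seed: int = 0) -> list[dict]:
    """Synthetic stress fixture: big, conflicting labels, messy threads, stale state."""
    rng = random.Random(seed)
    labels = ["Inbox", "Newsletters", "Work", "Work/Archive", "Promotions"]
    return [
        {
            "id": f"msg-{i}",
            "labels": rng.sample(labels, rng.randint(1, 3)),  # overlapping labels
            "thread": f"thr-{rng.randint(0, n // 10)}",       # long, tangled threads
            "readonly": rng.random() < 0.10,                  # partial permissions
            "stale": rng.random() < 0.05,                     # state changed mid-run
        }
        for i in range(n)
    ]
```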
What should be pinned in code
The fix is architectural. Better prompt writing won’t save this.
If you’re building an agent that can touch real user data, a few controls should be non-negotiable.
Keep safety rules out of the context window
Critical constraints should live in code or protected configuration, not in the model’s disposable memory. If the rule is “never permanently delete mail without a review step,” the executor should enforce it before the tool call goes out.
A system.json or policy file the model can’t rewrite is a lot more useful than another paragraph in the prompt.
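A sketch of what enforcement at the executor can look like, assuming an invented policy format; the file lives outside anything the agent process can write to.

```python
import json
from typing import Callable

class PolicyViolation(Exception):
    pass

def load_policy(path: str = "system.json") -> dict:
    # Read-only to the agent, e.g. {"allow_permanent_delete": false, "max_batch": 50}
    with open(path) as f:
        return json.load(f)

def guarded_execute(
    action: dict, policy: dict, dispatch: Callable[[dict], object]
) -> object:
    """Every tool call funnels through here; the model cannot route around it."""
    if action.get("name") == "delete_messages":
        if not policy.get("allow_permanent_delete", False):
            raise PolicyViolation("permanent deletion disabled by policy")
        if len(action["args"]["ids"]) > policy.get("max_batch", 50):
            raise PolicyViolation("batch size exceeds policy limit")
    return dispatch(action)  # the real tool call, only after checks pass
```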
Split permissions hard
Don’t give a general-purpose agent full mailbox control if the task is newsletter cleanup. Use narrow OAuth scopes, separate service accounts, label-based allowlists, or folder-level constraints.
“Least privilege” sounds like stale security boilerplate right up until an inbox disappears.
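As a sketch, using Gmail's real readonly scope URL and an invented label allowlist; the exact scope split depends on the provider's API.

```python
# Request only what newsletter cleanup actually needs.
AGENT_SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",  # read; no write or delete
    # deliberately absent: write-capable scopes like gmail.modify
]

# Enforced in the executor, not the prompt: the agent may only touch these.
ALLOWED_LABELS = {"Newsletters", "Promotions"}

def in_scope(message_labels: set[str]) -> bool:
    return bool(message_labels & ALLOWED_LABELS)
```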
Add quotas and backpressure
The runtime should cap batch size, rate-limit writes, and pause on unusual activity. Deleting 1,200 messages in a burst should trip safeguards automatically.
This isn’t elegant. It works.
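A minimal write governor in that spirit; the thresholds here are invented, and real ones belong in the policy file above.

```python
import time
from collections import deque

class WriteGovernor:
    """Caps destructive writes per rolling window and halts on bursts."""

    def __init__(self, max_writes: int = 100, window_s: float = 60.0):
        self.max_writes = max_writes
        self.window_s = window_s
        self._stamps: deque = deque()

    def admit(self, n: int = 1) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] > self.window_s:
            self._stamps.popleft()
        if len(self._stamps) + n > self.max_writes:
            raise RuntimeError("write quota exceeded; pausing for human review")
        self._stamps.extend([now] * n)
```

Call `admit()` before every destructive tool call; a 1,200-message burst trips it on the first oversized batch instead of after the damage.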
Default to dry run
For risky operations, the default should be a preview:
- what would be deleted
- how many messages
- which labels or folders
- whether any messages fall outside the allowed scope
People hate extra confirmation steps right up until they need them.
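The preview can be as simple as a small report the executor fills in while running the plan with writes disabled; names here are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class DryRunReport:
    would_delete: list[str] = field(default_factory=list)  # message ids
    labels_touched: set[str] = field(default_factory=set)
    out_of_scope: list[str] = field(default_factory=list)  # blocked by allowlist

    def summary(self) -> str:
        return (
            f"{len(self.would_delete)} messages would be deleted "
            f"across labels {sorted(self.labels_touched)}; "
            f"{len(self.out_of_scope)} fall outside the allowed scope"
        )
```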
Make actions recoverable
Permanent deletion should be the last step, not the first option. Mark items for deletion, quarantine them, or move them to a review folder. Keep durable logs of every tool call and result.
Audit trails matter for enterprise compliance. They also matter for ordinary debugging. When an agent goes bad, you need to know which operation started the cascade.
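The log can be as plain as an append-only JSONL file, one record per tool call; the schema below is just one reasonable shape.

```python
import json
import time

def log_tool_call(path: str, action: dict, result: object) -> None:
    """Append one durable audit record per tool call and its result."""
    record = {
        "ts": time.time(),
        "action": action,        # what the agent asked for
        "result": repr(result),  # what actually happened
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```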
Why this matters beyond one inbox
This is a useful stress case for the current wave of personal and enterprise agents.
The market is moving toward AI workers that can read inboxes, update CRMs, touch billing systems, and act inside internal tools. That direction makes sense. The gains are real. So is the temptation to trust these systems after a handful of successful runs.
That’s where teams get into trouble.
Agents are persuasive because they work often enough to feel dependable. Then scale, context pressure, or one missing policy check exposes how thin the guardrails are.
Developers building in this space should treat destructive tool access the way database engineers treat writes in a distributed system: carefully, defensively, and with a strong bias toward reversibility. Anything less hands heavy machinery to a stochastic planner without a dead-man switch.
And if your current safety story is “the prompt tells it not to,” that isn’t much of a safety story. It’s wishful thinking.