An AI email agent falls for the same phishing that fools people

BleepingComputer reports on a phishing simulation run against OpenClaw, an autonomous email agent, across a range of configuration profiles. The finding: the agent “was susceptible to tactics commonly used to compromise human users,” and in the process leaked user data the attacker had no business seeing. The lures that work on a distracted person (urgency, spoofed authority, a plausible reply-to) also work on a model reading the same inbox, because the model makes the same mistake the person does: it trusts the surface of the message.

This is the failure mode that matters as agents start acting on people’s behalf. A decade of anti-phishing investment assumes a human in the loop who can be trained, tested, and held accountable. The “human firewall” framing doesn’t transfer to an agent that will dutifully follow an instruction embedded in a message it was told to process. Prompt injection and social engineering converge here: there is no clean line between “the agent was tricked” and “the agent was compromised,” and from the victim’s side the distinction is academic once the data is out.

OpenClaw is a small, concrete instance of the harder problem: an agent with real permissions, exposed to untrusted input, with no reliable way to tell a legitimate instruction from a hostile one. The interesting question for the trust beat isn’t whether agents can be phished — clearly they can — but what an attestation story looks like for a caller, or a sender, that is itself a machine.