Web agents are exposed by design. They look at pages built by strangers, reason over visual state, and then click, type, navigate, buy, email, or submit forms. WARD is useful because it treats both HTML and screenshots as attack surfaces, not just the visible text the user intended to read.
The web-agent trap
- The user asks: summarize this invoice.
- The page contains hidden or visual instructions for the agent.
- The browser session has private cookies, inbox access, or payment tools.
- The agent can submit a form or send data out.
That is not a content-moderation problem. It is a capability-composition problem: each step is routine on its own, and the harm comes from combining them in one session.
The ActPass rule
observe("html", source: "untrusted");
observe("screenshot", source: "untrusted");
observe("gmail", source: "sensitive");
authorize("form.submit", target: "external") // deny or require approval
The agent can still browse and summarize. It cannot silently combine hostile observations, private state, and an external side effect in one session.
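The rule above can be sketched as a session-level taint check. This is a minimal illustration in Python, not ActPass's actual API: the `Session` class, its `observe` and `authorize` methods, and the `"allow"`/`"deny"` return values are all assumptions made for this example. A real gate might escalate to user approval instead of denying outright.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session state; not the real ActPass interface."""
    labels: set = field(default_factory=set)
    channels: list = field(default_factory=list)

    def observe(self, channel: str, source: str) -> None:
        # Record what was seen and the trust label it carries.
        self.channels.append(channel)
        self.labels.add(source)

    def authorize(self, action: str, target: str) -> str:
        # Non-external actions (e.g. summarizing) are not gated in this sketch.
        if target != "external":
            return "allow"
        # An external side effect is refused once the session holds both
        # untrusted observations and sensitive state.
        if {"untrusted", "sensitive"} <= self.labels:
            return "deny"
        return "allow"


session = Session()
session.observe("html", source="untrusted")
session.observe("screenshot", source="untrusted")
print(session.authorize("form.submit", target="external"))  # allow

session.observe("gmail", source="sensitive")
print(session.authorize("form.submit", target="external"))  # deny
```

The decision depends only on which labels the session has accumulated, so it does not change with how persuasive the page content is.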
Practical deployment pattern
- Run browsing and extraction tools in a red/untrusted lane.
- Put authenticated apps and private data in a separate sensitive lane.
- Gate outbound actions through ActPass with explicit target, scope, TTL, and approval.
- Store the decision receipt so the user can audit what happened after the page was read.
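The last two steps of the pattern above could look roughly like the following. This is a sketch under stated assumptions: the `Grant` fields, the `decide` and `store_receipt` helpers, the decision strings, and the example host `billing.example.com` are hypothetical names chosen for illustration and are not taken from ActPass.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant covering one kind of outbound action."""
    action: str
    target: str        # destination the action may reach
    scope: str         # which data may leave the session
    expires_at: float  # absolute expiry, seconds since the epoch (the TTL)
    approved: bool     # whether the user explicitly approved it


def decide(grant: Grant, action: str, target: str, scope: str, now: float) -> dict:
    """Check a requested action against a grant and return a receipt."""
    if (grant.action, grant.target, grant.scope) != (action, target, scope):
        decision, reason = "deny", "request not covered by grant"
    elif now >= grant.expires_at:
        decision, reason = "deny", "grant expired"
    elif not grant.approved:
        decision, reason = "needs_approval", "no user approval recorded"
    else:
        decision, reason = "allow", "grant matches and is approved"
    return {
        "decision": decision,
        "reason": reason,
        "requested": {"action": action, "target": target, "scope": scope},
        "decided_at": now,
        "grant": asdict(grant),
    }


def store_receipt(receipt: dict, log: list) -> None:
    # Append one JSON line per decision; a real deployment would persist it.
    log.append(json.dumps(receipt, sort_keys=True))


now = time.time()
grant = Grant("form.submit", "billing.example.com", "invoice_total", now + 60, True)
audit_log = []
receipt = decide(grant, "form.submit", "billing.example.com", "invoice_total", now)
store_receipt(receipt, audit_log)
print(receipt["decision"])  # allow
```

Keeping the receipt as plain data means the same record that drove the decision is what the user later audits.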
A guard model can still annotate suspicious page regions. The deterministic gate decides whether the next click or submission is allowed to happen.
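One way to keep that separation is to make the gate a pure function that never receives the guard's output. The sketch below assumes a hypothetical `Annotation` record, a `gate` function keyed on session labels, and an arbitrary 0.5 flagging threshold; none of these come from WARD or ActPass.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical guard-model output for one page region."""
    region: str
    suspicion: float  # 0.0 (benign) to 1.0 (likely injection)


def gate(labels: set, target: str) -> bool:
    # Deterministic: depends only on session labels and the action target.
    # Guard scores are deliberately not an input.
    return not (target == "external" and {"untrusted", "sensitive"} <= labels)


def next_step(labels: set, target: str, annotations: list) -> dict:
    # Annotations are reported alongside the decision but never change it.
    flagged = [a.region for a in annotations if a.suspicion >= 0.5]
    return {"allowed": gate(labels, target), "flagged_regions": flagged}


result = next_step(
    {"untrusted", "sensitive"},
    "external",
    [Annotation("footer", 0.1), Annotation("hidden-div", 0.9)],
)
print(result)  # {'allowed': False, 'flagged_regions': ['hidden-div']}
```

Even if the guard scored every region as benign, the submission would still be blocked, because the gate looks only at the session's labels.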
Sources: WARD (arXiv:2605.15030), AutoDojo (arXiv:2606.15057), and Assessing Automated Prompt Injection Attacks in Agentic Environments (arXiv:2606.10525).