The plan/policy/enforcer pattern for secure AI agents

The ActPass Team

Security & Product

The cleanest architecture in the system-level defense papers is simple: separate planning from permission. Let the model propose a plan. Let policy define what the plan is allowed to do. Let an enforcer approve or block the concrete action before execution.

Prompt text can influence the agent, but ActPass checks the proposed action before the tool runs.

The anti-pattern

plan = llm("solve the task")
for step in plan:
  tool.execute(step) // the model is effectively root

This is convenient and unsafe. The agent's context includes untrusted feedback, so any later plan update can smuggle in a new permission model.

The pattern

plan = llm.create_plan(task)
policy = actpass.issue_passport({
  task,
  scopes: ["ticket.read", "refund.create"],
  limits: { refund_cents: 5000 },
  ttl: "1h",
});

for action of plan.actions:
  decision = actpass.authorize(policy, action);
  if (decision.status !== "allow") stop(decision);
  tool.execute(action);

Dynamic replanning still works. The agent can react to new facts. What it cannot do is mint new authority for itself after hostile content enters the loop.

Where ActPass sits

Before execution: deterministic allow/deny/needs-approval.
During execution: nonce, TTL, scope, target, and session-capability checks.
After execution: evidence receipt for audits and incident review.

This is why ActPass is infrastructure, not a prompt. It is the enforcement layer the model has to pass through when text becomes action.

Sources: Architecting Secure AI Agents (arXiv:2603.30016), Reasoning-enabled Task Alignment (arXiv:2606.15441), and AttriGuard (arXiv:2603.10749).

See your agents' exposure

Get a read-only Lethal-Trifecta / MCP-color report for your agents in under a minute. No runtime, nothing blocked — just the truth about your blast radius.

Get your exposure report Read the docs

The plan/policy/enforcer pattern for secure AI agents

The ActPass Team

Security & Product

Prompt text can influence the agent, but ActPass checks the proposed action before the tool runs.

The anti-pattern

plan = llm("solve the task")
for step in plan:
  tool.execute(step) // the model is effectively root

This is convenient and unsafe. The agent's context includes untrusted feedback, so any later plan update can smuggle in a new permission model.

The pattern

plan = llm.create_plan(task)
policy = actpass.issue_passport({
  task,
  scopes: ["ticket.read", "refund.create"],
  limits: { refund_cents: 5000 },
  ttl: "1h",
});

for action of plan.actions:
  decision = actpass.authorize(policy, action);
  if (decision.status !== "allow") stop(decision);
  tool.execute(action);

Dynamic replanning still works. The agent can react to new facts. What it cannot do is mint new authority for itself after hostile content enters the loop.

Where ActPass sits

Before execution: deterministic allow/deny/needs-approval.
During execution: nonce, TTL, scope, target, and session-capability checks.
After execution: evidence receipt for audits and incident review.

This is why ActPass is infrastructure, not a prompt. It is the enforcement layer the model has to pass through when text becomes action.

Sources: Architecting Secure AI Agents (arXiv:2603.30016), Reasoning-enabled Task Alignment (arXiv:2606.15441), and AttriGuard (arXiv:2603.10749).

See your agents' exposure

Get a read-only Lethal-Trifecta / MCP-color report for your agents in under a minute. No runtime, nothing blocked — just the truth about your blast radius.

Get your exposure report Read the docs

The plan/policy/enforcer pattern for secure AI agents

The anti-pattern

The pattern

Where ActPass sits

See your agents' exposure

Keep reading

The plan/policy/enforcer pattern for secure AI agents

The anti-pattern

The pattern

Where ActPass sits

See your agents' exposure

Keep reading