Why did the agent call that tool? Causal attribution for LLM actions

The ActPass Team

Security & Product

Most prompt-injection defenses stare at the input. AttriGuard points at the more useful object: the tool call. The security question is not “does this webpage contain suspicious prose?” It is “why is the agent trying to send this email, create this refund, or read this file?”

User intent

refund order 9182

Untrusted input

webpage says email data

Tool call

email.send outside

Decision

not task-supported

Action-level attribution asks why the tool call exists, not whether text looked suspicious.

The wrong boundary

if looks_malicious(webpage_text):
  block()
else:
  execute(agent_tool_call)

That boundary fails because attackers can change the text. They can hide it in markup, make it look like a receipt, wrap it as JSON, or move it into a screenshot. Input-level classification becomes a guessing game.

The better boundary

decision = authorize({
  user_task: "refund order 9182 if it is eligible",
  proposed_action: "email.send",
  action_target: "unknown external address",
  causal_inputs: ["webpage:untrusted", "ticket:trusted"],
});

Now the mismatch is obvious. The user asked for refund eligibility. The action is outbound email to an unrelated recipient. Even if the injected instruction is novel, the action is not supported by the task.

How ActPass records attribution

Intent hash: binds the action to the user-approved task.
Observation refs: marks which tool outputs influenced the action.
Scope: limits what tools and targets the passport can authorize.
Decision evidence: records why a call was allowed, denied, or escalated.

Attribution is not magic. The lazy useful version is a structured receipt around each proposed action. When the action does not line up with intent, scope, session history, or approval state, the middleware blocks.

Sources: AttriGuard (arXiv:2603.10749), Architecting Secure AI Agents (arXiv:2603.30016), and Reasoning-enabled Task Alignment (arXiv:2606.15441).

See your agents' exposure

Get a read-only Lethal-Trifecta / MCP-color report for your agents in under a minute. No runtime, nothing blocked — just the truth about your blast radius.

Get your exposure report Read the docs

Why did the agent call that tool? Causal attribution for LLM actions

The ActPass Team

Security & Product

User intent

refund order 9182

Untrusted input

webpage says email data

Tool call

email.send outside

Decision

not task-supported

Action-level attribution asks why the tool call exists, not whether text looked suspicious.

The wrong boundary

if looks_malicious(webpage_text):
  block()
else:
  execute(agent_tool_call)

The better boundary

decision = authorize({
  user_task: "refund order 9182 if it is eligible",
  proposed_action: "email.send",
  action_target: "unknown external address",
  causal_inputs: ["webpage:untrusted", "ticket:trusted"],
});

How ActPass records attribution

Intent hash: binds the action to the user-approved task.
Observation refs: marks which tool outputs influenced the action.
Scope: limits what tools and targets the passport can authorize.
Decision evidence: records why a call was allowed, denied, or escalated.

Sources: AttriGuard (arXiv:2603.10749), Architecting Secure AI Agents (arXiv:2603.30016), and Reasoning-enabled Task Alignment (arXiv:2606.15441).

See your agents' exposure

Get a read-only Lethal-Trifecta / MCP-color report for your agents in under a minute. No runtime, nothing blocked — just the truth about your blast radius.

Get your exposure report Read the docs

Why did the agent call that tool? Causal attribution for LLM actions

The wrong boundary

The better boundary

How ActPass records attribution

See your agents' exposure

Keep reading

Why did the agent call that tool? Causal attribution for LLM actions

The wrong boundary

The better boundary

How ActPass records attribution

See your agents' exposure

Keep reading