2025-12-11 · Authensor

Zero-Trust Agent Architecture

Zero-trust agent architecture eliminates implicit trust between AI agents, services, and resources — every action is authenticated, authorized against an explicit policy, and logged, regardless of the agent's identity, location, or previous behavior.

Problem Statement

Traditional security models assign trust based on network location or identity: agents inside the trusted perimeter can access resources freely; agents outside cannot. This model breaks for AI agents. An agent running inside a trusted environment can be prompt-injected, hallucinate destructive commands, or exhibit emergent behavior that violates safety constraints. Internal network position does not indicate trustworthiness. Previous good behavior does not predict future actions. An agent that has safely executed 10,000 file reads can still attempt to exfiltrate credentials on the 10,001st action. Trust must be verified on every action, not assumed from context.

Solution

Zero-trust architecture, originally defined by Forrester Research and formalized by NIST SP 800-207, is built on three principles: never trust, always verify; assume breach; minimize blast radius. Applied to AI agent systems, these principles translate to specific architectural requirements.

Principle 1: Never trust, always verify. Every action an agent attempts is evaluated against a policy, regardless of the agent's identity, role, or history. There is no trusted agent that bypasses policy evaluation. There is no "safe" action type that skips verification. Every file_write, file_read, shell_exec, and network request passes through the policy engine.
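One way to picture "every action passes through the policy engine" is a wrapper that refuses to run a side-effecting handler until an evaluator has returned ALLOW. The sketch below is illustrative only: `gate`, `ActionRequest`, `Evaluator`, and the example rule are hypothetical names, not SafeClaw's API.

```typescript
// Hypothetical types for illustration; these are not SafeClaw's API.
type Decision = "ALLOW" | "DENY";

interface ActionRequest {
  type: "file_read" | "file_write" | "shell_exec" | "network";
  agent: string;
  target: string; // path, command, or URL, depending on the action type
}

type Evaluator = (request: ActionRequest) => Decision;

// Wraps a handler so that it can only run after an ALLOW decision.
function gate<T>(evaluate: Evaluator, handler: (request: ActionRequest) => T) {
  return (request: ActionRequest): T => {
    const decision = evaluate(request);
    if (decision !== "ALLOW") {
      throw new Error(`DENY: ${request.agent} may not ${request.type} ${request.target}`);
    }
    return handler(request);
  };
}

// Assumed example rule: only reads under /data/public are allowed.
const evaluateExample: Evaluator = (r) =>
  r.type === "file_read" && r.target.startsWith("/data/public") ? "ALLOW" : "DENY";

// Counts handler invocations so the gating behavior is observable.
let handlerCalls = 0;
const guardedRead = gate(evaluateExample, (r) => {
  handlerCalls += 1;
  return `contents of ${r.target}`;
});
```

The wrapper only helps if the agent has no other route to the underlying handler, so in this sketch the ungated function would need to stay private to the module that exports the gated one.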

Principle 2: Assume breach. The architecture assumes that any agent can be compromised at any time through prompt injection, model failure, or adversarial input. Security controls are designed to limit the damage a compromised agent can inflict. The audit trail records every action for forensic analysis. Policies are scoped to minimize the blast radius.

Principle 3: Minimize blast radius. Each agent receives the minimum permissions necessary for its task (least privilege). Agents are isolated from each other (per-agent isolation). Sensitive resources are segmented behind separate policy rules. A compromised agent can only affect the resources its narrow policy permits.

The zero-trust agent architecture consists of five components:

  1. Agent identity verification. Each agent is assigned a unique, non-spoofable identity at initialization. The identity is included in every action request and verified by the policy engine. Unknown agents are denied all actions.
  2. Per-action policy evaluation. Every action is evaluated against the policy engine. No action is exempted from evaluation. The policy uses deny-by-default: unrecognized actions are denied.
  3. Least-privilege policies. Each agent's policy grants only the minimum permissions required. Broad wildcards and permissive patterns are avoided.
  4. Continuous logging and verification. Every action evaluation is recorded in a tamper-proof audit trail. The audit trail enables continuous verification of agent behavior, anomaly detection, and compliance reporting.
  5. No implicit trust inheritance. An agent that is permitted to read files is not implicitly permitted to write files. An agent permitted to access one network endpoint is not implicitly permitted to access others. Each permission is explicitly granted through a distinct policy rule.

The zero-trust model contrasts with perimeter-based security, where agents inside the trusted network (or inside a container, or running under a trusted user account) are implicitly trusted. In zero-trust, the container and network position are additional defense layers, but they do not substitute for per-action policy evaluation.
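The least-privilege component above says broad wildcards and permissive patterns are avoided. A policy review step could check for them mechanically. The following is a rough sketch under stated assumptions: the `PrefixRule` shape, the `findBroadRules` helper, and the list of "too broad" prefixes are all hypothetical and not part of SafeClaw.

```typescript
// Hypothetical rule shape, loosely following the starts_with conditions in this article.
interface PrefixRule {
  name: string;
  action: string;
  prefix: string; // the value of a starts_with condition
}

// Prefixes treated as too broad in this sketch (an assumed, non-exhaustive list).
const BROAD_PREFIXES = new Set(["", "/", "*", "http://", "https://"]);

// Returns the names of rules whose prefix would match almost any target.
function findBroadRules(rules: PrefixRule[]): string[] {
  return rules
    .filter((rule) => BROAD_PREFIXES.has(rule.prefix.trim()))
    .map((rule) => rule.name);
}

const flagged = findBroadRules([
  { name: "allow-dataset-reads", action: "file_read", prefix: "/data/public" },
  { name: "allow-all-writes", action: "file_write", prefix: "/" },
]);
```

Here `flagged` contains only the second rule's name, since a `/` prefix would permit writes anywhere on the filesystem.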

Implementation

SafeClaw, by Authensor, implements zero-trust agent architecture through the combination of several design properties:

Per-action evaluation. SafeClaw evaluates every action — file_write, file_read, shell_exec, network — against the policy engine. No action type is exempted. The evaluation is mandatory and cannot be bypassed by the agent.

Deny-by-default. SafeClaw's policy engine returns DENY for any action that does not match an explicit allow rule. This ensures no implicit permissions exist. Every permission is explicitly declared.
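To make deny-by-default concrete, here is a minimal evaluator sketch in which DENY is the fall-through result and ALLOW is returned only when an explicit rule matches. The `Rule` and `Action` shapes mirror the YAML and JSON used later in this article, but the types and the `evaluateRules` function are assumptions for illustration, not SafeClaw's actual engine.

```typescript
type Effect = "ALLOW" | "DENY";

// Hypothetical flattened form of one allow rule.
interface Rule {
  name: string;
  action: string;
  field: "path" | "url" | "command";
  op: "starts_with" | "equals";
  value: string;
}

interface Action {
  type: string;
  path?: string;
  url?: string;
  command?: string;
}

// Returns ALLOW only when an explicit rule matches; everything else falls through to DENY.
function evaluateRules(rules: Rule[], action: Action): { effect: Effect; rule?: string } {
  for (const rule of rules) {
    if (rule.action !== action.type) continue;
    const actual = action[rule.field];
    if (actual === undefined) continue;
    const matched =
      rule.op === "equals" ? actual === rule.value : actual.startsWith(rule.value);
    if (matched) return { effect: "ALLOW", rule: rule.name };
  }
  return { effect: "DENY" };
}

const sampleRules: Rule[] = [
  {
    name: "allow-dataset-reads",
    action: "file_read",
    field: "path",
    op: "starts_with",
    value: "/data/public",
  },
];
```

Note that a plain prefix test such as `starts_with: "/data/public"` also matches `/data/public-archive`. A real engine would likely normalize paths and compare on directory boundaries; this sketch does not.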

Agent identity in every request. Every action request includes an agent field. SafeClaw routes the request to the agent-specific policy. Actions from unrecognized agents are denied.
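The routing step can be sketched as a lookup from agent name to that agent's own policy, with a missing entry treated as DENY. Everything named below (`policies`, `AgentPolicy`, `evaluateFor`) is hypothetical. The sketch also assumes the `agent` field has already been authenticated upstream, because a bare string in a request is not by itself a non-spoofable identity.

```typescript
type Verdict = "ALLOW" | "DENY";

interface AgentRequest {
  agent: string;
  type: string;
  target: string; // path, command, or URL
}

// Hypothetical: a policy is modeled as a predicate over a single request.
type AgentPolicy = (request: AgentRequest) => boolean;

// One entry per known agent; permissions are never shared between entries.
const policies = new Map<string, AgentPolicy>([
  ["research-agent", (r) => r.type === "file_read" && r.target.startsWith("/data/public")],
  [
    "build-agent",
    (r) => r.type === "shell_exec" && (r.target === "npm test" || r.target === "npm run build"),
  ],
]);

function evaluateFor(request: AgentRequest): Verdict {
  const policy = policies.get(request.agent);
  if (policy === undefined) return "DENY"; // unrecognized identity: no policy, no access
  return policy(request) ? "ALLOW" : "DENY";
}
```

Because each agent resolves to a separate policy, `npm test` is allowed for `build-agent` but denied for `research-agent` and for any name that is not in the map.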

Per-agent policy isolation. Each agent receives its own policy file with its own allow rules. One agent's permissions do not extend to another agent.

Tamper-proof audit trail. Every evaluation is recorded in a SHA-256 hash chain, so any later alteration of a recorded entry is detectable. The audit trail provides the continuous verification component of zero trust: any agent's behavior can be retrospectively analyzed and verified.
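For readers unfamiliar with hash chains, the sketch below shows the general technique using Node's built-in `crypto` module: each entry's SHA-256 hash covers the previous entry's hash, so changing any past record invalidates every hash after it. The entry layout, the genesis value, and the function names are assumptions for illustration; SafeClaw's actual record format is not described in this article.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout for illustration.
interface AuditEntry {
  index: number;
  record: string; // serialized evaluation, e.g. JSON
  prevHash: string; // hash of the previous entry (or the genesis value)
  hash: string; // SHA-256 over index, prevHash, and record
}

// Assumed starting value for the first entry's prevHash.
const GENESIS = "0".repeat(64);

function hashEntry(index: number, record: string, prevHash: string): string {
  return createHash("sha256").update(`${index}\n${prevHash}\n${record}`).digest("hex");
}

// Returns a new chain with one more entry linked to the current tail.
function appendEntry(chain: AuditEntry[], record: string): AuditEntry[] {
  const prevHash = chain.length === 0 ? GENESIS : chain[chain.length - 1].hash;
  const index = chain.length;
  return [...chain, { index, record, prevHash, hash: hashEntry(index, record, prevHash) }];
}

// Recomputes every hash and link; any edited, reordered, or relinked entry fails.
function verifyChain(chain: AuditEntry[]): boolean {
  let prevHash = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    if (entry.index !== i || entry.prevHash !== prevHash) return false;
    if (entry.hash !== hashEntry(entry.index, entry.record, entry.prevHash)) return false;
    prevHash = entry.hash;
  }
  return true;
}
```

A chain like this detects modification rather than preventing it. Someone able to rewrite the whole log could recompute all the hashes, so the most recent hash generally needs to be stored or published somewhere the writer cannot alter.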

No trust based on position. SafeClaw evaluates actions regardless of whether the agent runs locally, in a container, in the cloud, or behind a VPN. Network position does not affect policy evaluation.

SafeClaw's policy evaluation completes in sub-millisecond time with zero network round-trips. The engine is written in TypeScript strict mode with zero third-party dependencies. The 100% open source client (MIT license) enables full inspection of the trust model. The control plane (safeclaw.onrender.com) receives only action metadata, never API keys or sensitive data, maintaining the zero-trust boundary between the client and the control plane.

SafeClaw is validated by 446 tests. Install with npx @authensor/safeclaw. Free tier with 7-day renewable keys, no credit card required.

Code Example

Zero-trust policy configuration for a multi-agent system:

# Each agent has explicit, minimal permissions.
# No agent inherits trust from another.
# No action is implicitly allowed.

# Agent 1: Research agent (read-only data access)
agent: "research-agent"
rules:
  - name: "allow-dataset-reads"
    action: file_read
    conditions:
      path:
        starts_with: "/data/public"
    effect: ALLOW

  - name: "allow-search-api"
    action: network
    conditions:
      url:
        starts_with: "https://search.api.internal.com"
    effect: ALLOW

# Agent 2: Build agent (write to /build, run tests, no network)
agent: "build-agent"
rules:
  - name: "allow-build-output"
    action: file_write
    conditions:
      path:
        starts_with: "/project/build"
    effect: ALLOW

  - name: "allow-npm-test"
    action: shell_exec
    conditions:
      command:
        equals: "npm test"
    effect: ALLOW

  - name: "allow-npm-build"
    action: shell_exec
    conditions:
      command:
        equals: "npm run build"
    effect: ALLOW

Zero-trust verification — every action is checked, no exceptions:

{
  "type": "file_read",
  "path": "/data/public/report.csv",
  "agent": "research-agent"
}
Result: ALLOW. Rule "allow-dataset-reads" matches.

{
  "type": "file_write",
  "path": "/data/public/report.csv",
  "agent": "research-agent"
}
Result: DENY. The research agent can read but not write. Read permission does not imply write permission. Each permission is independently verified.

{
  "type": "network",
  "url": "https://external-service.com/api",
  "agent": "build-agent"
}
Result: DENY. The build agent has no network permissions. Being a trusted internal agent does not grant network access. Zero trust requires explicit permission.

{
  "type": "shell_exec",
  "command": "npm test",
  "agent": "unknown-agent"
}
Result: DENY. No policy is mapped to "unknown-agent". Zero trust denies actions from unrecognized identities.

Trade-offs

When to Use

When Not to Use

Related Patterns

Cross-References
