2025-11-03 · Authensor

SafeClaw Security Model

Overview

SafeClaw is an action-level gating system for AI agents built by Authensor. The security model is designed around a single principle: no AI agent action executes unless explicitly permitted by policy. This document specifies the threat model, design rationale, trust boundaries, data isolation properties, and failure behavior.

The SafeClaw client is 100% open source (MIT license), written in TypeScript strict mode, with zero third-party runtime dependencies. Install and run it with npx @authensor/safeclaw.

Threat Model

SafeClaw addresses threats arising from AI agents operating with tool access on user systems. The threat model considers the following adversary capabilities and attack surfaces.

In-Scope Threats

| Threat | Description | Mitigation |
|--------|-------------|------------|
| Agent overreach | Agent performs actions beyond its intended scope | Policy rules restrict action types and resources per agent |
| Prompt injection leading to tool abuse | Adversarial input causes agent to execute harmful tool calls | All tool calls pass through policy engine regardless of origin |
| Unintended destructive operations | Agent deletes files, executes rm -rf, or modifies system configs | DENY rules for destructive patterns; deny-by-default for unrecognized actions |
| Data exfiltration via network | Agent sends sensitive data to unauthorized endpoints | Network action rules restrict allowed URLs and domains |
| Credential file access | Agent reads .env, SSH keys, or token files | File read rules deny access to sensitive paths |
| Audit trail tampering | Attempt to modify or delete action history | SHA-256 hash chain makes tampering cryptographically detectable |
| Policy bypass | Agent attempts to circumvent the policy engine | Interception at the tool-use boundary; no alternate execution path |
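As an illustration of the network mitigation in the table above, the following is a minimal sketch of a domain allowlist check. The function name `isAllowedUrl` and the example domains are hypothetical and are not taken from SafeClaw's source. The check parses the URL and compares hostnames, because plain substring matching would accept look-alike hosts such as api.example.com.evil.test.

```typescript
// Hypothetical allowlist; SafeClaw's real rule format is not shown in this document.
const allowedDomains: readonly string[] = ["api.example.com", "example.org"];

// Returns true only for HTTPS URLs whose host is an allowed domain or a subdomain of one.
function isAllowedUrl(rawUrl: string, allowed: readonly string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is denied, never passed through
  }
  if (url.protocol !== "https:") {
    return false;
  }
  const host = url.hostname.toLowerCase();
  // Match the exact host or a dot-separated subdomain, not an arbitrary suffix.
  return allowed.some((domain) => host === domain || host.endsWith("." + domain));
}
```

Under these assumptions, `isAllowedUrl("https://api.example.com/v1", allowedDomains)` is accepted, while `https://api.example.com.evil.test/` and `http://api.example.com/` are rejected.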

Out-of-Scope Threats

| Threat | Rationale |
|--------|-----------|
| Compromise of the host operating system | SafeClaw operates at the application layer; OS-level security is the OS's responsibility |
| Malicious modification of the SafeClaw binary | Supply chain integrity is addressed by MIT-licensed source availability and zero dependencies |
| Side-channel attacks on the policy engine | Not applicable to the action-gating threat model |
| Denial of service against the policy engine | Local execution with no network dependency makes external DoS infeasible |

Deny-by-Default Rationale

The deny-by-default architecture is SafeClaw's most critical security property. Its rationale:

Fail-safe default — Misconfiguration (missing rules, typos, empty policy) results in all actions being denied rather than all actions being allowed. This is the safe failure mode.

Explicit permission model — Every permitted action requires a deliberate policy rule. There is no implicit ALLOW. Operators must consciously grant each capability.

Minimal attack surface — A new agent connected to SafeClaw has zero permissions by default. Permissions are added incrementally as needed, following the principle of least privilege.

Auditability — The set of permitted actions is fully enumerable by reading the policy rules. There are no hidden defaults or inherited permissions.

The deny-by-default fallback is hardcoded in the policy engine and cannot be changed to ALLOW through configuration. This is a deliberate design constraint, not a default setting.
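The properties above can be sketched in a few lines. This is an illustrative model only: the `Rule` shape, the prefix matcher, and the first-match-wins ordering are assumptions made for the example, not SafeClaw's documented rule syntax. The point is that the fallback is a literal in the code path rather than a configurable value.

```typescript
// Hypothetical types; see the Policy Rule Syntax Reference for the real format.
type ActionType = "file_write" | "file_read" | "shell_exec" | "network";
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

interface Rule {
  id: string;
  action: ActionType;
  resourcePrefix: string; // assumed, simplified matcher: resource must start with this
  decision: Decision;
}

interface ActionRequest {
  action: ActionType;
  resource: string;
}

// First matching rule wins (an assumption). With no match the result is always DENY.
function evaluate(rules: readonly Rule[], request: ActionRequest): Decision {
  for (const rule of rules) {
    if (rule.action === request.action && request.resource.startsWith(rule.resourcePrefix)) {
      return rule.decision;
    }
  }
  return "DENY"; // hardcoded fallback, not read from configuration
}

const exampleRules: readonly Rule[] = [
  { id: "deny-env", action: "file_read", resourcePrefix: "/workspace/.env", decision: "DENY" },
  { id: "allow-workspace-read", action: "file_read", resourcePrefix: "/workspace/", decision: "ALLOW" },
];
```

With an empty rule list every request evaluates to DENY, which is the misconfiguration case described under "Fail-safe default". A production matcher would also need to normalize paths before comparing them; the prefix check here is deliberately naive.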

Zero-Dependency Rationale

SafeClaw has zero third-party runtime dependencies. The rationale is security-motivated:

| Concern | How Zero Dependencies Addresses It |
|---------|-------------------------------------|
| Supply chain attacks | No node_modules means no transitive dependency risk |
| Vulnerability surface | No third-party code to audit, patch, or monitor for CVEs |
| Dependency confusion and typosquatting | No third-party package names for an attacker to spoof or typosquat |
| Build reproducibility | Deterministic builds with no external resolution |
| Audit scope | The entire codebase is first-party and MIT-licensed |

The TypeScript strict mode compiler configuration provides type safety without runtime dependencies. All cryptographic operations (SHA-256 for the audit trail) use Node.js built-in crypto module — not a third-party library.
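To show what a SHA-256 hash chain built only on the Node.js crypto module can look like, here is a minimal sketch. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration; the Audit Trail Specification defines the actual format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout for illustration.
interface AuditEntry {
  payload: string;  // serialized action metadata
  prevHash: string; // hash of the previous entry, or GENESIS for the first entry
  hash: string;     // SHA-256 over prevHash followed by payload
}

const GENESIS = "0".repeat(64); // assumed starting value, same length as a hex SHA-256 digest

function hashEntry(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update(payload).digest("hex");
}

// Returns a new chain with one entry appended; the input array is not mutated.
function appendEntry(chain: readonly AuditEntry[], payload: string): AuditEntry[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  return [...chain, { payload, prevHash, hash: hashEntry(prevHash, payload) }];
}

// Recomputes every link; any edited, removed, or reordered entry breaks the chain.
function verifyChain(chain: readonly AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== prev || entry.hash !== hashEntry(prev, entry.payload)) {
      return false;
    }
    prev = entry.hash;
  }
  return true;
}
```

Because each hash covers the previous hash, changing one entry invalidates every entry after it. A chain like this makes tampering detectable rather than impossible: someone able to rewrite the whole log can recompute all hashes, so the latest hash has to be recorded somewhere the attacker cannot modify.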

Local Execution Model

Policy evaluation runs locally, in-process, with zero network round-trips:

┌─────────────────────────────────────────────┐
│             User's Machine                  │
│                                             │
│  ┌──────────┐     ┌──────────────────────┐ │
│  │ AI Agent │────→│  SafeClaw Engine     │ │
│  │          │←────│  (local, in-process) │ │
│  └──────────┘     └──────────────────────┘ │
│                                             │
└─────────────────────────────────────────────┘
          │ (async, non-blocking)
          ▼
┌─────────────────────────────────────────────┐
│  Authensor Control Plane                    │
│  (safeclaw.onrender.com)                    │
│  Receives: action metadata only             │
└─────────────────────────────────────────────┘

Why Local Execution Matters

Latency — Sub-millisecond evaluation. No network latency added to agent actions.
Availability — The engine works offline. Control plane downtime does not affect policy evaluation.
Privacy — Action content (file contents, command output, network response bodies) never leaves the local machine during evaluation.
Reliability — No network failure modes (DNS, TLS, timeouts) can disrupt policy enforcement.

Control Plane Data Model

The Authensor control plane (safeclaw.onrender.com) provides key provisioning, policy synchronization, and audit trail aggregation. The data boundary between the local engine and the control plane is strictly defined.

What the Control Plane Sees

| Data | Description |
|------|-------------|
| Action type | file_write, file_read, shell_exec, or network |
| Resource identifier | File path, command string, or URL |
| Agent identity | Agent name string |
| Evaluation result | ALLOW, DENY, or REQUIRE_APPROVAL |
| Timestamp | When the action was evaluated |
| Matched rule ID | Which rule produced the decision |
| Hash chain entries | Audit trail metadata with integrity hashes |

What the Control Plane Does NOT See

| Data | Guarantee |
|------|-----------|
| File contents | Never transmitted — only the file path is sent |
| Command output | Never transmitted — only the command string is sent |
| Network response bodies | Never transmitted — only the URL is sent |
| Environment variables | Never transmitted |
| Agent conversation history | Never transmitted |
| Agent prompts or context | Never transmitted |

The control plane operates on metadata only. It knows what actions agents attempted and whether they were allowed — it does not know the content of those actions.
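One way to enforce a metadata-only boundary like the one in the two tables above is to build the outgoing record field by field from an allowlist, so content fields cannot leak by accident. The sketch below is hypothetical: the type names, field names, and the use of null for "no matching rule" are assumptions, not SafeClaw's wire format.

```typescript
type ActionType = "file_write" | "file_read" | "shell_exec" | "network";
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

// Hypothetical local event: includes content that must stay on the machine.
interface LocalActionEvent {
  actionType: ActionType;
  resource: string;             // file path, command string, or URL
  agent: string;
  result: Decision;
  matchedRuleId: string | null; // null when the deny-by-default fallback applied (assumed)
  evaluatedAt: Date;
  fileContents?: string;        // local only
  commandOutput?: string;       // local only
  responseBody?: string;        // local only
}

// Hypothetical upload record: metadata fields only.
interface ControlPlaneRecord {
  actionType: ActionType;
  resource: string;
  agent: string;
  result: Decision;
  matchedRuleId: string | null;
  timestamp: string;
}

// Copies an explicit allowlist of fields instead of deleting sensitive ones,
// so a newly added content field is excluded unless someone opts it in.
function toControlPlaneRecord(event: LocalActionEvent): ControlPlaneRecord {
  return {
    actionType: event.actionType,
    resource: event.resource,
    agent: event.agent,
    result: event.result,
    matchedRuleId: event.matchedRuleId,
    timestamp: event.evaluatedAt.toISOString(),
  };
}
```

Note that the resource identifier itself (a path, command string, or URL) is still transmitted, so anything sensitive embedded in it would reach the control plane.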

Key Management and Isolation

SafeClaw uses 7-day renewable API keys with the following properties:

| Property | Description |
|----------|-------------|
| Key lifetime | 7 days, renewable |
| Provisioning | Free tier, no credit card required |
| Scope | Per-installation; keys are not shared across installations |
| Storage | Local configuration file |
| Rotation | Automatic renewal before expiration |
| Revocation | Immediate via the dashboard at authensor.com |

Keys authenticate the local installation to the control plane for policy sync and audit upload. Keys are never used in the policy evaluation path — evaluation is entirely local and does not require a valid key to function.
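The 7-day lifetime and renewal before expiration could be modeled as below. This is a sketch under stated assumptions: the one-day renewal window and the `keyState` function are invented for illustration, and the document does not say when SafeClaw actually triggers renewal.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const KEY_LIFETIME_MS = 7 * DAY_MS; // from the table above
const RENEWAL_WINDOW_MS = 1 * DAY_MS; // assumed: renew during the final day

type KeyState = "valid" | "renew" | "expired";

// Classifies a key by age. This affects only sync and upload with the control plane;
// it is deliberately not consulted by policy evaluation.
function keyState(issuedAt: Date, now: Date): KeyState {
  const age = now.getTime() - issuedAt.getTime();
  if (age >= KEY_LIFETIME_MS) {
    return "expired";
  }
  if (age >= KEY_LIFETIME_MS - RENEWAL_WINDOW_MS) {
    return "renew";
  }
  return "valid";
}
```

Keeping this check out of the evaluation path matches the statement above that an expired or missing key does not stop local policy enforcement.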

Key Compromise Scenario

If a key is compromised, the attacker can:

Read action metadata from the control plane (they cannot read file contents, command outputs, or network responses)

Upload fabricated audit entries (detectable via hash chain verification)

The attacker cannot:

Modify policy rules (policy changes require authenticated dashboard access)

Execute actions on the user's machine (keys have no execution capability)

Access file contents, secrets, or environment variables (the control plane never has this data)

Fail-Closed Behavior

The engine is designed to fail closed — every failure mode results in DENY:

| Failure Scenario | Engine Behavior |
|-----------------|-----------------|
| Policy file missing or corrupted | DENY all actions |
| Rule evaluation throws an exception | DENY the action |
| Action request validation fails | DENY the action |
| No matching rule found | DENY (deny-by-default) |
| Engine initialization fails | All actions blocked |
| SHA-256 computation fails | DENY the action and flag audit integrity error |

There is no failure mode that results in an unintended ALLOW. This property is verified by the test suite. See the Test Coverage Reference for details.
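A common way to get the behavior in the table above is to wrap the evaluator so that exceptions and malformed results collapse to DENY. The wrapper below is a minimal sketch; `failClosed` is a hypothetical name, and the engine's real error handling is described in the Policy Engine Architecture reference.

```typescript
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

// Wraps any evaluator so that a thrown error or an unrecognized return value yields DENY.
function failClosed<T>(evaluate: (request: T) => Decision): (request: T) => Decision {
  return (request: T): Decision => {
    try {
      const decision = evaluate(request);
      // Only the two explicit non-deny outcomes pass through; anything else is DENY.
      return decision === "ALLOW" || decision === "REQUIRE_APPROVAL" ? decision : "DENY";
    } catch {
      return "DENY"; // rule evaluation threw: deny this action
    }
  };
}
```

The non-deny outcomes are listed explicitly, so a value outside the `Decision` type (for example from a corrupted policy file parsed at runtime) is treated as DENY instead of being passed to the caller.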

Open Source Transparency

SafeClaw's client is 100% open source under the MIT license. This is a security property:

Auditable — Any user can inspect the policy engine code to verify its behavior
Verifiable — The deny-by-default behavior, fail-closed properties, and data boundaries are verifiable from source
No obfuscation — No minified, compiled, or binary-only components in the evaluation path
Community review — Open source enables independent security review

The control plane (safeclaw.onrender.com / authensor.com) is a hosted service operated by Authensor. The open source client communicates with the control plane over HTTPS using the documented API.

Security Testing

The SafeClaw test suite includes 446 tests across 24 files, with specific security-focused test categories:

Deny-by-default fallback verification
Fail-closed behavior for every failure scenario
Hash chain integrity and tamper detection
Policy rule evaluation correctness
Action request validation and rejection

See the Test Coverage Reference for the complete testing specification.

Related References

Policy Engine Architecture: Engine design and evaluation flow
Audit Trail Specification: Tamper-evident logging
Policy Rule Syntax Reference: How permissions are expressed
Simulation Mode Reference: Non-enforcing evaluation for policy tuning
Test Coverage Reference: Security test methodology
Deployment Reference: Installation and configuration
