SafeClaw Security Model
Overview
SafeClaw is an action-level gating system for AI agents built by Authensor. The security model is designed around a single principle: no AI agent action executes unless explicitly permitted by policy. This document specifies the threat model, design rationale, trust boundaries, data isolation properties, and failure behavior.
SafeClaw is 100% open source (MIT license), written in TypeScript strict mode, with zero third-party dependencies. It is installed via npx @authensor/safeclaw.
Threat Model
SafeClaw addresses threats arising from AI agents operating with tool access on user systems. The threat model considers the following adversary capabilities and attack surfaces.
In-Scope Threats
| Threat | Description | Mitigation |
|--------|-------------|------------|
| Agent overreach | Agent performs actions beyond its intended scope | Policy rules restrict action types and resources per agent |
| Prompt injection leading to tool abuse | Adversarial input causes agent to execute harmful tool calls | All tool calls pass through policy engine regardless of origin |
| Unintended destructive operations | Agent deletes files, executes rm -rf, or modifies system configs | DENY rules for destructive patterns; deny-by-default for unrecognized actions |
| Data exfiltration via network | Agent sends sensitive data to unauthorized endpoints | Network action rules restrict allowed URLs and domains |
| Credential file access | Agent reads .env, SSH keys, or token files | File read rules deny access to sensitive paths |
| Audit trail tampering | Attempt to modify or delete action history | SHA-256 hash chain makes tampering cryptographically detectable |
| Policy bypass | Agent attempts to circumvent the policy engine | Interception at the tool-use boundary; no alternate execution path |
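The prompt-injection and policy-bypass mitigations both assume there is exactly one path from an agent's tool call to its effect. The sketch below shows that interception pattern in TypeScript. `gateTool`, `ActionRequest`, and the toy `evaluate` function are hypothetical names chosen for illustration; they are not SafeClaw's API. The wrapper consults the policy before the tool body runs, so a call produced by injected text is checked the same way as any other call.

```typescript
// Hypothetical types for illustration; not SafeClaw's actual API.
type ActionType = "file_write" | "file_read" | "shell_exec" | "network";
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

interface ActionRequest {
  agent: string;
  action: ActionType;
  resource: string; // file path, command string, or URL
}

type Evaluate = (req: ActionRequest) => Decision;

// Wraps a tool so the only way to invoke it is through the policy check.
function gateTool<A extends unknown[], R>(
  evaluate: Evaluate,
  agent: string,
  action: ActionType,
  resourceOf: (...args: A) => string,
  tool: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    const resource = resourceOf(...args);
    const decision = evaluate({ agent, action, resource });
    if (decision !== "ALLOW") {
      // DENY and REQUIRE_APPROVAL both block here; the approval flow is not modeled.
      throw new Error(`${action} on ${resource} blocked: ${decision}`);
    }
    return tool(...args);
  };
}

// Toy evaluator (hypothetical): only reads under ./src/ are allowed.
const evaluate: Evaluate = (req) =>
  req.action === "file_read" && req.resource.startsWith("./src/") ? "ALLOW" : "DENY";

const readFile = gateTool(
  evaluate,
  "coder",
  "file_read",
  (p: string) => p,
  (p: string) => `contents of ${p}`,
);

console.log(readFile("./src/index.ts")); // contents of ./src/index.ts
try {
  readFile("./.env");
} catch (e) {
  console.log((e as Error).message); // file_read on ./.env blocked: DENY
}
```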
Out-of-Scope Threats
| Threat | Rationale |
|--------|-----------|
| Compromise of the host operating system | SafeClaw operates at the application layer; OS-level security is the OS's responsibility |
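| Malicious modification of the installed SafeClaw package | A tampered engine cannot be relied on to police itself, so this falls outside runtime gating; MIT-licensed source availability and zero dependencies reduce, but do not eliminate, this supply chain risk |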
| Side-channel attacks on the policy engine | Not applicable to the action-gating threat model |
| Denial of service against the policy engine | Local execution with no network dependency makes external DoS infeasible |
Deny-by-Default Rationale
The deny-by-default architecture is SafeClaw's most critical security property. Its rationale:
- Fail-safe default — Misconfiguration (missing rules, typos, empty policy) results in all actions being denied rather than all actions being allowed. This is the safe failure mode.
- Explicit permission model — Every permitted action requires a deliberate policy rule. There is no implicit ALLOW. Operators must consciously grant each capability.
- Minimal attack surface — A new agent connected to SafeClaw has zero permissions by default. Permissions are added incrementally as needed, following the principle of least privilege.
- Auditability — The set of permitted actions is fully enumerable by reading the policy rules. There are no hidden defaults or inherited permissions.
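The sketch below makes the fail-safe default concrete. It is a minimal evaluator in which the only route to ALLOW is a matching rule. The rule shape and first-match ordering are assumptions made for illustration. SafeClaw's real syntax is defined in the Policy Rule Syntax Reference.

```typescript
// Hypothetical rule shape for illustration only.
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

interface Rule {
  id: string;
  action: string;   // e.g. "file_read", "shell_exec"
  resource: RegExp; // pattern the resource must match
  effect: Decision;
}

interface Result {
  decision: Decision;
  ruleId: string | null; // null when no rule matched
}

// Assumption: rules are checked in order and the first match wins.
function evaluate(rules: readonly Rule[], action: string, resource: string): Result {
  for (const rule of rules) {
    if (rule.action === action && rule.resource.test(resource)) {
      return { decision: rule.effect, ruleId: rule.id };
    }
  }
  // No rule matched: deny. There is no implicit ALLOW.
  return { decision: "DENY", ruleId: null };
}

const policy: Rule[] = [
  { id: "deny-secrets", action: "file_read", resource: /(^|\/)\.env$|\/\.ssh\//, effect: "DENY" },
  { id: "allow-src", action: "file_read", resource: /^\.\/src\//, effect: "ALLOW" },
];

console.log(evaluate(policy, "file_read", "./src/app.ts")); // { decision: 'ALLOW', ruleId: 'allow-src' }
console.log(evaluate(policy, "file_read", "./.env"));       // { decision: 'DENY', ruleId: 'deny-secrets' }
console.log(evaluate(policy, "shell_exec", "rm -rf /"));    // { decision: 'DENY', ruleId: null }
console.log(evaluate([], "file_read", "./src/app.ts"));     // { decision: 'DENY', ruleId: null }
```

The last call shows the empty-policy case from the first bullet. With no rules at all, every action is denied.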
Zero-Dependency Rationale
SafeClaw has zero third-party runtime dependencies. The rationale is security-motivated:
| Concern | How Zero Dependencies Addresses It |
|---------|-------------------------------------|
| Supply chain attacks | No node_modules means no transitive dependency risk |
| Vulnerability surface | No third-party code to audit, patch, or monitor for CVEs |
| Dependency confusion | No package names that could be targeted by typosquatting |
| Build reproducibility | Deterministic builds with no external resolution |
| Audit scope | The entire codebase is first-party and MIT-licensed |
The TypeScript strict mode compiler configuration provides type safety without adding runtime dependencies. All cryptographic operations (SHA-256 for the audit trail) use the Node.js built-in crypto module rather than a third-party library.
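The audit trail's tamper evidence can be sketched using only `createHash` from Node's built-in `node:crypto` module. The entry fields, the all-zero genesis value, and the JSON serialization below are assumptions made for illustration. The actual format is specified in the Audit Trail Specification.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape for illustration.
interface AuditEntry {
  timestamp: string;
  agent: string;
  action: string;
  resource: string;
  decision: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumed placeholder for the first entry's prevHash

// The hash covers the entry's fields plus the previous entry's hash.
function hashEntry(e: Omit<AuditEntry, "hash">): string {
  const payload = JSON.stringify([e.timestamp, e.agent, e.action, e.resource, e.decision, e.prevHash]);
  return createHash("sha256").update(payload, "utf8").digest("hex");
}

function append(chain: AuditEntry[], fields: Omit<AuditEntry, "prevHash" | "hash">): AuditEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const partial = { ...fields, prevHash };
  return [...chain, { ...partial, hash: hashEntry(partial) }];
}

// Returns the index of the first broken link, or -1 if the chain verifies.
function verify(chain: readonly AuditEntry[]): number {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const { hash, ...rest } = chain[i];
    if (rest.prevHash !== prev || hashEntry(rest) !== hash) return i;
    prev = hash;
  }
  return -1;
}

let chain: AuditEntry[] = [];
chain = append(chain, { timestamp: "2025-01-01T00:00:00Z", agent: "coder", action: "file_read", resource: "./src/app.ts", decision: "ALLOW" });
chain = append(chain, { timestamp: "2025-01-01T00:00:01Z", agent: "coder", action: "file_read", resource: "./.env", decision: "DENY" });

console.log(verify(chain)); // -1
chain[1] = { ...chain[1], decision: "ALLOW" }; // rewrite history without recomputing the hash
console.log(verify(chain)); // 1
```

Each hash covers the previous entry's hash. Editing or deleting any entry therefore breaks verification from that point on. This sketch leaves out how the most recent hash is anchored outside the log.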
Local Execution Model
Policy evaluation runs locally, in-process, with zero network round-trips:
┌─────────────────────────────────────────────┐
│  User's Machine                             │
│                                             │
│  ┌──────────┐     ┌──────────────────────┐  │
│  │ AI Agent │────→│ SafeClaw Engine      │  │
│  │          │←────│ (local, in-process)  │  │
│  └──────────┘     └──────────────────────┘  │
│                                             │
└─────────────────────────────────────────────┘
                      │ (async, non-blocking)
                      ▼
┌─────────────────────────────────────────────┐
│  Authensor Control Plane                    │
│  (safeclaw.onrender.com)                    │
│  Receives: action metadata only             │
└─────────────────────────────────────────────┘
Why Local Execution Matters
- Latency — Sub-millisecond evaluation. No network latency added to agent actions.
- Availability — The engine works offline. Control plane downtime does not affect policy evaluation.
- Privacy — Action content (file contents, command output, network response bodies) never leaves the local machine during evaluation.
- Reliability — No network failure modes (DNS, TLS, timeouts) can disrupt policy enforcement.
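The split between a synchronous local decision and an asynchronous upload can be sketched as follows. `LocalGate`, its exact-match allowlist, and the `upload` callback are hypothetical. The key point is that `check` never awaits anything, so control plane availability cannot delay or alter a decision.

```typescript
// Hypothetical sketch; not SafeClaw's actual engine.
type Decision = "ALLOW" | "DENY";

interface Metadata {
  action: string;
  resource: string;
  decision: Decision;
  timestamp: string;
}

class LocalGate {
  private readonly outbox: Metadata[] = [];
  private readonly allowed: ReadonlySet<string>;
  private readonly upload: (batch: Metadata[]) => Promise<void>;

  constructor(allowed: ReadonlySet<string>, upload: (batch: Metadata[]) => Promise<void>) {
    this.allowed = allowed;
    this.upload = upload;
  }

  // Synchronous: no await and no network call on the decision path.
  check(action: string, resource: string): Decision {
    const decision: Decision = this.allowed.has(`${action}:${resource}`) ? "ALLOW" : "DENY";
    this.outbox.push({ action, resource, decision, timestamp: new Date().toISOString() });
    return decision;
  }

  // Called on a timer or at shutdown; a failed upload puts the batch back for retry.
  async flush(): Promise<void> {
    if (this.outbox.length === 0) return;
    const batch = this.outbox.splice(0, this.outbox.length);
    try {
      await this.upload(batch);
    } catch {
      this.outbox.unshift(...batch);
    }
  }

  get pending(): number {
    return this.outbox.length;
  }
}

// Simulated outage: the uploader always rejects.
const gate = new LocalGate(new Set(["file_read:./src/app.ts"]), async () => {
  throw new Error("control plane unreachable");
});

console.log(gate.check("file_read", "./src/app.ts"));       // ALLOW
console.log(gate.check("shell_exec", "curl evil.example")); // DENY
void gate.flush().then(() => console.log(gate.pending));    // 2
```

Both decisions are returned immediately even though every upload fails. The two metadata records stay queued for a later retry.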
Control Plane Data Model
The Authensor control plane (safeclaw.onrender.com) provides key provisioning, policy synchronization, and audit trail aggregation. The data boundary between the local engine and the control plane is strictly defined.
What the Control Plane Sees
| Data | Description |
|------|-------------|
| Action type | file_write, file_read, shell_exec, or network |
| Resource identifier | File path, command string, or URL |
| Agent identity | Agent name string |
| Evaluation result | ALLOW, DENY, or REQUIRE_APPROVAL |
| Timestamp | When the action was evaluated |
| Matched rule ID | Which rule produced the decision |
| Hash chain entries | Audit trail metadata with integrity hashes |
What the Control Plane Does NOT See
| Data | Guarantee |
|------|-----------|
| File contents | Never transmitted — only the file path is sent |
| Command output | Never transmitted — only the command string is sent |
| Network response bodies | Never transmitted — only the URL is sent |
| Environment variables | Never transmitted |
| Agent conversation history | Never transmitted |
| Agent prompts or context | Never transmitted |
The control plane operates on metadata only. It knows what actions agents attempted and whether they were allowed — it does not know the content of those actions.
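One way to enforce this boundary in code is to build the outbound record by copying an explicit allowlist of metadata fields. Content fields then cannot be serialized by accident. The type and field names below are hypothetical, and the hash chain fields are omitted for brevity.

```typescript
// Hypothetical shapes for illustration.
type ActionType = "file_write" | "file_read" | "shell_exec" | "network";
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

// Everything the local engine may hold for one action, including content.
interface LocalActionEvent {
  action: ActionType;
  resource: string;
  agent: string;
  decision: Decision;
  timestamp: string;
  ruleId: string | null;
  fileContents?: string;
  commandOutput?: string;
  responseBody?: string;
  prompt?: string;
}

// The only fields allowed to leave the machine.
interface ControlPlaneRecord {
  action: ActionType;
  resource: string;
  agent: string;
  decision: Decision;
  timestamp: string;
  ruleId: string | null;
}

// Allowlist copy: any field not named here stays local.
function toControlPlaneRecord(e: LocalActionEvent): ControlPlaneRecord {
  return {
    action: e.action,
    resource: e.resource,
    agent: e.agent,
    decision: e.decision,
    timestamp: e.timestamp,
    ruleId: e.ruleId,
  };
}

const event: LocalActionEvent = {
  action: "file_read",
  resource: "./config/app.json",
  agent: "coder",
  decision: "ALLOW",
  timestamp: "2025-01-01T00:00:00Z",
  ruleId: "allow-config",
  fileContents: '{"apiKey":"secret"}',
};

console.log(JSON.stringify(toControlPlaneRecord(event)));
// {"action":"file_read","resource":"./config/app.json","agent":"coder","decision":"ALLOW","timestamp":"2025-01-01T00:00:00Z","ruleId":"allow-config"}
```

This design copies named fields rather than deleting known content fields. A field added to the local event later therefore stays local unless someone deliberately adds it to the record. This is the same fail-safe reasoning as deny-by-default.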
Key Management and Isolation
SafeClaw uses 7-day renewable API keys with the following properties:
| Property | Description |
|----------|-------------|
| Key lifetime | 7 days, renewable |
| Provisioning | Free tier, no credit card required |
| Scope | Per-installation; keys are not shared across installations |
| Storage | Local configuration file |
| Rotation | Automatic renewal before expiration |
| Revocation | Immediate via the dashboard at authensor.com |
Keys authenticate the local installation to the control plane for policy sync and audit upload. Keys are never used in the policy evaluation path — evaluation is entirely local and does not require a valid key to function.
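The key lifecycle can be sketched as a small state function. The 7-day lifetime comes from the table above. The one-day renewal margin and the `ApiKey` shape are assumptions, because the source says only that renewal happens automatically before expiration. `canSync` is a hypothetical helper illustrating that the key gates sync and upload, not evaluation.

```typescript
// Hypothetical sketch of the 7-day key lifecycle.
const DAY_MS = 24 * 60 * 60 * 1000;
const KEY_LIFETIME_MS = 7 * DAY_MS;
const RENEWAL_MARGIN_MS = 1 * DAY_MS; // assumed margin; not documented by SafeClaw

interface ApiKey {
  id: string;
  issuedAt: number; // epoch milliseconds
}

type KeyState = "valid" | "renew" | "expired";

function keyState(key: ApiKey, now: number): KeyState {
  const expiresAt = key.issuedAt + KEY_LIFETIME_MS;
  if (now >= expiresAt) return "expired";
  if (expiresAt - now <= RENEWAL_MARGIN_MS) return "renew";
  return "valid";
}

// Only sync/upload consults the key; local policy evaluation never calls this.
function canSync(key: ApiKey | null, now: number): boolean {
  return key !== null && keyState(key, now) !== "expired";
}

const key: ApiKey = { id: "key-1", issuedAt: Date.UTC(2025, 0, 1) };
console.log(keyState(key, Date.UTC(2025, 0, 3)));     // valid
console.log(keyState(key, Date.UTC(2025, 0, 7, 12))); // renew
console.log(keyState(key, Date.UTC(2025, 0, 8)));     // expired
```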
Key Compromise Scenario
If a key is compromised, the attacker can:
- Read action metadata from the control plane (they cannot read file contents, command outputs, or network responses)
- Upload fabricated audit entries (detectable via hash chain verification)
The attacker cannot:
- Modify policy rules (policy changes require authenticated dashboard access)
- Execute actions on the user's machine (keys have no execution capability)
- Access file contents, secrets, or environment variables (the control plane never has this data)
Fail-Closed Behavior
The engine is designed to fail closed — every failure mode results in DENY:
| Failure Scenario | Engine Behavior |
|-----------------|-----------------|
| Policy file missing or corrupted | DENY all actions |
| Rule evaluation throws an exception | DENY the action |
| Action request validation fails | DENY the action |
| No matching rule found | DENY (deny-by-default) |
| Engine initialization fails | DENY all actions |
| SHA-256 computation fails | DENY the action and flag audit integrity error |
There is no failure mode that results in an unintended ALLOW. This property is verified by the test suite. See the Test Coverage Reference for details.
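The fail-closed table can be read as a single rule: every exit from the decision function other than a matching ALLOW rule returns DENY. The sketch below shows that structure. `decide`, `Rule`, and `isValidRequest` are hypothetical names, and it assumes a missing or unparseable policy reaches the engine as `null`.

```typescript
// Hypothetical sketch of fail-closed evaluation.
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

interface ActionRequest {
  agent: string;
  action: string;
  resource: string;
}

interface Rule {
  id: string;
  matches: (req: ActionRequest) => boolean;
  effect: Decision;
}

const ACTIONS = new Set(["file_write", "file_read", "shell_exec", "network"]);

function isValidRequest(x: unknown): x is ActionRequest {
  if (typeof x !== "object" || x === null) return false;
  const { agent, action, resource } = x as Record<string, unknown>;
  return typeof agent === "string" && typeof resource === "string"
    && typeof action === "string" && ACTIONS.has(action);
}

function decide(policy: readonly Rule[] | null, input: unknown): Decision {
  try {
    if (policy === null) return "DENY";        // policy missing or failed to parse
    if (!isValidRequest(input)) return "DENY"; // request validation failed
    for (const rule of policy) {
      if (rule.matches(input)) return rule.effect;
    }
    return "DENY";                             // no matching rule
  } catch {
    return "DENY";                             // a rule threw during evaluation
  }
}

const rules: Rule[] = [
  { id: "allow-src", matches: (r) => r.action === "file_read" && r.resource.startsWith("./src/"), effect: "ALLOW" },
  { id: "broken", matches: () => { throw new Error("bad pattern"); }, effect: "ALLOW" },
];

console.log(decide(rules, { agent: "coder", action: "file_read", resource: "./src/a.ts" })); // ALLOW
console.log(decide(rules, { agent: "coder", action: "file_read", resource: "./.env" }));     // DENY
console.log(decide(null, { agent: "coder", action: "file_read", resource: "./src/a.ts" }));  // DENY
console.log(decide(rules, { action: "file_read" }));                                         // DENY
```

In the second call, the `broken` rule throws before it can return its ALLOW effect. The exception therefore resolves to DENY rather than to the rule's effect.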
Open Source Transparency
SafeClaw's client is 100% open source under the MIT license. This is a security property:
- Auditable — Any user can inspect the policy engine code to verify its behavior
- Verifiable — The deny-by-default behavior, fail-closed properties, and data boundaries are verifiable from source
- No obfuscation — No minified, compiled, or binary-only components in the evaluation path
- Community review — Open source enables independent security review
Security Testing
The SafeClaw test suite includes 446 tests across 24 files, with specific security-focused test categories:
- Deny-by-default fallback verification
- Fail-closed behavior for every failure scenario
- Hash chain integrity and tamper detection
- Policy rule evaluation correctness
- Action request validation and rejection
Related References
- Policy Engine Architecture — Engine design and evaluation flow
- Audit Trail Specification — Tamper-proof logging
- Policy Rule Syntax Reference — How permissions are expressed
- Simulation Mode Reference — Non-enforcing evaluation for policy tuning
- Test Coverage Reference — Security test methodology
- Deployment Reference — Installation and configuration
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw