Security engineers evaluating AI agent tooling need controls that are auditable, verifiable, and resistant to bypass. SafeClaw by Authensor implements deny-by-default action gating with a hash-chained audit trail, zero external dependencies, and 446 tests covering every policy evaluation path. It is the security-first approach to AI agent governance: every action is denied unless an explicit policy rule permits it. Install with npx @authensor/safeclaw and inspect the source — it is MIT-licensed and fully open.
The Security Engineer's AI Agent Threat Model
AI agents introduce a new class of insider threat: an autonomous process with the developer's credentials that takes actions based on probabilistic language model outputs. The threat model includes:
- Prompt injection leading to action execution — adversarial content in codebases, issues, or documentation that causes agents to execute unintended commands
- Lateral movement in multi-agent systems — one compromised agent influencing another through shared context or tool access
- Supply chain attacks via agent-installed packages — agents running npm install, pip install, or cargo add with attacker-controlled package names
- Data exfiltration through context windows — agents reading sensitive files and sending contents to LLM providers as part of their prompt
- Privilege escalation — agents discovering and exploiting system-level access they inherit from the host process
Security-Hardened SafeClaw Policy
# safeclaw.yaml — security engineer hardened policy
version: 1
default: deny
rules:
  # Filesystem controls (deny rules listed before the allow rule; evaluation is first-match-wins)
  - action: file_read
    path: "**/.env"
    decision: deny
    reason: "Environment files contain secrets"
  - action: file_read
    path: "**/key*"
    decision: deny
    reason: "Block reading key files"
  - action: file_read
    path: "**/credential*"
    decision: deny
    reason: "Block reading credential files"
  - action: file_read
    path: "**/secret*"
    decision: deny
    reason: "Block reading secret files"
  - action: file_read
    path: "src/**"
    decision: allow
    reason: "Source code is readable"
  - action: file_write
    path: "**"
    decision: prompt
    reason: "All writes require human approval"
  # Shell controls
  - action: shell_execute
    command: "sudo *"
    decision: deny
    reason: "No privilege escalation"
  - action: shell_execute
    command: "curl *"
    decision: deny
    reason: "Block outbound data transfer"
  - action: shell_execute
    command: "wget *"
    decision: deny
    reason: "Block outbound data transfer"
  - action: shell_execute
    command: "ssh *"
    decision: deny
    reason: "Block SSH connections"
  - action: shell_execute
    command: "chmod *"
    decision: deny
    reason: "Block permission changes"
  - action: shell_execute
    command: "chown *"
    decision: deny
    reason: "Block ownership changes"
  # Network controls
  - action: network_request
    destination: "*"
    decision: deny
    reason: "All outbound network denied"
This policy implements least privilege at every layer: secret-bearing paths are denied before the source-code allow rule (evaluation is first-match-wins), filesystem reads are otherwise scoped to source code only, all writes require human approval, shell commands that could exfiltrate data or escalate privileges are denied, and outbound network access is fully blocked.
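To make first-match-wins evaluation concrete, here is a minimal TypeScript sketch of how a gate like this can match an action against ordered rules. The Rule shape, globToRegExp helper, and evaluate function are illustrative assumptions, not SafeClaw's internals.

// Minimal sketch of first-match-wins policy evaluation (illustrative only;
// types and matching details are assumptions, not SafeClaw's actual code).
type Decision = "allow" | "deny" | "prompt";

interface Rule {
  action: string;   // e.g. "file_read", "shell_execute"
  pattern: string;  // glob pattern for the target
  decision: Decision;
  reason: string;
}

// Convert a simple glob to a RegExp: "**" spans directories, "*" does not.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")           // placeholder for "**"
    .replace(/\*/g, "[^/]*")              // "*" matches within one path segment
    .replace(/\u0000/g, ".*");            // "**" matches across segments
  return new RegExp(`^${escaped}$`);
}

// First matching rule wins; if nothing matches, fall back to the default deny.
function evaluate(rules: Rule[], action: string, target: string): { decision: Decision; reason: string } {
  for (const rule of rules) {
    if (rule.action === action && globToRegExp(rule.pattern).test(target)) {
      return { decision: rule.decision, reason: rule.reason };
    }
  }
  return { decision: "deny", reason: "default: deny" };
}

// Example: the deny rule for key files fires before the src/** allow rule.
const rules: Rule[] = [
  { action: "file_read", pattern: "**/key*", decision: "deny", reason: "Block reading key files" },
  { action: "file_read", pattern: "src/**", decision: "allow", reason: "Source code is readable" },
];
console.log(evaluate(rules, "file_read", "src/keys/key.pem")); // deny
console.log(evaluate(rules, "file_read", "src/index.ts"));     // allow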
Verifying SafeClaw's Security Properties
Security engineers should verify the tools they adopt rather than take claims on trust. These are the SafeClaw properties that matter:
Zero dependencies. The package has no external runtime dependencies. This eliminates supply chain risk from transitive dependency attacks — a critical concern when the tool itself is a security boundary.
Hash-chained audit trail. Each audit entry includes a SHA-256 hash of the previous entry. Tampering with any log entry breaks the chain, making modification detectable. This is the same integrity mechanism used in blockchain and certificate transparency logs.
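A minimal sketch of how such a chain can be verified offline. The entry fields and hashing scheme below are illustrative assumptions, not SafeClaw's documented log format.

import { createHash } from "node:crypto";

// Illustrative audit entry shape; SafeClaw's actual log format may differ.
interface AuditEntry {
  timestamp: string;
  action: string;
  decision: string;
  prevHash: string; // SHA-256 hash of the previous entry
  hash: string;     // SHA-256 over this entry's content plus prevHash
}

// Recompute each entry's hash and confirm it links to the previous one.
// Any edited, inserted, or deleted entry breaks the chain from that point on.
function verifyChain(entries: AuditEntry[]): boolean {
  let prev = "0".repeat(64); // genesis value for the first entry
  for (const e of entries) {
    const recomputed = createHash("sha256")
      .update(`${e.timestamp}|${e.action}|${e.decision}|${e.prevHash}`)
      .digest("hex");
    if (e.prevHash !== prev || e.hash !== recomputed) return false;
    prev = e.hash;
  }
  return true;
}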
446 tests. The test suite covers policy parsing, glob matching, first-match-wins evaluation, hash chain integrity, simulation mode, and edge cases. Run npm test in the SafeClaw repo to verify.
Local-only execution. No telemetry, no external API calls, no cloud dependencies. The entire system runs on the developer's machine. Data never leaves the local environment.
Provider-agnostic. SafeClaw works identically with Claude and OpenAI agents. The gating layer sits between the agent and the operating system, not between the agent and the LLM provider.
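As a rough sketch of that architecture, the gate can be modeled as a check that runs before any tool call executes, regardless of which provider produced the call. The Gate interface and runToolCall wrapper below are hypothetical, not SafeClaw's public API.

// Hypothetical wiring: the gate sits between the agent's tool call and the OS,
// so it behaves identically whether the call came from a Claude or OpenAI agent.
type GateDecision = "allow" | "deny" | "prompt";

interface Gate {
  check(action: string, target: string): Promise<GateDecision>;
}

async function runToolCall(
  gate: Gate,
  call: { action: string; target: string },
  execute: () => Promise<string>,
  askHuman: () => Promise<boolean>,
): Promise<string> {
  const decision = await gate.check(call.action, call.target);
  if (decision === "allow") return execute();
  if (decision === "prompt" && (await askHuman())) return execute();
  throw new Error(`Blocked by policy: ${call.action} ${call.target}`);
}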
Incident Response Integration
SafeClaw's audit log can be exported and ingested into SIEM systems for correlation with other security events. When investigating an AI agent incident, the hash-chained log provides a forensic-quality timeline of every action attempted, the policy decision applied, and the rule that triggered.
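A minimal export sketch, assuming the audit log is stored locally as newline-delimited JSON. The file path, field names, and ECS-style output mapping are placeholders to adapt, not SafeClaw's documented export format.

import { readFileSync } from "node:fs";

// Illustrative only: assumes a local newline-delimited JSON audit log.
// Adjust the path and field names to whatever your deployment actually writes.
const AUDIT_LOG_PATH = "./safeclaw-audit.jsonl";

const entries = readFileSync(AUDIT_LOG_PATH, "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));

// Flatten to a SIEM-friendly shape (ECS-style keys shown as an example).
const siemEvents = entries.map((e) => ({
  "@timestamp": e.timestamp,
  "event.action": e.action,
  "event.outcome": e.decision,
  "rule.description": e.reason,
  "log.hash": e.hash,
}));

console.log(JSON.stringify(siemEvents, null, 2));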
Related pages:
- Zero Trust AI Agent Architecture
- Hash-Chained Audit Logs Deep Dive
- Security Model Reference
- Prompt Injection File Access Threat
- Defense in Depth for Agents
Try SafeClaw
Action-level gating for AI agents. Set it up from your terminal in 60 seconds.
$ npx @authensor/safeclaw