How to Make Your AI Agent Safe
To make your AI agent safe, install SafeClaw (npx @authensor/safeclaw), define a deny-by-default policy, and run in simulation mode before enforcing. SafeClaw evaluates every action — file writes, shell commands, network requests — before the agent executes it. This is action-level gating: the agent proposes an action, SafeClaw checks it against your policy, and the action only runs if explicitly allowed.
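Conceptually, the gate is a policy check that runs before any side effect. The sketch below shows that control flow in TypeScript; it is illustrative only and does not use SafeClaw's actual API.

type Action =
  | { kind: "file_write"; path: string }
  | { kind: "shell_exec"; command: string };

interface Rule {
  matches: (action: Action) => boolean;
  decision: "allow" | "deny";
}

// Evaluate the proposed action against the policy; with no matching rule,
// the default is deny, so the action never runs.
function evaluate(action: Action, rules: Rule[]): "allow" | "deny" {
  for (const rule of rules) {
    if (rule.matches(action)) return rule.decision;
  }
  return "deny";
}

const rules: Rule[] = [
  {
    matches: (a) => a.kind === "file_write" && a.path.startsWith("./output/"),
    decision: "allow",
  },
];

console.log(evaluate({ kind: "file_write", path: "./output/report.json" }, rules)); // "allow"
console.log(evaluate({ kind: "shell_exec", command: "rm -rf /" }, rules));          // "deny"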
Why This Matters
AI agents operate with real system access. They write files, execute shell commands, make network requests, and read sensitive directories. In 2025, the Clawdbot incident exposed 1.5 million API keys because the agent had unrestricted access to credential files and outbound network calls. Without action-level gating, your agent is one prompt injection away from exfiltrating your .env file or running rm -rf /.
Step-by-Step Instructions
Step 1: Install SafeClaw
npx @authensor/safeclaw
This runs the setup wizard. No dependencies are installed into your project — SafeClaw has zero third-party dependencies and runs as a standalone process. The client is 100% open source under the MIT license.
Step 2: Get Your API Key
Visit safeclaw.onrender.com to create your account. The free tier provides a 7-day renewable key with no credit card required. The browser dashboard includes a setup wizard that generates your initial policy.
Step 3: Define Your Policy
Create a safeclaw.yaml file in your project root. Start with deny-by-default — every action is blocked unless you explicitly allow it.
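A minimal starting point uses the same fields as the full Example Policy later in this guide. With an empty rules list (shown here only for illustration), every action the agent proposes is denied until you add allow rules:

version: "1.0"
default: deny
rules: []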
Step 4: Run in Simulation Mode
SAFECLAW_MODE=simulation npx @authensor/safeclaw
Simulation mode logs what would be blocked or allowed without actually preventing any actions. Run your agent through its normal workflows and review the logs. This shows you exactly which actions your agent takes and whether your policy handles them correctly.
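For example, if your agent writes somewhere your policy does not yet allow, you would expect to see a denied decision in the simulation log. The entry below is illustrative; the exact field names and log format are not specified here, so check your own output:

{
  "mode": "simulation",
  "action": "file_write",
  "path": "./reports/summary.json",
  "decision": "DENY",
  "rule": "default deny",
  "timestamp": "2026-02-13T09:55:12Z"
}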
Step 5: Switch to Enforce Mode
Once your simulation logs show correct allow/deny decisions, switch to enforce mode:
SAFECLAW_MODE=enforce npx @authensor/safeclaw
Now SafeClaw actively blocks any action not permitted by your policy. Policy evaluation is sub-millisecond, so your agent won't slow down.
Step 6: Review Your Audit Trail
Every action — allowed, denied, or escalated — is recorded in a tamper-proof audit trail using a SHA-256 hash chain. Each entry links cryptographically to the previous one, so no record can be altered or deleted without detection.
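The mechanism is straightforward to reason about: each entry's hash covers its own fields plus the previous entry's hash, so changing any record breaks every link after it. The sketch below shows how such a chain can be verified in principle; the field names are illustrative, not SafeClaw's actual audit schema.

import { createHash } from "node:crypto";

interface AuditEntry {
  action: string;
  decision: string;
  timestamp: string;
  prevHash: string; // hash of the previous entry ("0"-filled for the first one)
  hash: string;     // SHA-256 over this entry's fields plus prevHash
}

function entryHash(e: Omit<AuditEntry, "hash">): string {
  return createHash("sha256")
    .update(JSON.stringify([e.action, e.decision, e.timestamp, e.prevHash]))
    .digest("hex");
}

// Recompute every hash and confirm each entry points at the one before it.
// Any altered or deleted record makes verification fail from that point on.
function verifyChain(entries: AuditEntry[]): boolean {
  let prev = "0".repeat(64);
  for (const e of entries) {
    if (e.prevHash !== prev || entryHash(e) !== e.hash) return false;
    prev = e.hash;
  }
  return true;
}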
Example Policy
version: "1.0"
default: deny
rules:
- action: file_write
path: "./output/**"
decision: allow
reason: "Agent can write to output directory"
- action: file_read
path: "./data/**"
decision: allow
reason: "Agent can read project data"
- action: file_read
path: "~/.ssh/**"
decision: deny
reason: "Never read SSH keys"
- action: file_read
path: "**/.env"
decision: deny
reason: "Never read environment files"
- action: shell_exec
command: "npm test"
decision: allow
reason: "Agent can run tests"
- action: shell_exec
command: "rm *"
decision: deny
reason: "Never allow destructive commands"
- action: network
domain: "api.openai.com"
decision: allow
reason: "Agent can call OpenAI API"
- action: network
domain: "*"
decision: deny
reason: "Block all other outbound traffic"
What Happens When It Works
ALLOW — Agent writes a file to the permitted output directory:
{
  "action": "file_write",
  "path": "./output/report.json",
  "decision": "ALLOW",
  "rule": "Agent can write to output directory",
  "timestamp": "2026-02-13T10:00:01Z",
  "hash": "a1b2c3d4..."
}
DENY — Agent attempts to read your .env file:
{
  "action": "file_read",
  "path": "/Users/dev/project/.env",
  "decision": "DENY",
  "rule": "Never read environment files",
  "timestamp": "2026-02-13T10:00:02Z",
  "hash": "e5f6a7b8..."
}
REQUIRE_APPROVAL — Agent wants to install an npm package:
{
  "action": "shell_exec",
  "command": "npm install lodash",
  "decision": "REQUIRE_APPROVAL",
  "rule": "Package installs require human review",
  "timestamp": "2026-02-13T10:00:03Z",
  "hash": "c9d0e1f2..."
}
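Since each audit entry is a JSON object, you can review a run by filtering for denials. The command below assumes entries are written as JSON lines to a local file; the filename is a placeholder, so substitute wherever your installation writes its audit log:

jq 'select(.decision == "DENY")' safeclaw-audit.jsonl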
Common Mistakes
- Using allow-by-default policies. Many teams start by blocking a few known-bad actions and allowing everything else. This fails because you cannot enumerate every dangerous action an agent might take. Always start with deny-by-default and explicitly allow only what the agent needs; the sketch after this list shows the difference.
- Relying on prompt instructions for safety. Telling an agent "don't delete files" in its system prompt provides zero enforcement. Prompt instructions can be overridden by prompt injection, user manipulation, or model hallucination. Safety must be enforced at the action layer, not the prompt layer.
- Skipping simulation mode. Deploying an enforce-mode policy without first running simulation mode leads to either over-blocking (agent can't do its job) or under-blocking (policy has gaps you didn't anticipate). Always simulate first, review the logs, then enforce.
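To make the first mistake concrete, here is what a blocklist-style policy looks like next to the deny-by-default shape used throughout this guide. The snippet is illustrative only; whether SafeClaw even accepts default: allow is an assumption, and nothing outside the two listed rules would be stopped.

# Illustrative blocklist-style policy. Anything not listed below is allowed,
# so a destructive command you did not anticipate runs unchecked.
version: "1.0"
default: allow   # assumption, shown only to contrast with default: deny
rules:
  - action: shell_exec
    command: "rm *"
    decision: deny
    reason: "Known-bad command"
  - action: file_read
    path: "**/.env"
    decision: deny
    reason: "Known-sensitive file"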
Cross-References
- What Is Action-Level Gating?
- Deny-by-Default Explained
- SafeClaw vs Docker Sandboxing
- Simulation Mode Reference
- Tamper-Proof Audit Trail Specification
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw