How to Control What Your AI Agent Can Do
To control what your AI agent can do, install SafeClaw (npx @authensor/safeclaw) and define a policy that specifies exactly which actions are allowed, denied, or require human approval. SafeClaw evaluates four action types — file_read, file_write, shell_exec, and network — against your policy rules using first-match-wins logic. Every action the agent proposes is checked before it executes, giving you precise control over your agent's capabilities.
Why This Matters
AI agents don't have a built-in concept of permissions. Unlike human users with OS-level access controls, an agent running as your user process inherits all of your permissions. It can read every file you can read, run every command you can run, and access every network endpoint you can reach. The Clawdbot incident — where 1.5 million API keys were leaked — happened because the agent had no permission boundaries at all. Controlling agent permissions requires an external enforcement layer that evaluates actions before they execute.
Step-by-Step Instructions
Step 1: Install SafeClaw
npx @authensor/safeclaw
SafeClaw is a standalone action-level gating layer. It has zero third-party dependencies, is written in TypeScript with strict mode enabled, and is backed by 446 tests. The open-source client is MIT-licensed.
Step 2: Get Your API Key
Visit safeclaw.onrender.com. The free tier provides a 7-day renewable key and requires no credit card. The dashboard includes a policy wizard that generates rules based on your agent's framework.
Step 3: Understand the Four Action Types
SafeClaw gates four categories of agent actions:
| Action Type | What It Covers | Example |
|---|---|---|
| file_read | Any file the agent tries to read | Reading ~/.ssh/id_rsa |
| file_write | Any file the agent tries to create or modify | Writing to ./output/report.md |
| shell_exec | Any shell command the agent tries to run | Running npm install |
| network | Any outbound network request | Calling api.openai.com |
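To make the table concrete, here is a minimal TypeScript sketch of how the four action types might be modeled as a discriminated union. `ProposedAction` and `matchTarget` are hypothetical names used for illustration, not SafeClaw's actual API. The helper returns the field a policy rule would be matched against, mirroring the `path`, `command`, and `domain` keys used in the example policy.

```typescript
// Hypothetical model of the four gated action types; SafeClaw's real types may differ.
type ProposedAction =
  | { type: "file_read"; path: string }
  | { type: "file_write"; path: string }
  | { type: "shell_exec"; command: string }
  | { type: "network"; domain: string };

// Returns the value a policy rule would be matched against for each action type.
function matchTarget(action: ProposedAction): string {
  switch (action.type) {
    case "file_read":
    case "file_write":
      return action.path;
    case "shell_exec":
      return action.command;
    case "network":
      return action.domain;
    default: {
      // Compile-time exhaustiveness check: adding a fifth action type fails to compile here.
      const unreachable: never = action;
      throw new Error(`Unhandled action: ${JSON.stringify(unreachable)}`);
    }
  }
}

// The examples from the table, expressed in this shape.
const examples: ProposedAction[] = [
  { type: "file_read", path: "~/.ssh/id_rsa" },
  { type: "file_write", path: "./output/report.md" },
  { type: "shell_exec", command: "npm install" },
  { type: "network", domain: "api.openai.com" },
];

for (const a of examples) {
  console.log(`${a.type}: ${matchTarget(a)}`);
}
```

The union lets the compiler make sure every action type is handled wherever decisions are made.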
Step 4: Design Your Permission Model
Start with deny-by-default. Then add explicit allow rules for each action your agent legitimately needs. Use require_approval for actions that are sometimes legitimate but should have human oversight.
Principle of least privilege: Give the agent the minimum permissions needed to do its job. If the agent only needs to read files in ./data/ and write to ./output/, don't allow filesystem-wide access.
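Least privilege depends on checking where a path actually points, not just how it starts. The following is a minimal sketch using Node's built-in `node:path` module. `isWithinScope` is a hypothetical helper, not part of SafeClaw. It resolves the candidate path first, so an escape such as `./data/../../.ssh/id_rsa` or a look-alike sibling such as `./data-backup` is rejected even though a naive string-prefix check would accept it.

```typescript
import * as path from "node:path";

// Hypothetical least-privilege check: true only if `candidate` resolves to a
// location inside one of `allowedDirs` (both interpreted relative to `root`).
function isWithinScope(candidate: string, allowedDirs: string[], root: string): boolean {
  const resolved = path.resolve(root, candidate);
  return allowedDirs.some((dir) => {
    const base = path.resolve(root, dir);
    const rel = path.relative(base, resolved);
    // "" means the directory itself (allowed); ".." or "../..." climbs out of it;
    // an absolute result means a different drive on Windows.
    const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}

const root = "/home/me/project";
const scope = ["./data", "./output"];

console.log(isWithinScope("./data/input.csv", scope, root));
console.log(isWithinScope("./data/../../.ssh/id_rsa", scope, root));
console.log(isWithinScope("./data-backup/dump.sql", scope, root));
```

In practice you would keep a separate list per action type (for example, read from `./data` but write only to `./output`). A real enforcement layer would also resolve symlinks, which `path.resolve` does not do.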
Step 5: Write Your Policy
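Create safeclaw.yaml in your project root. Rules are evaluated using first-match-wins: the first rule that matches the action determines the decision, and the default applies only when no rule matches. Because order matters, place narrow deny rules (such as credential files) before broader allow rules that would otherwise match the same action first.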
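First-match-wins evaluation can be sketched in a few dozen lines of TypeScript. This is a simplified illustration, not SafeClaw's evaluator. The `Rule` and `Policy` shapes and the `globToRegExp` and `evaluate` names are hypothetical, a single `pattern` field stands in for the `path`, `command`, and `domain` keys, and the glob rules are an assumption. Here `**` matches anything, and `*` stops at `/` for file paths but matches anything for commands and domains.

```typescript
type ActionType = "file_read" | "file_write" | "shell_exec" | "network";
type Decision = "allow" | "deny" | "require_approval";

interface Rule { action: ActionType; pattern: string; decision: Decision; reason: string; }
interface Policy { defaultDecision: Decision; rules: Rule[]; }

// Minimal glob-to-regex conversion; every non-"*" character is matched literally.
function globToRegExp(glob: string, starCrossesSlash: boolean): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      re += ".*";
      i++; // skip the second "*"
    } else if (ch === "*") {
      re += starCrossesSlash ? ".*" : "[^/]*";
    } else {
      re += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Return the first matching rule's decision; use the default only if nothing matches.
function evaluate(policy: Policy, action: ActionType, target: string): { decision: Decision; reason: string } {
  const isPath = action === "file_read" || action === "file_write";
  for (const rule of policy.rules) {
    if (rule.action === action && globToRegExp(rule.pattern, !isPath).test(target)) {
      return { decision: rule.decision, reason: rule.reason };
    }
  }
  return { decision: policy.defaultDecision, reason: "no rule matched (default)" };
}

const policy: Policy = {
  defaultDecision: "deny",
  rules: [
    { action: "file_read", pattern: "**/.env", decision: "deny", reason: "Never read credentials" },
    { action: "file_read", pattern: "./data/**", decision: "allow", reason: "Read project data" },
  ],
};

console.log(evaluate(policy, "file_read", "./data/.env").decision);      // deny: the .env rule matches first
console.log(evaluate(policy, "file_read", "./data/sales.csv").decision); // allow
console.log(evaluate(policy, "file_read", "./notes.txt").decision);      // deny, via the default

// Same rules in the opposite order: the broad allow now shadows the deny.
const reversed: Policy = { ...policy, rules: [...policy.rules].reverse() };
console.log(evaluate(reversed, "file_read", "./data/.env").decision);    // allow
```

The last call shows why ordering matters. Note also that in this sketch a command pattern such as `"npm test*"` would match `"npm test && rm -rf ."`, so command patterns deserve the same care as path patterns.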
Step 6: Test with Simulation Mode
SAFECLAW_MODE=simulation npx @authensor/safeclaw
Run your agent through all its workflows. Review the simulation log to verify that every legitimate action is allowed and every dangerous action is blocked.
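The difference between simulation and enforcement can be sketched in TypeScript. This is a hypothetical illustration of the behavior the steps describe (simulation records what would happen, enforcement acts on it), not SafeClaw's implementation. `gate` and `modeFromEnv` are invented names, and only the `SAFECLAW_MODE` variable comes from the command above.

```typescript
type Mode = "simulation" | "enforce";
type Decision = "allow" | "deny" | "require_approval";

// Hypothetical: maps the SAFECLAW_MODE value to a mode. Treating an unset or
// unknown value as "enforce" is an assumption, chosen so the gate fails closed.
function modeFromEnv(value: string | undefined): Mode {
  return value === "simulation" ? "simulation" : "enforce";
}

interface GateResult { proceed: boolean; logLine: string; }

// Simulation: every action proceeds, but the would-be decision is logged for review.
// Enforce: only "allow" proceeds; "require_approval" waits for a human (not modeled here).
function gate(mode: Mode, label: string, decision: Decision): GateResult {
  const proceed = mode === "simulation" || decision === "allow";
  const tag = mode === "simulation" ? "[simulated]" : "[enforced]";
  return { proceed, logLine: `${tag} ${decision.toUpperCase()} ${label}` };
}

const mode = modeFromEnv(process.env.SAFECLAW_MODE);
const result = gate(mode, "shell_exec: npm publish --access public", "deny");
console.log(result.logLine, result.proceed ? "(ran)" : "(blocked)");
```

Reviewing every `[simulated] DENY` line before switching modes is what catches rules that would block legitimate work.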
Step 7: Enforce
SAFECLAW_MODE=enforce npx @authensor/safeclaw
Policy evaluation is sub-millisecond, so it adds negligible overhead to your agent. Every decision is logged to a tamper-evident audit trail built on SHA-256 hash chains.
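The idea behind a hash-chained audit trail fits in a short sketch. Each entry stores the SHA-256 hash of the previous entry, so changing any past record breaks verification from that point on. This uses Node's built-in `node:crypto` module. The entry fields and the `append`/`verify` helpers are hypothetical and do not represent SafeClaw's log format. A chain makes tampering detectable rather than impossible: anyone who can rewrite the whole log can recompute every hash, unless the latest hash is also kept somewhere they cannot reach.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  action: string;
  target: string;
  decision: string;
  timestamp: string;
  prevHash: string; // hash of the previous entry, or GENESIS for the first one
  hash: string;     // SHA-256 over this entry's fields plus prevHash
}

const GENESIS = "0".repeat(64);

function entryHash(e: Omit<AuditEntry, "hash">): string {
  const payload = JSON.stringify([e.action, e.target, e.decision, e.timestamp, e.prevHash]);
  return createHash("sha256").update(payload).digest("hex");
}

function append(log: AuditEntry[], action: string, target: string, decision: string, timestamp: string): AuditEntry {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const base = { action, target, decision, timestamp, prevHash };
  const entry: AuditEntry = { ...base, hash: entryHash(base) };
  log.push(entry);
  return entry;
}

// Recompute every hash and check each link back to its predecessor.
function verify(log: AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const e of log) {
    if (e.prevHash !== prev || e.hash !== entryHash(e)) return false;
    prev = e.hash;
  }
  return true;
}

const log: AuditEntry[] = [];
append(log, "file_read", "./src/utils/parser.ts", "ALLOW", "2026-02-13T13:45:01Z");
append(log, "shell_exec", "npm publish --access public", "DENY", "2026-02-13T13:45:03Z");
append(log, "file_write", "./src/index.ts", "REQUIRE_APPROVAL", "2026-02-13T13:45:05Z");
console.log(verify(log)); // true

log[1].decision = "ALLOW"; // rewrite history: the DENY becomes an ALLOW
console.log(verify(log)); // false
```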
Example Policy
```yaml
version: "1.0"
default: deny
rules:
  # ---- FILE PERMISSIONS ----
  # Read: block sensitive paths first. Rules are first-match-wins, so these
  # denies must precede the broader allows below; otherwise ./data/.env
  # would match "./data/**" and be allowed.
  - action: file_read
    path: "**/.env"
    decision: deny
    reason: "Never read credentials"
  - action: file_read
    path: "~/.ssh/**"
    decision: deny
    reason: "Never read SSH keys"
  # Read: project files only
  - action: file_read
    path: "./src/**"
    decision: allow
    reason: "Read source code"
  - action: file_read
    path: "./data/**"
    decision: allow
    reason: "Read project data"
  - action: file_read
    path: "./package.json"
    decision: allow
    reason: "Read package manifest"
  # Write: output directory only
  - action: file_write
    path: "./output/**"
    decision: allow
    reason: "Write results to output"
  - action: file_write
    path: "./src/**"
    decision: require_approval
    reason: "Source edits need human review"
  # ---- SHELL PERMISSIONS ----
  - action: shell_exec
    command: "npm test*"
    decision: allow
    reason: "Run tests"
  - action: shell_exec
    command: "npm run build*"
    decision: allow
    reason: "Build project"
  - action: shell_exec
    command: "git log*"
    decision: allow
    reason: "View git history"
  - action: shell_exec
    command: "git diff*"
    decision: allow
    reason: "View diffs"
  - action: shell_exec
    command: "rm *"
    decision: deny
    reason: "No file deletion"
  - action: shell_exec
    command: "npm publish*"
    decision: deny
    reason: "No publishing"
  # ---- NETWORK PERMISSIONS ----
  - action: network
    domain: "api.anthropic.com"
    decision: allow
    reason: "Claude API"
  - action: network
    domain: "registry.npmjs.org"
    decision: allow
    reason: "npm registry lookups"
  - action: network
    domain: "*"
    decision: deny
    reason: "Block all other domains"
```
What Happens When It Works
ALLOW — Agent reads a source file within its permitted scope:
{
"action": "file_read",
"path": "./src/utils/parser.ts",
"decision": "ALLOW",
"rule": "Read source code",
"timestamp": "2026-02-13T13:45:01Z",
"hash": "p6q7r8s9..."
}
DENY — Agent tries to publish your package to npm:
{
"action": "shell_exec",
"command": "npm publish --access public",
"decision": "DENY",
"rule": "No publishing",
"timestamp": "2026-02-13T13:45:03Z",
"hash": "t0u1v2w3..."
}
REQUIRE_APPROVAL — Agent wants to edit a source file:
{
"action": "file_write",
"path": "./src/index.ts",
"decision": "REQUIRE_APPROVAL",
"rule": "Source edits need human review",
"timestamp": "2026-02-13T13:45:05Z",
"hash": "x4y5z6a7..."
}
Common Mistakes
- Defining permissions that are too broad. A rule like `file_read` with `path: "**"` allows the agent to read every file on the system, including credentials, SSH keys, and browser cookies. Be specific: list exactly which directories and files the agent needs access to.
- Confusing OS-level permissions with agent permissions. Your agent runs as your user. Standard file permissions protect your files from other users, not from processes running under your own account, so unless you deliberately launch the agent inside an OS sandbox, it can do anything you can. Agent permissions must be enforced at the application layer, above the OS, which is exactly what SafeClaw does.
- Not using `require_approval` for gray-area actions. Not every action is clearly safe or clearly dangerous. Operations like editing source code, installing packages, or making API calls to third-party services benefit from human-in-the-loop approval. Use `require_approval` to get oversight without fully blocking the agent.
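A `require_approval` decision is useful only if something actually pauses and asks a person. Below is one way that step could look in TypeScript, assuming a terminal prompt built on Node's `node:readline/promises`. `resolveDecision` and `terminalAsk` are hypothetical names, and SafeClaw's own approval flow may work differently. The prompt is passed in as a function so it could be replaced by a chat message or web form.

```typescript
import * as readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";

type Decision = "allow" | "deny" | "require_approval";
type Ask = (question: string) => Promise<string>;

// Hypothetical resolution step: "allow" and "deny" are final, while
// "require_approval" asks a human. Anything but an explicit "y" is a refusal.
async function resolveDecision(decision: Decision, description: string, reason: string, ask: Ask): Promise<boolean> {
  if (decision === "allow") return true;
  if (decision === "deny") return false;
  const answer = await ask(`Agent wants to: ${description}\nRule: ${reason}\nApprove? [y/N] `);
  return answer.trim().toLowerCase() === "y";
}

// Terminal-backed prompt; the interface is closed even if the question fails.
const terminalAsk: Ask = async (question) => {
  const rl = readline.createInterface({ input, output });
  try {
    return await rl.question(question);
  } finally {
    rl.close();
  }
};

// Example usage (waits for keyboard input):
// const ok = await resolveDecision("require_approval", "write ./src/index.ts",
//   "Source edits need human review", terminalAsk);
```

Defaulting to refusal keeps the gray-area path fail-closed: an empty or mistyped reply never lets the action through.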
Cross-References
- What Is Action-Level Gating?
- Least Privilege for Agents
- First-Match-Wins Policy Evaluation
- Human-in-the-Loop Explained
- How to Add Safety to an Existing Agent
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw