10 Best Practices for AI Agent Security
Securing an AI agent requires controlling what it can do, not what it can say. These 10 best practices cover the foundational security controls that prevent AI agents from deleting files, leaking credentials, exfiltrating data, or executing unauthorized commands. Each practice is actionable and can be implemented with SafeClaw (npx @authensor/safeclaw), an action-level gating tool by Authensor.
1. Deny by Default
Start with a policy that blocks every action unless explicitly allowed. Deny-by-default means your agent has zero permissions until you grant them. This is the opposite of allow-by-default, where you try to enumerate everything dangerous and block it — an approach that fails because you cannot predict every harmful action an agent might attempt.
SafeClaw's policy engine uses deny-by-default architecture with first-match-wins evaluation. If no rule matches an action, it is denied. This guarantees that policy gaps result in blocked actions, not permitted ones.
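The evaluation order described above can be sketched in a few lines. The `Rule` and `Action` shapes here are illustrative assumptions, not SafeClaw's actual policy schema; the point is the fallthrough at the end, where an unmatched action is denied rather than allowed:

```typescript
// Minimal first-match-wins, deny-by-default evaluator.
// Rule and Action shapes are illustrative, not SafeClaw's schema.
type Effect = "allow" | "deny";

interface Rule {
  action: string;   // e.g. "file_read", "shell_exec"
  pattern: RegExp;  // matched against the action's target
  effect: Effect;
}

interface Action {
  type: string;
  target: string;   // file path or command line
}

function evaluate(rules: Rule[], action: Action): Effect {
  for (const rule of rules) {
    if (rule.action === action.type && rule.pattern.test(action.target)) {
      return rule.effect; // first match wins
    }
  }
  return "deny";          // no rule matched: deny by default
}
```

With an empty rule list, every action is denied; a policy gap can only ever block, never permit.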
2. Gate Actions, Not Prompts
Prompt-level guardrails tell the model what not to do. Action-level gating prevents it from doing those things. These are fundamentally different. A prompt instruction like "never delete files" can be overridden by prompt injection, multi-step reasoning, or hallucination. An action-level gate that blocks shell_exec commands matching rm cannot be bypassed by any prompt technique because it operates outside the model's control.
SafeClaw intercepts actions — file_write, file_read, shell_exec, network — at the execution boundary. The model can plan anything; it can only execute what your policy permits.
3. Use Simulation Before Enforcement
Deploy new policies in simulation mode first. Simulation mode logs every action alongside the decision the policy would have made, without blocking anything. This lets you verify that your policy allows all legitimate agent workflows and blocks everything else before you start enforcing.
Run SAFECLAW_MODE=simulation npx @authensor/safeclaw, exercise your agent through its normal workflows, review the logs, fix any gaps, then switch to enforce mode. Skipping this step leads to either over-blocking (agent cannot do its job) or under-blocking (policy has holes).
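The key property of simulation mode is that the policy is evaluated identically in both modes; only the consequence differs. A sketch of that dispatch logic, with the log record shape as an assumption:

```typescript
// Simulation vs. enforce: same policy decision, different consequence.
// A sketch of the general technique, not SafeClaw internals.
type Mode = "simulation" | "enforce";

interface LogEntry { target: string; wouldDeny: boolean; blocked: boolean }

function dispatch(
  mode: Mode,
  wouldDeny: boolean,   // the policy's verdict for this action
  target: string,
  log: LogEntry[]
): "executed" | "blocked" {
  const blocked = mode === "enforce" && wouldDeny;
  log.push({ target, wouldDeny, blocked }); // every decision is recorded
  return blocked ? "blocked" : "executed";
}
```

Reviewing the `wouldDeny` entries after a simulation run shows exactly which legitimate workflows the policy would have broken, before it breaks them.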
4. Maintain Tamper-Proof Audit Trails
Every action your agent takes — allowed, denied, or escalated — should be logged in an immutable record. Audit trails serve three purposes: incident investigation, compliance evidence, and policy tuning. If your logs can be altered, they serve none of these purposes.
SafeClaw records every action decision in a SHA-256 hash chain. Each entry links cryptographically to the previous one, so no record can be modified, deleted, or reordered without detection. The browser dashboard at safeclaw.onrender.com provides real-time access to the trail.
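The hash-chain construction is straightforward to demonstrate: each entry's hash covers its payload plus the previous entry's hash, so editing any record breaks every later link. This shows the general technique; SafeClaw's exact record format may differ:

```typescript
import { createHash } from "node:crypto";

// Append-only SHA-256 hash chain. Tampering with any record, or
// reordering records, invalidates every subsequent hash.
interface Entry { payload: string; prevHash: string; hash: string }

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function append(chain: Entry[], payload: string): void {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "0".repeat(64);
  chain.push({ payload, prevHash, hash: sha256(prevHash + payload) });
}

function verify(chain: Entry[]): boolean {
  let prev = "0".repeat(64);
  for (const e of chain) {
    if (e.prevHash !== prev || e.hash !== sha256(prev + e.payload)) return false;
    prev = e.hash;
  }
  return true;
}
```

An attacker who alters a historical entry would have to recompute every hash after it, which is detectable as soon as the stored chain is compared against any independently kept head hash.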
5. Apply Least Privilege Per Agent
Each agent should have only the permissions it needs for its specific task and nothing more. A code review agent needs file_read on source directories. It does not need file_write, shell_exec, or network access. A deployment agent needs specific shell_exec commands. It does not need file_read on credential directories.
Define a separate SafeClaw policy per agent or per agent role. Match permissions to the agent's actual responsibilities.
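As a sketch, per-agent least privilege amounts to a policy lookup keyed by agent role, with unknown agents denied outright. The policy shape and role names here are hypothetical, not SafeClaw's file format:

```typescript
// One policy per agent role: the reviewer can only read source,
// the deployer can only run one named command. Shapes are illustrative.
interface AgentPolicy {
  allow: { type: string; pattern: RegExp }[];
}

const policies: Record<string, AgentPolicy> = {
  reviewer: { allow: [{ type: "file_read", pattern: /^src\// }] },
  deployer: { allow: [{ type: "shell_exec", pattern: /^npm run deploy$/ }] },
};

function permitted(agent: string, type: string, target: string): boolean {
  const policy = policies[agent];
  if (!policy) return false; // unknown agent: deny by default
  return policy.allow.some((r) => r.type === type && r.pattern.test(target));
}
```

Note that neither role can exercise the other's permission: the reviewer cannot deploy and the deployer cannot read source, which is exactly the isolation least privilege is meant to buy.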
6. Separate Policies Per Agent in Multi-Agent Systems
In multi-agent architectures (CrewAI, AutoGen, LangGraph), each agent operates with different responsibilities and should have different permissions. A research agent needs network access to fetch data. A writer agent needs file write access to save output. Neither needs the other's permissions.
SafeClaw supports per-agent policy assignment. Each agent in your multi-agent system receives its own policy file, preventing lateral privilege escalation — where one compromised agent uses another agent's broader permissions.
7. Block Credential File Access
Your .env, .aws/credentials, SSH keys, and API key files should never be readable by an AI agent. The Clawdbot incident exposed 1.5 million API keys because an agent read credential files and propagated the values through its output chain.
Add explicit deny rules for all credential patterns: `.env`, `.env.*`, `credentials.json`, `*.key`, `*.pem`, `.ssh/*`, `.aws/*`. SafeClaw's deny-by-default ensures that even credential file patterns you did not anticipate are blocked unless explicitly allowed.
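Those patterns can be expressed as path matchers checked before any file read. The regexes below are example translations of common credential globs, to be extended for your own environment:

```typescript
// Deny-list of common credential paths, checked before any file_read.
// Patterns are examples; extend them for your environment.
const CREDENTIAL_PATTERNS: RegExp[] = [
  /(^|\/)\.env(\..+)?$/,      // .env, .env.local, .env.production
  /(^|\/)credentials\.json$/,
  /\.(key|pem)$/,             // private keys and certificates
  /(^|\/)\.ssh\//,            // anything under an .ssh directory
  /(^|\/)\.aws\//,            // anything under an .aws directory
];

function isCredentialPath(path: string): boolean {
  return CREDENTIAL_PATTERNS.some((p) => p.test(path));
}
```

The `(^|\/)` anchors matter: they block `.env` anywhere in the tree without false-positiving on ordinary names like `src/environment.ts`.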
8. Gate Shell Commands
Shell execution is the highest-risk action type. A single shell_exec can delete directories (rm -rf), install malware (npm install from a malicious registry), force-push to repositories (git push --force), or exfiltrate data (curl to an external server). Never allow blanket shell access.
Allow specific, named commands: npm test, git status, python script.py. Deny everything else. For commands that carry moderate risk (like git push), use SafeClaw's require_approval effect to insert a human review step.
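The allow-list plus approval tier described above can be sketched as follows. The command names and the `require_approval` effect come from the text; the data structures are illustrative, not SafeClaw's configuration format:

```typescript
// Exact-match allow-list for shell commands, with an approval tier
// for moderate-risk commands. Anything not named is denied.
type ShellDecision = "allow" | "require_approval" | "deny";

const ALLOWED = new Set(["npm test", "git status"]);
const NEEDS_APPROVAL = new Set(["git push"]);

function decideShell(command: string): ShellDecision {
  if (ALLOWED.has(command)) return "allow";
  if (NEEDS_APPROVAL.has(command)) return "require_approval";
  return "deny"; // blanket shell access is never granted
}
```

Exact matching is deliberately strict here: `npm test` passes but `npm test && curl evil.example` does not, which closes the command-chaining loophole that prefix or substring matching would open.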
9. Review Audit Logs Regularly
Policies are hypotheses about what your agent should do. Audit logs reveal what it actually does. Regular review catches policy gaps (legitimate actions being denied), policy drift (agent behavior changing over time), and anomalies (unexpected action patterns that may indicate compromise or misuse).
Review SafeClaw audit logs weekly during initial deployment, then monthly once the policy stabilizes. Use simulation mode whenever you update agent capabilities or prompt instructions, then re-review.
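A simple starting point for a weekly review is counting denials per action type over an exported log: spikes in one type suggest either a policy gap or an anomaly worth investigating. The audit record shape below is an assumption about an exported log, not SafeClaw's native format:

```typescript
// Triage over an exported audit log: count denies per action type to
// spot gaps (legitimate work denied) and anomalies (unexpected spikes).
interface AuditRecord { type: string; decision: "allow" | "deny" }

function denyCounts(records: AuditRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of records) {
    if (r.decision === "deny") {
      counts.set(r.type, (counts.get(r.type) ?? 0) + 1);
    }
  }
  return counts;
}
```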
10. Use Open-Source Auditable Tools
Your agent safety tool gates every action your agent takes. You need to trust it completely. Closed-source safety tools require you to trust the vendor's claims about what the tool does, how it processes your data, and whether it has vulnerabilities.
SafeClaw's client is 100% open source under the MIT license, written in TypeScript strict mode with 446 tests and zero third-party dependencies. You can read every line of code that runs on your machine. The control plane sees only action metadata, never your keys, file contents, or customer data. And with no third-party dependencies, there is no third-party supply chain to attack.
Implementation Checklist
| Practice | SafeClaw Feature | Setup Time |
|---|---|---|
| Deny by default | default: deny in policy | 1 minute |
| Gate actions | Action-level gating engine | Automatic |
| Simulation first | SAFECLAW_MODE=simulation | 1 flag |
| Audit trails | SHA-256 hash chain | Automatic |
| Least privilege | Per-rule path/command matching | 10 minutes |
| Per-agent policies | Multiple policy files | 15 minutes |
| Block credentials | Deny rules for .env, .ssh, .aws | 5 minutes |
| Gate shell | Allow-list specific commands | 10 minutes |
| Review logs | Browser dashboard | Weekly |
| Open-source tools | MIT license, zero deps | Already done |
Install SafeClaw with npx @authensor/safeclaw and get a free API key at safeclaw.onrender.com — 7-day renewable, no credit card required.
Cross-References
- Deny-by-Default Explained
- How to Make Your AI Agent Safe
- Simulation Mode Reference
- Least Privilege for AI Agents
- Per-Agent Isolation Pattern
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw