AI Agent Safety Checklist: 10 Steps Before Letting Your Agent Run
Before any AI agent executes its first action on your infrastructure, you need a safety checklist — not a vague set of principles, but concrete steps you can verify. This checklist covers the ten controls that separate a secure agent deployment from an incident waiting to happen. Every step is actionable, testable, and applicable whether you are running a single coding assistant or a fleet of autonomous agents.
Step 1: Inventory Every Agent and Its Permissions
Before you can secure agents, you need to know what you have. Document every AI agent, assistant, or tool that can take actions on your systems.
What to record:
- Agent name and framework (Claude, OpenAI, LangChain, CrewAI, AutoGen, MCP, Cursor, Copilot, Windsurf, etc.)
- Action types it can perform: file_write, file_read, shell_exec, network
- Which systems, directories, and APIs it has access to
- Who deployed it and who is responsible for it
Pass criteria: You have a complete, written list. No undocumented agents are running.
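If you want this inventory to be machine-checkable rather than a spreadsheet, a minimal TypeScript sketch might look like the following; the field names are illustrative, not a required schema.

type AgentAction = "file_write" | "file_read" | "shell_exec" | "network";

interface AgentInventoryEntry {
  name: string;           // e.g. "ci-review-bot"
  framework: string;      // e.g. "LangChain", "MCP", "Cursor"
  actions: AgentAction[]; // action types the agent can perform
  access: string[];       // systems, directories, and APIs it can reach
  owner: string;          // who deployed it and who is responsible
}

const inventory: AgentInventoryEntry[] = [
  {
    name: "ci-review-bot", // hypothetical example agent
    framework: "LangChain",
    actions: ["file_read", "shell_exec"],
    access: ["/srv/repos", "https://api.github.com"],
    owner: "platform-team@example.com",
  },
];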
Step 2: Apply Deny-by-Default Permissions
Every agent should start with zero permissions. This is the single most important architectural decision in agent safety. If an agent has not been explicitly granted permission to perform an action, that action should be denied.
Why this matters: The Clawdbot incident — where 1.5 million API keys were leaked — happened because an agent had unconstrained permissions. Deny-by-default would have prevented it entirely.
Pass criteria: No agent can perform file_write, shell_exec, or network actions without an explicit allow rule.
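Deny-by-default is easy to express in code. Here is a minimal sketch, assuming a simple list of allow rules; it illustrates the principle, not SafeClaw's actual policy format.

type AgentAction = "file_write" | "file_read" | "shell_exec" | "network";

interface AllowRule {
  action: AgentAction;  // the action type this rule permits
  targetPrefix: string; // directory, path, or URL prefix it covers
}

// An action is allowed only if an explicit rule matches it.
// There is no implicit "allow" path: an empty rule list denies everything.
function isAllowed(action: AgentAction, target: string, rules: AllowRule[]): boolean {
  return rules.some((r) => r.action === action && target.startsWith(r.targetPrefix));
}

// With an empty rule list: isAllowed("shell_exec", "rm -rf /", []) returns false.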
Step 3: Install Action-Level Gating
Prompt-level guardrails (telling the agent "don't do dangerous things") are insufficient. Models can be jailbroken, instructions can be overridden, and prompt injection is a proven attack vector. You need a gating layer that intercepts actions before execution, regardless of what the model was told.
How to implement: Install SafeClaw with npx @authensor/safeclaw. SafeClaw evaluates every action against your policy in sub-millisecond time, before the action reaches your infrastructure. It works with all major agent frameworks and requires zero third-party dependencies.
Pass criteria: Every agent action passes through a policy engine before execution.
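Conceptually, the gating layer wraps every action the agent proposes, so the policy check runs no matter what the model was told. The sketch below shows the pattern in generic TypeScript; it illustrates the architecture, not SafeClaw's internals.

type Decision = "allow" | "deny";

interface ProposedAction {
  type: "file_write" | "file_read" | "shell_exec" | "network";
  target: string;
}

// The gate sits between the model's proposed action and the code that
// executes it. Prompt injection can change what the model proposes, but
// it cannot skip this check.
function gate(
  action: ProposedAction,
  evaluate: (a: ProposedAction) => Decision,
  execute: (a: ProposedAction) => void,
): void {
  if (evaluate(action) === "deny") {
    throw new Error(`Blocked ${action.type} on ${action.target}`);
  }
  execute(action); // reached only when the policy engine explicitly allows
}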
Step 4: Define Policies Per Action Type
Create explicit policies for each of the four action categories:
- file_write — Which directories can the agent write to? Which file patterns are allowed? Block writes to config files, .env files, and system directories.
- file_read — Can the agent read credentials, SSH keys, or environment files? Restrict to project directories only.
- shell_exec — Which commands are allowed? Block rm -rf, curl | bash, chmod 777, and any command that modifies system configuration.
- network — Which domains can the agent contact? Block all outbound requests except explicitly approved endpoints.
Pass criteria: Each of the four action types has an explicit written policy, and anything not covered by a policy is denied.
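As a sketch, such a policy can be written as plain data, one entry per action category. The shape below is hypothetical; use whatever format your gating layer expects.

// Hypothetical per-action-type policy: allowlists plus explicit deny patterns.
const policy = {
  file_write: { allowDirs: ["./src", "./build"], denyPatterns: ["*.env", "/etc/*"] },
  file_read:  { allowDirs: ["./src"], denyPatterns: ["*.env", "*.pem", "~/.ssh/*"] },
  shell_exec: { denyCommands: ["rm -rf", "curl | bash", "chmod 777"] },
  network:    { allowDomains: ["api.github.com"] }, // all other domains denied
};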
Step 5: Enable Simulation Mode First
Never deploy policies in enforcement mode without testing them first. Simulation mode lets you see exactly what your policy would allow and deny, without actually blocking any agent actions.
SafeClaw's simulation mode logs every decision — allow, deny, or flag — so you can review them before switching to enforcement. This prevents false positives from breaking your workflows.
Pass criteria: Policies have been run in simulation mode for at least 24 hours. You have reviewed the logs and confirmed that legitimate actions are allowed and dangerous actions are denied.
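The mechanism behind simulation mode is simple: evaluate every action and record the decision, but let the action proceed either way. A minimal sketch of the pattern:

type Decision = "allow" | "deny" | "flag";

// Simulation mode: log what enforcement WOULD do, without blocking anything.
function simulate(
  action: { type: string; target: string },
  evaluate: (a: { type: string; target: string }) => Decision,
  log: (line: string) => void,
): void {
  const decision = evaluate(action);
  log(`${new Date().toISOString()} ${decision} ${action.type} ${action.target}`);
  // The action always proceeds in simulation mode; only the log changes.
}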
Step 6: Verify Your Audit Trail
Every action your agent takes — allowed or denied — must be recorded in a tamper-proof log. This is not just for debugging. In regulated industries, you need to prove exactly what your AI systems did.
What to verify:
- Every action is logged with timestamp, action type, target, and decision
- Logs use a cryptographic hash chain (SHA-256) so any tampering with past records is detectable
- Logs are stored separately from the agent's own access (the agent cannot delete its own audit trail)
SafeClaw provides this automatically. Its tamper-proof audit trail uses SHA-256 hash chains, and the control plane sees only action metadata — never your keys or data.
Pass criteria: You can retrieve a complete, verifiable log of every action any agent has taken.
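A SHA-256 hash chain makes a log tamper-evident because each entry commits to the hash of the entry before it, so altering any record breaks every hash that follows. The sketch below shows the idea using Node's built-in crypto module; it is an illustration of the technique, not SafeClaw's implementation.

import { createHash } from "node:crypto";

interface AuditEntry {
  timestamp: string;
  actionType: string;
  target: string;
  decision: string;
  prevHash: string; // hash of the previous entry ("genesis" for the first)
  hash: string;     // SHA-256 over this entry's fields plus prevHash
}

function appendEntry(
  chain: AuditEntry[],
  e: Omit<AuditEntry, "prevHash" | "hash">,
): AuditEntry {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "genesis";
  const hash = createHash("sha256")
    .update(`${e.timestamp}|${e.actionType}|${e.target}|${e.decision}|${prevHash}`)
    .digest("hex");
  const entry: AuditEntry = { ...e, prevHash, hash };
  chain.push(entry); // editing any earlier record now invalidates this hash
  return entry;
}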
Step 7: Restrict Credential Access
AI agents should never have direct access to production credentials, API keys, or secrets. If an agent needs to call an API, it should go through a controlled proxy or use scoped, short-lived tokens.
Specific checks:
- .env files are excluded from agent file_read permissions
- SSH keys and certificates are not in agent-accessible directories
- API keys are injected at runtime, not stored in files the agent can read
- Agent processes run with minimum OS-level permissions
Pass criteria: No agent can read or exfiltrate credentials from your system.
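A simple pattern-based check illustrates the file_read side of this. The patterns below are examples only, and string matching should complement OS-level permissions, not replace them.

// Hypothetical credential-path denylist for file_read actions.
const credentialPatterns = [/\.env$/, /\.pem$/, /\/\.ssh\//, /id_rsa/];

function isCredentialPath(path: string): boolean {
  return credentialPatterns.some((p) => p.test(path));
}

// isCredentialPath("/home/dev/.ssh/id_rsa") -> true, deny the read
// isCredentialPath("./src/index.ts")        -> false, normal policy applies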
Step 8: Set Up Human-in-the-Loop for High-Risk Actions
Some actions should always require human approval, no matter what the automated policy would otherwise allow. Define which actions trigger a human review step:
- Any action targeting production systems
- Shell commands that modify infrastructure
- Network requests to new or unusual endpoints
- File writes outside designated working directories
Pass criteria: High-risk actions are identified and routed to human review.
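One common implementation is a third policy decision, flag, alongside allow and deny: a flagged action pauses until a human responds. The sketch below shows the routing logic; the approval mechanism itself is an assumption, standing in for whatever review tool your team already uses.

type Decision = "allow" | "deny" | "flag";

// "flag" pauses execution until a human approves or rejects the action.
async function runWithReview(
  action: { type: string; target: string },
  evaluate: (a: { type: string; target: string }) => Decision,
  requestApproval: (a: { type: string; target: string }) => Promise<boolean>,
  execute: (a: { type: string; target: string }) => Promise<void>,
): Promise<void> {
  const decision = evaluate(action);
  if (decision === "deny") return;                      // blocked outright
  if (decision === "flag" && !(await requestApproval(action))) return;
  await execute(action);                                // allowed, or approved
}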
Step 9: Test with Adversarial Scenarios
Before deployment, actively test your safety controls with scenarios designed to break them:
- Prompt injection attempts that try to override agent instructions
- Actions that probe for credential files
- Shell commands disguised as legitimate operations
- Network exfiltration attempts to unauthorized domains
Pass criteria: All adversarial test scenarios are correctly denied or flagged by your policies.
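These scenarios work well as ordinary unit tests against your policy engine, run in CI so a policy change cannot silently weaken a control. The sketch below uses Node's built-in test runner; the evaluate function is a deny-by-default stand-in for your real policy code.

import { test } from "node:test";
import assert from "node:assert";

// Stand-in policy engine, deny-by-default: replace with your real one.
function evaluate(a: { type: string; target: string }): "allow" | "deny" {
  const allowed =
    a.type === "file_read" && a.target.startsWith("./src") && !a.target.endsWith(".env");
  return allowed ? "allow" : "deny";
}

test("credential probe is denied", () => {
  assert.equal(evaluate({ type: "file_read", target: "./src/.env" }), "deny");
});

test("disguised destructive command is denied", () => {
  assert.equal(evaluate({ type: "shell_exec", target: "echo ok && rm -rf /" }), "deny");
});

test("exfiltration to an unapproved domain is denied", () => {
  assert.equal(evaluate({ type: "network", target: "https://evil.example.com" }), "deny");
});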
Step 10: Document and Review Quarterly
Agent safety is not a one-time setup. Document your policies, assign ownership, and schedule quarterly reviews to account for:
- New agents or tools added to your stack
- Changes in what data or systems agents can access
- Policy adjustments based on audit trail analysis
- Updates to compliance or regulatory requirements
Quick-Start: Implementing This Checklist with SafeClaw
The fastest path through this entire checklist:
# Step 1-3: Install SafeClaw (action-level gating, deny-by-default)
npx @authensor/safeclaw

# Step 4-5: Configure policies with the setup wizard (simulation mode)
# Visit safeclaw.onrender.com for the browser dashboard

# Step 6: Audit trail is automatic — SHA-256 hash chain, zero config

# Step 7-8: Define credential restrictions and human-review triggers in your policy file

# Step 9: Run adversarial tests against your simulation-mode policies

# Step 10: Export your policy documentation and schedule review
SafeClaw is 100% open source (MIT license), has 446 tests in TypeScript strict mode, zero third-party dependencies, and a free tier with 7-day renewable keys — no credit card required.
What Happens If You Skip This Checklist
Organizations that deploy AI agents without safety controls eventually face one of three outcomes: data exfiltration, infrastructure damage, or compliance failure. The Clawdbot incident proved that even well-intentioned agents can cause catastrophic harm when they lack action-level controls. This checklist is the minimum viable standard for responsible agent deployment.
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw