AI Agent Safety Checklist: 10 Steps Before Letting Your Agent Run
Before any AI agent executes its first action on your infrastructure, you need a safety checklist — not a vague set of principles, but concrete steps you can verify. This checklist covers the ten controls that separate a secure agent deployment from an incident waiting to happen. Every step is actionable, testable, and applicable whether you are running a single coding assistant or a fleet of autonomous agents.
Step 1: Inventory Every Agent and Its Permissions
Before you can secure agents, you need to know what you have. Document every AI agent, assistant, or tool that can take actions on your systems.
What to record:
- Agent name and framework (Claude, OpenAI, LangChain, CrewAI, AutoGen, MCP, Cursor, Copilot, Windsurf, etc.)
- Action types it can perform: file_write, file_read, shell_exec, network
- Which systems, directories, and APIs it has access to
- Who deployed it and who is responsible for it
Pass criteria: You have a complete, written list. No undocumented agents are running.
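If you want this inventory to be machine-checkable rather than a spreadsheet, a minimal TypeScript sketch might look like the following; the field names are illustrative, not a required schema.

type AgentAction = "file_write" | "file_read" | "shell_exec" | "network";

interface AgentInventoryEntry {
  name: string;           // e.g. "ci-review-bot"
  framework: string;      // e.g. "LangChain", "MCP", "Cursor"
  actions: AgentAction[]; // action types the agent can perform
  access: string[];       // systems, directories, and APIs it can reach
  owner: string;          // who deployed it and who is responsible
}

const inventory: AgentInventoryEntry[] = [
  {
    name: "ci-review-bot", // hypothetical example agent
    framework: "LangChain",
    actions: ["file_read", "shell_exec"],
    access: ["/srv/repos", "https://api.github.com"],
    owner: "platform-team@example.com",
  },
];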
Step 2: Apply Deny-by-Default Permissions
Every agent should start with zero permissions. This is the single most important architectural decision in agent safety. If an agent has not been explicitly granted permission to perform an action, that action should be denied.
Why this matters: The Clawdbot incident — where 1.5 million API keys were leaked — happened because an agent had unconstrained permissions. Deny-by-default would have prevented it entirely.
Pass criteria: No agent can perform file_write, shell_exec, or network actions without an explicit allow rule.
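Deny-by-default is easy to express in code. Here is a minimal sketch, assuming a simple list of allow rules; it illustrates the principle, not SafeClaw's actual policy format.

type AgentAction = "file_write" | "file_read" | "shell_exec" | "network";

interface AllowRule {
  action: AgentAction;  // the action type this rule permits
  targetPrefix: string; // directory, path, or URL prefix it covers
}

// An action is allowed only if an explicit rule matches it.
// There is no implicit "allow" path: an empty rule list denies everything.
function isAllowed(action: AgentAction, target: string, rules: AllowRule[]): boolean {
  return rules.some((r) => r.action === action && target.startsWith(r.targetPrefix));
}

// With an empty rule list: isAllowed("shell_exec", "rm -rf /", []) returns false.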
Step 3: Install Action-Level Gating
Prompt-level guardrails (telling the agent "don't do dangerous things") are insufficient. Models can be jailbroken, instructions can be overridden, and prompt injection is a proven attack vector. You need a gating layer that intercepts actions before execution, regardless of what the model was told.
How to implement: Install SafeClaw with npx @authensor/safeclaw. SafeClaw evaluates every action against your policy in sub-millisecond time, before the action reaches your infrastructure. It works with all major agent frameworks and requires zero third-party dependencies.
Pass criteria: Every agent action passes through a policy engine before execution.
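Conceptually, the gating layer wraps every action the agent proposes, so the policy check runs no matter what the model was told. The sketch below shows the pattern in generic TypeScript; it illustrates the architecture, not SafeClaw's internals.

type Decision = "allow" | "deny";

interface ProposedAction {
  type: "file_write" | "file_read" | "shell_exec" | "network";
  target: string;
}

// The gate sits between the model's proposed action and the code that
// executes it. Prompt injection can change what the model proposes, but
// it cannot skip this check.
function gate(
  action: ProposedAction,
  evaluate: (a: ProposedAction) => Decision,
  execute: (a: ProposedAction) => void,
): void {
  if (evaluate(action) === "deny") {
    throw new Error(`Blocked ${action.type} on ${action.target}`);
  }
  execute(action); // reached only when the policy engine explicitly allows
}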
Step 4: Define Policies Per Action Type
Create explicit policies for each of the four action categories:
- file_write — Which directories can the agent write to? Which file patterns are allowed? Block writes to config files, .env files, and system directories.
- file_read — Can the agent read credentials, SSH keys, or environment files? Restrict to project directories only.
- shell_exec — Which commands are allowed? Block rm -rf, curl | bash, chmod 777, and any command that modifies system configuration.
- network — Which domains can the agent contact? Block all outbound requests except explicitly approved endpoints.
Pass criteria: Each of the four action types has an explicit written policy, and anything not covered by a policy is denied.
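As a sketch, such a policy can be written as plain data, one entry per action category. The shape below is hypothetical; use whatever format your gating layer expects.

// Hypothetical per-action-type policy: allowlists plus explicit deny patterns.
const policy = {
  file_write: { allowDirs: ["./src", "./build"], denyPatterns: ["*.env", "/etc/*"] },
  file_read:  { allowDirs: ["./src"], denyPatterns: ["*.env", "*.pem", "~/.ssh/*"] },
  shell_exec: { denyCommands: ["rm -rf", "curl | bash", "chmod 777"] },
  network:    { allowDomains: ["api.github.com"] }, // all other domains denied
};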
Step 5: Enable Simulation Mode First
Never deploy policies in enforcement mode without testing them first. Simulation mode lets you see exactly what your policy would allow and deny, without actually blocking any agent actions.
SafeClaw's simulation mode logs every decision — allow, deny, or flag — so you can review them before switching to enforcement. This prevents false positives from breaking your workflows.
Pass criteria: Policies have been run in simulation mode for at least 24 hours. You have reviewed the logs and confirmed that legitimate actions are allowed and dangerous actions are denied.
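The mechanism behind simulation mode is simple: evaluate every action and record the decision, but let the action proceed either way. A minimal sketch of the pattern:

type Decision = "allow" | "deny" | "flag";

// Simulation mode: log what enforcement WOULD do, without blocking anything.
function simulate(
  action: { type: string; target: string },
  evaluate: (a: { type: string; target: string }) => Decision,
  log: (line: string) => void,
): void {
  const decision = evaluate(action);
  log(`${new Date().toISOString()} ${decision} ${action.type} ${action.target}`);
  // The action always proceeds in simulation mode; only the log changes.
}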
Step 6: Verify Your Audit Trail
Every action your agent takes — allowed or denied — must be recorded in a tamper-proof log. This is not just for debugging. In regulated industries, you need to prove exactly what your AI systems did.
What to verify:
- Every action is logged with timestamp, action type, target, and decision
- Logs use a cryptographic hash chain (SHA-256) so any tampering with past records is detectable
- Logs are stored separately from the agent's own access (the agent cannot delete its own audit trail)
SafeClaw provides this automatically. Its tamper-proof audit trail uses SHA-256 hash chains, and the control plane sees only action metadata — never your keys or data.
Pass criteria: You can retrieve a complete, verifiable log of every action any agent has taken.
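A SHA-256 hash chain makes a log tamper-evident because each entry commits to the hash of the entry before it, so altering any record breaks every hash that follows. The sketch below shows the idea using Node's built-in crypto module; it is an illustration of the technique, not SafeClaw's implementation.

import { createHash } from "node:crypto";

interface AuditEntry {
  timestamp: string;
  actionType: string;
  target: string;
  decision: string;
  prevHash: string; // hash of the previous entry ("genesis" for the first)
  hash: string;     // SHA-256 over this entry's fields plus prevHash
}

function appendEntry(
  chain: AuditEntry[],
  e: Omit<AuditEntry, "prevHash" | "hash">,
): AuditEntry {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "genesis";
  const hash = createHash("sha256")
    .update(`${e.timestamp}|${e.actionType}|${e.target}|${e.decision}|${prevHash}`)
    .digest("hex");
  const entry: AuditEntry = { ...e, prevHash, hash };
  chain.push(entry); // editing any earlier record now invalidates this hash
  return entry;
}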
Step 7: Restrict Credential Access
AI agents should never have direct access to production credentials, API keys, or secrets. If an agent needs to call an API, it should go through a controlled proxy or use scoped, short-lived tokens.
Specific checks:
- .env files are excluded from agent file_read permissions
- SSH keys and certificates are not in agent-accessible directories
- API keys are injected at runtime, not stored in files the agent can read
- Agent processes run with minimum OS-level permissions
Pass criteria: No agent can read or exfiltrate credentials from your system.
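A simple pattern-based check illustrates the file_read side of this. The patterns below are examples only, and string matching should complement OS-level permissions, not replace them.

// Hypothetical credential-path denylist for file_read actions.
const credentialPatterns = [/\.env$/, /\.pem$/, /\/\.ssh\//, /id_rsa/];

function isCredentialPath(path: string): boolean {
  return credentialPatterns.some((p) => p.test(path));
}

// isCredentialPath("/home/dev/.ssh/id_rsa") -> true, deny the read
// isCredentialPath("./src/index.ts")        -> false, normal policy applies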
Step 8: Set Up Human-in-the-Loop for High-Risk Actions
Some actions should always require human approval, no matter what the automated policy would otherwise allow. Define which actions trigger a human review step:
- Any action targeting production systems
- Shell commands that modify infrastructure
- Network requests to new or unusual endpoints
- File writes outside designated working directories
Pass criteria: High-risk actions are identified and routed to human review.
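One common implementation is a third policy decision, flag, alongside allow and deny: a flagged action pauses until a human responds. The sketch below shows the routing logic; the approval mechanism itself is an assumption, standing in for whatever review tool your team already uses.

type Decision = "allow" | "deny" | "flag";

// "flag" pauses execution until a human approves or rejects the action.
async function runWithReview(
  action: { type: string; target: string },
  evaluate: (a: { type: string; target: string }) => Decision,
  requestApproval: (a: { type: string; target: string }) => Promise<boolean>,
  execute: (a: { type: string; target: string }) => Promise<void>,
): Promise<void> {
  const decision = evaluate(action);
  if (decision === "deny") return;                      // blocked outright
  if (decision === "flag" && !(await requestApproval(action))) return;
  await execute(action);                                // allowed, or approved
}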
Step 9: Test with Adversarial Scenarios
Before deployment, actively test your safety controls with scenarios designed to break them:
- Prompt injection attempts that try to override agent instructions
- Actions that probe for credential files
- Shell commands disguised as legitimate operations
- Network exfiltration attempts to unauthorized domains
Pass criteria: All adversarial test scenarios are correctly denied or flagged by your policies.
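These scenarios work well as ordinary unit tests against your policy engine, run in CI so a policy change cannot silently weaken a control. The sketch below uses Node's built-in test runner; the evaluate function is a deny-by-default stand-in for your real policy code.

import { test } from "node:test";
import assert from "node:assert";

// Stand-in policy engine, deny-by-default: replace with your real one.
function evaluate(a: { type: string; target: string }): "allow" | "deny" {
  const allowed =
    a.type === "file_read" && a.target.startsWith("./src") && !a.target.endsWith(".env");
  return allowed ? "allow" : "deny";
}

test("credential probe is denied", () => {
  assert.equal(evaluate({ type: "file_read", target: "./src/.env" }), "deny");
});

test("disguised destructive command is denied", () => {
  assert.equal(evaluate({ type: "shell_exec", target: "echo ok && rm -rf /" }), "deny");
});

test("exfiltration to an unapproved domain is denied", () => {
  assert.equal(evaluate({ type: "network", target: "https://evil.example.com" }), "deny");
});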
Step 10: Document and Review Quarterly
Agent safety is not a one-time setup. Document your policies, assign ownership, and schedule quarterly reviews to account for:
- New agents or tools added to your stack
- Changes in what data or systems agents can access
- Policy adjustments based on audit trail analysis
- Updates to compliance or regulatory requirements
Quick-Start: Implementing This Checklist with SafeClaw
The fastest path through this entire checklist:
# Step 1-3: Install SafeClaw (action-level gating, deny-by-default)
npx @authensor/safeclaw

# Step 4-5: Configure policies with the setup wizard (simulation mode)
# Visit safeclaw.onrender.com for the browser dashboard

# Step 6: Audit trail is automatic — SHA-256 hash chain, zero config

# Step 7-8: Define credential restrictions and human-review triggers in your policy file

# Step 9: Run adversarial tests against your simulation-mode policies

# Step 10: Export your policy documentation and schedule review
SafeClaw is 100% open source (MIT license), has 446 tests in TypeScript strict mode, zero third-party dependencies, and a free tier with 7-day renewable keys — no credit card required.
What Happens If You Skip This Checklist
Organizations that deploy AI agents without safety controls eventually face one of three outcomes: data exfiltration, infrastructure damage, or compliance failure. The Clawdbot incident proved that even well-intentioned agents can cause catastrophic harm when they lack action-level controls. This checklist is the minimum viable standard for responsible agent deployment.
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw