Step-by-Step Migration from Uncontrolled Agents to SafeClaw
Overview
This guide covers the process of migrating from AI agents operating without action-level safety controls to a fully enforced SafeClaw deployment. The migration follows a four-phase approach: audit current agent behavior, design policies based on observed actions, validate in simulation mode, and enable enforcement. The process is designed to be non-disruptive — agents continue operating normally during the audit and simulation phases.
Most teams complete the migration in 1-3 days for a single agent and 1-2 weeks for a multi-agent environment. No code changes to the agents themselves are required.
Step-by-Step Process
Step 1: Inventory All Active Agents
Identify every AI agent operating in your environment. For each agent, document:
- Agent name and framework (Claude Code, Cursor, LangChain, CrewAI, custom)
- What tool access the agent has (file system, shell, network, database)
- What the agent's intended purpose is (coding, research, data analysis, operations)
- What systems the agent can reach (production servers, databases, APIs, cloud consoles)
Step 2: Install SafeClaw
Run `npx @authensor/safeclaw` to install. The setup wizard walks through initial configuration via the browser dashboard. No credit card is required — the free tier includes 7-day renewable keys. The installation adds zero third-party dependencies to your environment.
Step 3: Enable Audit-Only Mode
Before writing any policy rules, run SafeClaw in audit-only mode. In this mode, SafeClaw logs every action every agent attempts but does not block anything. This provides a baseline of actual agent behavior.
Run audit-only mode for a minimum of 24 hours, ideally covering a full work cycle. The audit trail records:
- Action type (`file_read`, `file_write`, `shell_exec`, `network`)
- Action target (file path, command, URL)
- Timestamp
- Agent identifier
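A single audit record carrying those four fields might look like the following sketch. The field names here are assumptions for illustration only, not SafeClaw's actual log schema — see the Audit Trail Specification for the real format.

```python
# Hypothetical audit record shape -- field names are illustrative,
# not SafeClaw's actual log schema.
from datetime import datetime, timezone

record = {
    "agent_id": "coding-agent-01",        # agent identifier
    "action_type": "file_read",           # file_read | file_write | shell_exec | network
    "target": "src/app/main.py",          # file path, command, or URL
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

def validate_record(rec: dict) -> bool:
    """Check that a record carries the four fields the audit trail promises."""
    required = {"agent_id", "action_type", "target", "timestamp"}
    return required <= rec.keys()
```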
Step 4: Analyze the Audit Log
Export the audit log from the browser dashboard. Categorize every observed action into three groups:
| Category | Definition | Example |
|----------|-----------|---------|
| Expected and safe | Actions within the agent's intended purpose | Reading source files, running tests |
| Expected but risky | Intended actions that carry risk if unsupervised | Database migrations, production deployments |
| Unexpected | Actions outside the agent's intended scope | Reading credential files, network calls to unknown endpoints |
This categorization directly maps to SafeClaw policy decisions: ALLOW for expected-safe, REQUIRE_APPROVAL for expected-risky, DENY for unexpected.
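The mapping from audit categories to policy decisions can be sketched as a simple lookup. The category labels and the three decision names follow the table above; the data structures are hypothetical.

```python
# Map each reviewed audit category to a SafeClaw-style decision.
# The decision names (ALLOW / REQUIRE_APPROVAL / DENY) come from the guide;
# the review record shape is an illustrative assumption.
CATEGORY_TO_DECISION = {
    "expected_safe": "ALLOW",
    "expected_risky": "REQUIRE_APPROVAL",
    "unexpected": "DENY",
}

reviewed_actions = [
    {"action": "file_read",  "target": "src/**",              "category": "expected_safe"},
    {"action": "shell_exec", "target": "./deploy.sh",         "category": "expected_risky"},
    {"action": "file_read",  "target": "~/.aws/credentials",  "category": "unexpected"},
]

def draft_rules(actions):
    """Turn categorized audit entries into draft policy rules."""
    return [
        {
            "action": a["action"],
            "target": a["target"],
            "decision": CATEGORY_TO_DECISION[a["category"]],
        }
        for a in actions
    ]
```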
Step 5: Design Your Policy
Write policy rules based on the audit analysis. Start with deny-by-default (SafeClaw's default behavior) and add explicit rules:
- Write DENY rules for every action category you identified as unexpected
- Write REQUIRE_APPROVAL rules for every action category you identified as risky
- Write ALLOW rules for every action category you identified as safe
- Review the policy for completeness — any action not covered by an explicit rule will be denied by default
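Deny-by-default evaluation amounts to: return the first matching rule's decision, otherwise DENY. A minimal sketch — the rule shape and glob matching are assumptions for illustration, not SafeClaw's engine (see the Policy Rule Syntax Reference for the real syntax):

```python
import fnmatch

# Hypothetical rule list; real SafeClaw policy syntax is documented
# in the Policy Rule Syntax Reference.
POLICY = [
    {"action": "file_read",  "target": "src/**",        "decision": "ALLOW"},
    {"action": "shell_exec", "target": "npm test*",     "decision": "ALLOW"},
    {"action": "shell_exec", "target": "./deploy.sh*",  "decision": "REQUIRE_APPROVAL"},
    {"action": "file_read",  "target": "*credentials*", "decision": "DENY"},
]

def evaluate(action: str, target: str) -> str:
    """Return the first matching rule's decision; deny by default."""
    for rule in POLICY:
        if rule["action"] == action and fnmatch.fnmatch(target, rule["target"]):
            return rule["decision"]
    return "DENY"  # anything not covered by an explicit rule is denied
```

Note that an action type with no rule at all (for example, an unexpected `network` call) falls straight through to the default DENY, which is the property the review step above is checking for.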
Step 6: Validate in Simulation Mode
Enable simulation mode. In simulation mode, SafeClaw evaluates every action against your policy and logs what the decision would be, but does not enforce it. Agents continue operating without interruption.
Run simulation mode for at least one full work cycle (24-48 hours). Review the simulation log for:
- False denials — legitimate actions that would be blocked by the policy. Add ALLOW or REQUIRE_APPROVAL rules for these.
- False approvals — risky actions that the policy would allow. Add DENY or REQUIRE_APPROVAL rules for these.
- Coverage gaps — action types or targets not addressed by any rule. Verify that deny-by-default handles these correctly.
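Reviewing a simulation log for the first two problems can be sketched as a comparison between the would-be decision and a reviewer's label. The log shape and the `legitimate`/`risky` labels are hypothetical; the labels come from a human reviewer, not from SafeClaw.

```python
# Each simulated entry records what the policy WOULD have decided.
# The "label" field is assigned during human review.
simulation_log = [
    {"action": "file_read",  "target": "src/main.py",   "would_be": "ALLOW", "label": "legitimate"},
    {"action": "shell_exec", "target": "pytest",        "would_be": "DENY",  "label": "legitimate"},
    {"action": "network",    "target": "http://x.test", "would_be": "ALLOW", "label": "risky"},
]

def review(log):
    """Split the log into false denials and false approvals."""
    false_denials = [e for e in log if e["label"] == "legitimate" and e["would_be"] == "DENY"]
    false_approvals = [e for e in log if e["label"] == "risky" and e["would_be"] == "ALLOW"]
    return false_denials, false_approvals
```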
Step 7: Enable Enforcement
Switch from simulation mode to enforcement mode. SafeClaw now evaluates every action and enforces the decision (ALLOW, DENY, or REQUIRE_APPROVAL) before the action executes.
Monitor the audit trail closely for the first 48 hours after enforcement. Watch for:
- Agent errors caused by legitimate actions being denied
- REQUIRE_APPROVAL actions that are always approved (candidates for ALLOW)
- REQUIRE_APPROVAL actions that are always denied (candidates for DENY)
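The two REQUIRE_APPROVAL patterns above can be surfaced by grouping approval outcomes per action. The log shape is a hypothetical sketch, not SafeClaw's export format.

```python
from collections import defaultdict

# Hypothetical approval history: (action key, outcome) pairs.
approval_log = [
    ("shell_exec:./deploy.sh", "approved"),
    ("shell_exec:./deploy.sh", "approved"),
    ("shell_exec:./deploy.sh", "approved"),
    ("file_write:/etc/hosts",  "denied"),
    ("file_write:/etc/hosts",  "denied"),
]

def suggest_promotions(log):
    """Flag actions whose approval outcomes are all one way."""
    outcomes = defaultdict(set)
    for key, outcome in log:
        outcomes[key].add(outcome)
    suggestions = {}
    for key, seen in outcomes.items():
        if seen == {"approved"}:
            suggestions[key] = "candidate for ALLOW"
        elif seen == {"denied"}:
            suggestions[key] = "candidate for DENY"
    return suggestions
```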
Step 8: Iterate and Refine
Policy tuning is ongoing. After the first week of enforcement, review the audit trail and adjust:
- Promote frequently approved REQUIRE_APPROVAL rules to ALLOW
- Promote frequently denied REQUIRE_APPROVAL rules to DENY
- Add new rules for previously unseen action patterns
- Remove rules that never match (dead rules)
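Dead-rule detection is just a match count per rule over the review window. A sketch under assumed rule and log shapes (SafeClaw's own export format may differ):

```python
import fnmatch

# Illustrative rules and audit entries -- shapes are assumptions.
rules = [
    {"id": "allow-src-reads", "action": "file_read",  "target": "src/**"},
    {"id": "deny-tmp-writes", "action": "file_write", "target": "/tmp/**"},
]

audit_log = [
    {"action": "file_read", "target": "src/a.py"},
    {"action": "file_read", "target": "src/b.py"},
]

def dead_rules(rules, log):
    """Return IDs of rules that matched no action in the log."""
    counts = {r["id"]: 0 for r in rules}
    for entry in log:
        for r in rules:
            if r["action"] == entry["action"] and fnmatch.fnmatch(entry["target"], r["target"]):
                counts[r["id"]] += 1
    return [rid for rid, n in counts.items() if n == 0]
```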
Checklist
- [ ] Inventory all active AI agents and their tool access
- [ ] Install SafeClaw via `npx @authensor/safeclaw`
- [ ] Configure initial setup through the browser dashboard wizard
- [ ] Run audit-only mode for 24+ hours
- [ ] Export and analyze the audit log
- [ ] Categorize actions into safe / risky / unexpected
- [ ] Write DENY rules for unexpected actions
- [ ] Write REQUIRE_APPROVAL rules for risky actions
- [ ] Write ALLOW rules for safe actions
- [ ] Enable simulation mode for 24-48 hours
- [ ] Review simulation results for false denials and false approvals
- [ ] Adjust policy rules based on simulation findings
- [ ] Enable enforcement mode
- [ ] Monitor audit trail for first 48 hours
- [ ] Adjust policy based on enforcement observations
- [ ] Schedule weekly policy review for ongoing tuning
Common Mistakes
1. Skipping the audit phase. Writing policies based on assumptions about agent behavior, rather than observed behavior, leads to false denials that disrupt workflows or gaps that miss real risks. Always audit first.
2. Starting with allow-by-default. Some teams try to start permissive and add restrictions later. This defeats the purpose of safety controls. SafeClaw's deny-by-default architecture exists because unknown actions are the highest risk. Start restrictive and open up selectively.
3. Writing overly broad ALLOW rules. Rules like `action: file_read, target: "**"` (allow all file reads) eliminate the protection that `file_read` gating provides. Use specific directory paths and file patterns.
4. Skipping simulation mode. Going directly from no policy to enforcement causes agent failures that frustrate developers and lead to SafeClaw being disabled. Simulation mode catches policy errors before they affect productivity.
5. Not reviewing REQUIRE_APPROVAL patterns. If the same action is approved every time, it should be an ALLOW rule. If it is denied every time, it should be a DENY rule. REQUIRE_APPROVAL rules that are not genuinely decision points create approval fatigue.
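The breadth problem in mistake 3 is easy to demonstrate with standard glob matching (the paths are illustrative):

```python
import fnmatch

paths = ["src/app/main.py", "/home/dev/.ssh/id_rsa", "/etc/shadow"]

# An over-broad target matches everything, including secrets:
broad = [p for p in paths if fnmatch.fnmatch(p, "**")]
# A scoped target matches only the intended project directory:
scoped = [p for p in paths if fnmatch.fnmatch(p, "src/**")]

print(broad)   # all three paths, including the SSH key and /etc/shadow
print(scoped)  # only the source file
```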
Success Criteria
Migration is successful when the following conditions are met:
- Zero uncontrolled agents — every AI agent in the environment operates under a SafeClaw policy
- Policy coverage above 95% — fewer than 5% of agent actions fall through to deny-by-default (most actions match an explicit rule)
- False denial rate below 1% — fewer than 1 in 100 legitimate actions are incorrectly blocked
- Audit trail operational — tamper-evident SHA-256 hash chain logs are being generated and stored for every agent action
- Weekly review cadence established — a team member is assigned to review the audit trail and adjust policies weekly
- Simulation mode validated — every policy change goes through simulation mode before enforcement
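The tamper-evidence property of a SHA-256 hash chain can be checked with a short sketch: each record's hash covers the previous hash, so editing any record breaks every link after it. The record layout below is illustrative; the real format is in the Audit Trail Specification.

```python
import hashlib
import json

def chain(records):
    """Link each record to its predecessor via SHA-256."""
    prev = "0" * 64  # genesis hash
    out = []
    for rec in records:
        body = json.dumps(rec, sort_keys=True)
        h = hashlib.sha256((prev + body).encode()).hexdigest()
        out.append({"record": rec, "prev": prev, "hash": h})
        prev = h
    return out

def verify(entries):
    """Recompute every link; any edited record breaks the chain."""
    prev = "0" * 64
    for e in entries:
        body = json.dumps(e["record"], sort_keys=True)
        if e["prev"] != prev or e["hash"] != hashlib.sha256((prev + body).encode()).hexdigest():
            return False
        prev = e["hash"]
    return True
```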
Cross-References
- SafeClaw Setup FAQ — Installation and configuration details
- Simulation Mode Reference — How simulation mode works
- Policy Rule Syntax Reference — Full rule authoring guide
- Deny-by-Default Definition — Why deny-by-default is the correct starting point
- Audit Trail Specification — Log format and hash chain verification
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw