Setting Safety Limits for Fully Autonomous AI Agents
Fully autonomous AI agents — those running without human-in-the-loop approval — require the strictest safety boundaries because there is no human checkpoint before irreversible actions execute. SafeClaw by Authensor provides layered limits for autonomous operation: action-level deny-by-default gating, execution budget caps, time-bounded policy windows, and cumulative impact thresholds that halt the agent before damage compounds. Install with npx @authensor/safeclaw to enforce these limits before deploying any agent autonomously.
The Autonomy Spectrum
Not all autonomy is equal. SafeClaw supports four operational modes along the autonomy spectrum:
FULL              HUMAN APPROVE       AUTO WITH           FULL
CONTROL           HIGH-RISK           LIMITS              AUTONOMOUS
────────────────────────────────────────────────────────────────────▶
Every action      Deny shell,         Budget caps,        Time-boxed,
requires          allow file          rate limits,        scoped perms,
approval          reads               auto-halt           auto-halt
Even at "full autonomous," SafeClaw never becomes allow-by-default. The agent can only execute actions explicitly permitted in its policy.
Layered Safety Limits
Layer 1: Action-Level Gating
The foundation. Every action the agent attempts — file write, shell command, network request, tool call — hits the policy engine first:
# safeclaw-autonomous.yaml
version: "1.0"
mode: autonomous
rules:
  - action: file_write
    path: "src/**"
    decision: allow
  - action: file_write
    path: "**"
    decision: deny
  - action: shell_execute
    command: "npm test"
    decision: allow
  - action: shell_execute
    command: "npm run build"
    decision: allow
  - action: shell_execute
    decision: deny
  - action: network_request
    decision: deny
  - action: file_delete
    decision: deny
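The rule list above reads as ordered, first-match-wins evaluation with a deny-by-default fallback: specific allows come first, catch-all denies close each action type, and anything unmatched is denied. A minimal sketch of that evaluation, assuming that ordering (the helper names and the tiny glob matcher are illustrative, not SafeClaw's actual API):

```typescript
// Ordered, first-match policy evaluation with a deny-by-default fallback.
// Names are illustrative, not SafeClaw's API.
type Decision = "allow" | "deny";

interface Rule {
  action: string;
  path?: string;      // glob pattern for file actions
  command?: string;   // exact command string for shell actions
  decision: Decision;
}

// Tiny glob matcher: supports "**" (anything) and "dir/**" prefixes only.
function globMatch(pattern: string, value: string): boolean {
  if (pattern === "**") return true;
  if (pattern.endsWith("/**")) return value.startsWith(pattern.slice(0, -2));
  return pattern === value;
}

function evaluate(rules: Rule[], action: string, target: string): Decision {
  for (const rule of rules) {
    if (rule.action !== action) continue;
    if (rule.path !== undefined && !globMatch(rule.path, target)) continue;
    if (rule.command !== undefined && rule.command !== target) continue;
    return rule.decision; // first matching rule wins
  }
  return "deny"; // no rule matched: deny by default
}

const rules: Rule[] = [
  { action: "file_write", path: "src/**", decision: "allow" },
  { action: "file_write", path: "**", decision: "deny" },
  { action: "shell_execute", command: "npm test", decision: "allow" },
  { action: "shell_execute", decision: "deny" },
];
```

Note that an action type with no rules at all (like network_request for an agent whose policy omits it) still resolves to deny, which is what keeps autonomous mode from ever becoming allow-by-default.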
Layer 2: Execution Budget Caps
Prevent runaway agents by capping total actions per session:
limits:
  max_actions_per_session: 200
  max_file_writes: 50
  max_shell_executions: 20
  max_bytes_written: 10485760  # 10 MB
  on_limit_exceeded: halt_and_log
When any limit is reached, SafeClaw halts the agent and writes a final audit entry. This prevents scenarios where a coding agent enters an infinite fix-test-fix loop, writing thousands of files.
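Conceptually, each cap is a counter that the gate charges before executing an action, halting as soon as any cap would be exceeded. A minimal sketch under that assumption, reusing the limit values from the config above (the class and method names are illustrative, not SafeClaw's API):

```typescript
// Per-session execution budget: every action charges a counter, and a
// false return means the session must halt. Illustrative, not SafeClaw's API.
class SessionBudget {
  private counts: Record<string, number> = {};

  constructor(private limits: Record<string, number>) {}

  // Record usage against a metric; returns false once its cap is exceeded.
  charge(metric: string, amount = 1): boolean {
    this.counts[metric] = (this.counts[metric] ?? 0) + amount;
    const cap = this.limits[metric];
    return cap === undefined || this.counts[metric] <= cap;
  }
}

const budget = new SessionBudget({
  max_actions_per_session: 200,
  max_file_writes: 50,
  max_shell_executions: 20,
  max_bytes_written: 10485760, // 10 MB
});
```

A fix-test-fix loop then dies at write number 51 (or at 10 MB of output, whichever comes first) instead of running until someone notices.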
Layer 3: Time-Bounded Execution Windows
Autonomous agents should not run indefinitely:
time_limits:
  max_session_duration: "30m"
  allowed_hours: "09:00-18:00"
  timezone: "UTC"
  on_timeout: halt_and_notify
If the agent is still running at the 30-minute mark, SafeClaw terminates the session cleanly and logs all pending actions as denied.
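Both time limits reduce to simple clock comparisons: one against the session start, one against a wall-clock window. A minimal sketch assuming the UTC window from the config above (function names are illustrative):

```typescript
// Hard session-duration cap: has the session run past maxMinutes?
function withinSessionLimit(startedAt: Date, now: Date, maxMinutes: number): boolean {
  return now.getTime() - startedAt.getTime() < maxMinutes * 60_000;
}

// Allowed-hours window, half-open [startHour, endHour) in UTC,
// e.g. 9 and 18 for a "09:00-18:00" / "UTC" config.
function withinAllowedHours(now: Date, startHour: number, endHour: number): boolean {
  const h = now.getUTCHours();
  return h >= startHour && h < endHour;
}
```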
Layer 4: Cumulative Impact Thresholds
Individual actions might be safe, but their cumulative effect can be dangerous. SafeClaw tracks cumulative impact:
impact_thresholds:
  max_files_modified: 30
  max_directories_touched: 10
  max_unique_commands: 15
  rollback_snapshot: true  # Snapshot state before session
  on_threshold_exceeded: halt_and_rollback
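Tracking cumulative impact amounts to counting distinct files, directories, and commands across the whole session, so thirty individually-allowed writes can still trip the halt. A minimal sketch of that bookkeeping (class and method names are illustrative, not SafeClaw's API):

```typescript
// Cumulative impact tracking: sets of distinct files, parent directories,
// and commands, checked against session-wide thresholds. Illustrative names.
class ImpactTracker {
  private files = new Set<string>();
  private dirs = new Set<string>();
  private commands = new Set<string>();

  constructor(
    private maxFiles: number,
    private maxDirs: number,
    private maxCommands: number,
  ) {}

  recordWrite(path: string): void {
    this.files.add(path);
    // Parent directory; "." for top-level paths.
    this.dirs.add(path.split("/").slice(0, -1).join("/") || ".");
  }

  recordCommand(cmd: string): void {
    this.commands.add(cmd);
  }

  // True once any threshold is crossed: halt (and roll back, if a
  // pre-session snapshot was taken).
  exceeded(): boolean {
    return (
      this.files.size > this.maxFiles ||
      this.dirs.size > this.maxDirs ||
      this.commands.size > this.maxCommands
    );
  }
}
```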
The Kill Switch
Every autonomous SafeClaw deployment includes a kill switch — a monitoring process that can terminate the agent immediately:
┌─────────────┐     actions     ┌───────────┐
│ Autonomous  │ ──────────────▶ │  SafeClaw │──▶ Execute
│    Agent    │                 │   Gate    │
└─────────────┘                 └─────┬─────┘
                                      │
                                 ┌────▼────┐
                                 │ Monitor │
                                 │ Process │
                                 │         │
                                 │ HALT if:│
                                 │ - limit │
                                 │ - time  │
                                 │ - impact│
                                 │ - manual│
                                 └─────────┘
The monitor process runs independently of the agent, so even if the agent attempts to disable SafeClaw (which would require shell access it doesn't have), the monitor continues enforcing limits.
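The monitor's halt logic can be thought of as a list of independent checks, any one of which stops the session; the limit, time, impact, and manual conditions from the diagram all fit the same shape. A minimal sketch of that composition (names are illustrative, not SafeClaw's API):

```typescript
// Each halt condition returns a reason string, or null if it has not
// tripped. The watchdog halts on the first non-null reason it finds.
type HaltCheck = () => string | null;

function firstHaltReason(checks: HaltCheck[]): string | null {
  for (const check of checks) {
    const reason = check();
    if (reason !== null) return reason;
  }
  return null; // nothing tripped; let the agent continue
}

// Example wiring: a manual kill flag plus placeholder slots where the
// budget, time, and impact checks from the earlier layers would plug in.
let manualKill = false;
const checks: HaltCheck[] = [
  () => (manualKill ? "manual" : null),
  () => null, // budget check would go here
  () => null, // time / impact checks likewise
];
```

Because the checks are plain functions over shared session state, the monitor can poll them on its own schedule, in its own process, without cooperating with the agent at all.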
Simulation Before Autonomy
Before enabling autonomous mode, run the agent in simulation mode to observe what it would do:
mode: simulation # Log all decisions but execute nothing
Review the simulation audit log. If the agent stays within your intended boundaries, switch to mode: autonomous. SafeClaw's 446-test suite validates that simulation mode produces the same policy evaluations as enforcement mode, so what you see in simulation is exactly what will be enforced.
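That guarantee is the natural consequence of a design where both modes share a single decision path and differ only in whether the side effect runs. A minimal sketch of the idea (illustrative, not SafeClaw's actual code):

```typescript
// One dispatch path for both modes: the policy decision is computed and
// logged identically, and only autonomous mode executes allowed actions.
type Mode = "simulation" | "autonomous";
type Decision = "allow" | "deny";

function dispatch(
  mode: Mode,
  decision: Decision,
  execute: () => void,
  auditLog: string[],
): void {
  auditLog.push(decision); // same evaluation is logged in either mode
  if (mode === "autonomous" && decision === "allow") {
    execute(); // side effects happen only here
  }
}
```

With this shape, a divergence between simulation and enforcement would require two code paths to exist, which is exactly what the shared dispatch rules out.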
SafeClaw is MIT-licensed, works with Claude and OpenAI, and has zero runtime dependencies — critical for autonomous deployments where supply chain attacks are an amplified risk.
Cross-References
- Simulation Mode Explained
- Human-in-the-Loop Gating
- Token Budget Controls
- Deny-by-Default Pattern
- Fail-Closed Design Pattern
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw