2026-02-04 · Authensor

Setting Safety Limits for Fully Autonomous AI Agents

Fully autonomous AI agents — those running without human-in-the-loop approval — require the strictest safety boundaries because there is no human checkpoint before irreversible actions execute. SafeClaw by Authensor provides layered limits for autonomous operation: action-level deny-by-default gating, execution budget caps, time-bounded policy windows, and cumulative impact thresholds that halt the agent before damage compounds. Install with npx @authensor/safeclaw to enforce these limits before deploying any agent autonomously.

The Autonomy Spectrum

Not all autonomy is equal. SafeClaw supports four operational modes along the autonomy spectrum:

  FULL HUMAN       APPROVE       AUTO WITH       FULL
  CONTROL          HIGH-RISK     LIMITS          AUTONOMOUS
  ────────────────────────────────────────────────────────▶
  Every action     Deny shell,   Budget caps,    Time-boxed,
  requires         allow file    rate limits,    scoped perms,
  approval         reads         auto-halt       auto-halt

Even at "full autonomous," SafeClaw never becomes allow-by-default. The agent can only execute actions explicitly permitted in its policy.

Layered Safety Limits

Layer 1: Action-Level Gating

The foundation. Every action the agent attempts — file write, shell command, network request, tool call — hits the policy engine first:

# safeclaw-autonomous.yaml
version: "1.0"
mode: autonomous
rules:
  - action: file_write
    path: "src/**"
    decision: allow
  - action: file_write
    path: "**"
    decision: deny

  - action: shell_execute
    command: "npm test"
    decision: allow
  - action: shell_execute
    command: "npm run build"
    decision: allow
  - action: shell_execute
    decision: deny

  - action: network_request
    decision: deny

  - action: file_delete
    decision: deny

Layer 2: Execution Budget Caps

Prevent runaway agents by capping total actions per session:

limits:
  max_actions_per_session: 200
  max_file_writes: 50
  max_shell_executions: 20
  max_bytes_written: 10485760  # 10 MB
  on_limit_exceeded: halt_and_log

When any limit is reached, SafeClaw halts the agent and writes a final audit entry. This prevents scenarios where a coding agent enters an infinite fix-test-fix loop, writing thousands of files.

Layer 3: Time-Bounded Execution Windows

Autonomous agents should not run indefinitely:

time_limits:
  max_session_duration: "30m"
  allowed_hours: "09:00-18:00"
  timezone: "UTC"
  on_timeout: halt_and_notify

If the agent is still running at the 30-minute mark, SafeClaw terminates the session cleanly and logs all pending actions as denied.

Layer 4: Cumulative Impact Thresholds

Individual actions might be safe, but their cumulative effect can be dangerous. SafeClaw tracks cumulative impact:

impact_thresholds:
  max_files_modified: 30
  max_directories_touched: 10
  max_unique_commands: 15
  rollback_snapshot: true    # Snapshot state before session
  on_threshold_exceeded: halt_and_rollback
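
Taken together, the four layers read as a single policy. The sketch below simply merges (and abridges) the sections shown above, assuming they sit in the same safeclaw-autonomous.yaml as the rules block, which the layer-by-layer snippets suggest:

# safeclaw-autonomous.yaml (abridged) - all four layers in one file
version: "1.0"
mode: autonomous
rules:                          # Layer 1: deny-by-default action gating
  - action: file_write
    path: "src/**"
    decision: allow
  - action: shell_execute
    command: "npm test"
    decision: allow
  - action: shell_execute
    decision: deny
limits:                         # Layer 2: execution budget caps
  max_actions_per_session: 200
  max_file_writes: 50
  on_limit_exceeded: halt_and_log
time_limits:                    # Layer 3: bounded execution window
  max_session_duration: "30m"
  on_timeout: halt_and_notify
impact_thresholds:              # Layer 4: cumulative impact
  max_files_modified: 30
  rollback_snapshot: true
  on_threshold_exceeded: halt_and_rollback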

The Kill Switch

Every autonomous SafeClaw deployment includes a kill switch — a monitoring process that can terminate the agent immediately:

┌─────────────┐     actions     ┌───────────┐
│  Autonomous │ ──────────────▶ │ SafeClaw  │──▶ Execute
│  Agent      │                 │ Gate      │
└─────────────┘                 └─────┬─────┘
                                      │
                                 ┌────▼────┐
                                 │ Monitor │
                                 │ Process │
                                 │         │
                                 │ HALT if:│
                                 │ - limit │
                                 │ - time  │
                                 │ - impact│
                                 │ - manual│
                                 └─────────┘

The monitor process runs independently of the agent, so even if the agent attempts to disable SafeClaw (which would require shell access it doesn't have), the monitor continues enforcing limits.
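
As a purely hypothetical sketch of how the monitor's halt conditions might be declared (every key name here is an illustrative placeholder, not a confirmed SafeClaw option):

# Hypothetical sketch only; "monitor" and its keys are placeholder names.
# The halt conditions mirror the diagram: budget limits, the time window,
# impact thresholds, and a manual operator trigger.
monitor:
  enabled: true
  halt_on:
    - limit_exceeded
    - timeout
    - impact_threshold
    - manual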

Simulation Before Autonomy

Before enabling autonomous mode, run the agent in simulation mode to observe what it would do:

mode: simulation  # Log all decisions but execute nothing

Review the simulation audit log. If the agent stays within your intended boundaries, switch to mode: autonomous. SafeClaw's 446-test suite validates that simulation mode produces the same policy evaluations as enforcement mode, so what you see in simulation is exactly what will be enforced.
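
The switch itself is a one-line change to the same policy file; a sketch, using only the two mode values shown in this post:

# Dry run first: log every decision, execute nothing.
mode: simulation
# Once the simulation audit log matches your intended boundaries,
# change the line above to:
# mode: autonomous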

SafeClaw is MIT-licensed, works with Claude and OpenAI, and has zero runtime dependencies — critical for autonomous deployments where supply chain attacks are an amplified risk.

Try SafeClaw

Action-level gating for AI agents. Set it up in your terminal in 60 seconds.

$ npx @authensor/safeclaw