2026-02-02 · Authensor

How to Prevent AI Agents from Deleting Your Files

To prevent AI agents from deleting your files, install SafeClaw (npx @authensor/safeclaw) and create a policy that denies all shell_exec actions containing destructive commands (rm, rmdir, unlink) and restricts file_write to specific directories. SafeClaw intercepts every action before execution, so a command like rm -rf / never reaches your operating system. With deny-by-default, file deletion is blocked unless you've written an explicit rule allowing it.
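
The shape of that policy is small. Here is a minimal sketch using only the fields that appear in the full example later in this guide (version, default, rules, action, command, path, decision, reason); treat it as a starting point, not a complete policy:

version: "1.0"
default: deny            # anything not explicitly allowed is blocked

rules:
  - action: shell_exec
    command: "rm *"
    decision: deny
    reason: "Block all rm commands"

  - action: file_write
    path: "./output/**"
    decision: allow
    reason: "Writes are confined to the output directory"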

Why This Matters

File deletion is the most common fear users have about AI agents — and it's well-founded. An agent tasked with "cleaning up the project" might interpret that as deleting files it considers unnecessary. A prompt injection could instruct the agent to remove configuration files, wipe directories, or overwrite critical data with empty content. Destructive file operations are irreversible on systems without snapshots, and even with backups, the downtime and data loss from an uncontrolled rm -rf can be severe.

Step-by-Step Instructions

Step 1: Install SafeClaw

npx @authensor/safeclaw

SafeClaw runs as a standalone gating layer. It has zero third-party dependencies and evaluates each policy decision in sub-millisecond time.

Step 2: Get Your API Key

Sign up at safeclaw.onrender.com. The free tier includes a 7-day renewable key with no credit card required.

Step 3: Identify Every Deletion Vector

File deletion isn't just rm. Agents can delete files through:

- rm, rmdir, and unlink shell commands
- recursive deletes from scripts, such as Python's shutil.rmtree or os.remove
- git clean, which removes untracked files
- moving files to /dev/null or other discard paths with mv
- overwriting existing files with empty content via file_write

Your policy must cover all of these vectors.
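
To see why blocking rm alone falls short, the shell sketch below deletes or destroys files through several of these vectors without ever invoking rm. It is for illustration only; run it in a throwaway directory, never against real data.

# Demonstration only: each step removes or destroys data without rm.
tmp=$(mktemp -d) && cd "$tmp"
printf 'data\n' | tee a.txt b.txt c.txt d.txt > /dev/null
mkdir emptydir

rmdir emptydir                              # directory removal without rm
unlink a.txt                                # single-file removal without rm
python3 -c "import os; os.remove('b.txt')"  # deletion from inside a script
: > c.txt                                   # truncation: the file survives, its content doesn't
git init -q . && git clean -fdq             # wipes the remaining untracked files
ls -A                                       # only .git is left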

Step 4: Write Your Policy

Create safeclaw.yaml with explicit deny rules for every deletion vector, plus a default deny that catches anything you missed.

Step 5: Simulate, Then Enforce

# Test your policy without blocking
SAFECLAW_MODE=simulation npx @authensor/safeclaw

Once you've validated the policy, switch to enforcement mode:

SAFECLAW_MODE=enforce npx @authensor/safeclaw

Example Policy

version: "1.0"
default: deny

rules:
# Block all destructive shell commands
- action: shell_exec
command: "rm *"
decision: deny
reason: "Block all rm commands"

- action: shell_exec
command: "rmdir *"
decision: deny
reason: "Block directory removal"

- action: shell_exec
command: "unlink *"
decision: deny
reason: "Block unlink"

- action: shell_exec
command: "shutil.rmtree"
decision: deny
reason: "Block Python recursive delete"

- action: shell_exec
command: "git clean*"
decision: deny
reason: "Block git clean (removes untracked files)"

- action: shell_exec
command: "mv * /dev/null"
decision: deny
reason: "Block move-to-null deletion trick"

# Allow file writes only to safe directories
- action: file_write
path: "./output/**"
decision: allow
reason: "Agent can write to output directory"

- action: file_write
path: "./tmp/**"
decision: allow
reason: "Agent can write to temp directory"

# Allow file reads for project data
- action: file_read
path: "./src/**"
decision: allow
reason: "Agent can read source code"

- action: file_read
path: "./data/**"
decision: allow
reason: "Agent can read data files"

# Allow specific safe shell commands
- action: shell_exec
command: "npm test*"
decision: allow
reason: "Run test suite"

- action: shell_exec
command: "ls *"
decision: allow
reason: "List directory contents"

- action: shell_exec
command: "cat *"
decision: allow
reason: "View file contents"

What Happens When It Works

DENY — Agent attempts to delete a directory:

{
  "action": "shell_exec",
  "command": "rm -rf ./old-backups",
  "decision": "DENY",
  "rule": "Block all rm commands",
  "timestamp": "2026-02-13T09:15:01Z",
  "hash": "c4d5e6f7..."
}

DENY — Agent tries to wipe the config directory via a Python one-liner:

{
  "action": "shell_exec",
  "command": "python3 -c \"import shutil; shutil.rmtree('./config')\"",
  "decision": "DENY",
  "rule": "Block Python recursive delete",
  "timestamp": "2026-02-13T09:15:03Z",
  "hash": "g8h9i0j1..."
}

ALLOW — Agent writes a report to the permitted output directory:

{
  "action": "file_write",
  "path": "./output/analysis-report.md",
  "decision": "ALLOW",
  "rule": "Agent can write to output directory",
  "timestamp": "2026-02-13T09:15:05Z",
  "hash": "k2l3m4n5..."
}
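
Every decision is a small JSON record, so reviewing what was blocked is straightforward. Assuming you collect the records into a file (the name safeclaw-decisions.jsonl below is hypothetical; use wherever your setup writes them), a jq one-liner surfaces the denials:

# List each denied action with the rule that blocked it
# (safeclaw-decisions.jsonl is a hypothetical capture file)
jq -r 'select(.decision == "DENY") | "\(.timestamp)  \(.command // .path)  (\(.rule))"' safeclaw-decisions.jsonl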

Common Mistakes

  1. Only blocking rm and missing other deletion vectors. Agents are creative. They can delete files by moving them to discard paths with mv, overwriting them with empty content, running git clean, or calling os.remove() from a Python script. Your policy must address all of these, which is why deny-by-default is essential: anything you don't explicitly allow is blocked. For critical paths, you can also layer explicit deny rules on top, as sketched after this list.
  2. Using filesystem permissions instead of action gating. Unix file permissions protect against unauthorized users, not against processes running as your user. Your AI agent runs as you — it has every permission you have. SafeClaw operates at a layer above the filesystem, gating actions before they execute regardless of OS-level permissions.
  3. Assuming the agent will follow instructions not to delete files. Prompt-level instructions like "never delete files" provide zero enforcement. They are suggestions to a language model, not security controls. Prompt injection, hallucination, or ambiguous task interpretation can all cause an agent to ignore prompt instructions. Enforcement must happen at the action layer.
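
One way to address the overwrite vector from mistake 1 is to add explicit deny rules for critical paths alongside the default deny. The rules below are a sketch, assuming file_write deny rules take the same action/path/decision/reason fields as the allow rules in the example policy; adjust the paths to your project.

  # Defense in depth: critical paths stay write-protected even if an
  # allow rule elsewhere is later written too broadly
  - action: file_write
    path: "./config/**"
    decision: deny
    reason: "Never overwrite configuration files"

  - action: file_write
    path: "./.git/**"
    decision: deny
    reason: "Never touch version-control internals"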

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw