2025-10-02 · Authensor

AI Agent Firewalls Explained: How SafeClaw Works Like iptables for Agents

If you have ever configured a firewall, you already understand how SafeClaw works.

A network firewall sits between your system and the network. Every packet passes through it. The firewall evaluates each packet against a rule set and accepts, rejects, or drops it. No packet reaches your system without clearing the rules.

SafeClaw does the same thing for AI agent actions. It sits between your AI agent and your system. Every action passes through it. The engine evaluates each action against your policy. Allow, deny, or escalate. No action touches your system without clearing the rules.

The parallel is not metaphorical. The architecture is structurally identical.

The iptables Model

For those familiar with Linux networking, iptables is the canonical packet filtering framework. Here is how a basic iptables configuration works:

# Default policy: drop everything
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP

# Allow established connections (OUTPUT needs this too, or replies to accepted inbound traffic are dropped)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow SSH
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Allow HTTPS
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Allow outbound DNS
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT

# Allow outbound HTTPS
iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT

The structure is clear:

Default policy: deny everything
Explicit rules: allow specific traffic
Ordered evaluation: rules are checked top to bottom
First match wins: once a packet matches a rule with a terminating target such as ACCEPT or DROP, that action is applied and evaluation stops
Logging: matched and dropped packets can be logged for audit

This is a proven model. It has protected networks for decades.

The SafeClaw Model

SafeClaw applies the same structure to AI agent actions:

# Default policy: deny everything (built-in, not configurable)
# No action executes without an explicit ALLOW rule

# Allow file writes to project source
file_write to ~/projects/myapp/src/** → ALLOW

# Allow file writes to test directory
file_write to ~/projects/myapp/tests/** → ALLOW

# Deny file writes to environment files
file_write to ~/projects/myapp/.env → DENY

# Deny file writes to git internals
file_write to ~/projects/myapp/.git/** → DENY

# Allow specific shell commands
shell_exec matching "npm test" → ALLOW
shell_exec matching "npm run build" → ALLOW
shell_exec matching "npx tsc *" → ALLOW

# Deny dangerous shell patterns
shell_exec containing "sudo" → DENY
shell_exec containing "rm -rf" → DENY

# Allow outbound to known APIs
network to api.openai.com → ALLOW
network to api.anthropic.com → ALLOW

# Deny cloud metadata endpoint
network to 169.254.169.254 → DENY

The same structure:

Default policy: deny everything
Explicit rules: allow specific actions
Ordered evaluation: rules checked top to bottom
First match wins: first matching rule determines the outcome
Audit trail: every action and decision is logged with SHA-256 hash chaining
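
To make the evaluation model concrete, here is a minimal sketch of a default-deny, first-match-wins evaluator. It is not SafeClaw's actual engine or policy format: the `Rule` and `Action` types, the `evaluate` function, and the simplified glob handling (any run of `*` matches any characters) are all assumptions made for illustration.

```typescript
// Hypothetical types for illustration; SafeClaw's real policy format may differ.
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";
type ActionType = "file_write" | "shell_exec" | "network";

interface Action {
  type: ActionType;
  target: string; // a path, a command line, or a hostname
}

interface Rule {
  type: ActionType;
  pattern: string; // simplified glob: any run of "*" matches any characters
  decision: Decision;
}

// Turn the simplified glob into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*+/g, ".*") + "$");
}

// Walk the rules top to bottom; the first matching rule decides the outcome.
// If nothing matches, fall through to the built-in default: DENY.
function evaluate(
  rules: Rule[],
  action: Action
): { decision: Decision; rule: number | null } {
  for (let i = 0; i < rules.length; i++) {
    const rule = rules[i];
    if (rule.type === action.type && globToRegExp(rule.pattern).test(action.target)) {
      return { decision: rule.decision, rule: i + 1 }; // 1-based rule number
    }
  }
  return { decision: "DENY", rule: null };
}

const policy: Rule[] = [
  { type: "file_write", pattern: "~/projects/myapp/src/**", decision: "ALLOW" },
  { type: "file_write", pattern: "~/projects/myapp/.env", decision: "DENY" },
  { type: "shell_exec", pattern: "npm test", decision: "ALLOW" },
];

console.log(evaluate(policy, { type: "file_write", target: "~/projects/myapp/src/app.ts" }));
console.log(evaluate(policy, { type: "shell_exec", target: "curl http://example.com" }));
```

Under first-match semantics, rule order is significant. In this sketch, a broad ALLOW such as `npx tsc *` placed above a DENY for `*sudo*` would match `npx tsc x && sudo reboot` first, so deny rules generally belong ahead of the allows they are meant to override.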

The Parallel, Side by Side

| Concept | Network Firewall (iptables) | Agent Firewall (SafeClaw) |
|---|---|---|
| What it gates | Network packets | Agent actions |
| Default policy | DROP/DENY | DENY |
| Rule targets | IP, port, protocol | Action type, path, command, destination |
| Evaluation order | Top to bottom | Top to bottom |
| Match behavior | First match wins | First match wins |
| Actions | ACCEPT, DROP, REJECT, LOG | ALLOW, DENY, REQUIRE_APPROVAL |
| Logging | syslog, ULOG | SHA-256 hash chain |
| Evaluation location | Kernel (local) | Policy engine (local) |
| Latency | Microseconds | Sub-millisecond |

The mental model transfers directly. If you can write iptables rules, you can write SafeClaw policies.

Chains and Rule Categories

iptables organizes rules into chains: INPUT, OUTPUT, FORWARD. Each chain handles a different type of traffic.

SafeClaw organizes rules into action categories: file_write, shell_exec, network. Each category handles a different type of agent action.

file_write (the INPUT chain)

File write rules gate what reaches your filesystem, just as INPUT rules gate the packets addressed to your host.

# iptables: allow traffic to port 443
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# SafeClaw: allow writes to src directory
file_write to ~/projects/myapp/src/** → ALLOW

Both specify what is allowed to reach a protected resource.

shell_exec (the FORWARD chain)

Shell execution rules gate which commands pass through to your operating system, just as FORWARD rules gate the traffic routed through your system.

# iptables: allow forwarded traffic to internal server
iptables -A FORWARD -d 10.0.1.5 -p tcp --dport 80 -j ACCEPT

# SafeClaw: allow forwarded execution of npm commands
shell_exec matching "npm *" → ALLOW

Both control what passes through a gateway.

network (the OUTPUT chain)

Network rules gate what leaves your system, just as OUTPUT rules gate the traffic your system generates.

# iptables: allow outbound HTTPS
iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT

# SafeClaw: allow outbound to OpenAI
network to api.openai.com → ALLOW

Both control outbound communication.

REQUIRE_APPROVAL: The Human-in-the-Loop Rule

iptables does not have a "pause and ask a human" option. Packets move too fast for that. But AI agent actions are different. They happen at human-relevant timescales.

SafeClaw adds a third action type: REQUIRE_APPROVAL. The action is paused. You see what the agent wants to do. You approve or deny.

# Pause and ask for shell commands that install packages
shell_exec containing "npm install" → REQUIRE_APPROVAL

# Pause and ask for network requests to unknown destinations
network to * → REQUIRE_APPROVAL

This is like having a firewall that can hold a suspicious packet and show it to you before deciding. In network security, this is impractical. In agent security, it is exactly what you want.
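
The control flow behind a pause-and-ask rule can be sketched in a few lines. This is an illustration under stated assumptions, not SafeClaw's API: `gate`, `Approver`, and `PendingAction` are hypothetical names, and the approver is a synchronous callback here for brevity, whereas a real approval prompt would be asynchronous.

```typescript
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

interface PendingAction {
  type: string;
  target: string;
}

// Hypothetical approver: stands in for a human clicking approve or deny.
type Approver = (action: PendingAction) => boolean;

// Run the action only if the policy allows it outright,
// or if it requires approval and the approver says yes.
function gate(
  decision: Decision,
  action: PendingAction,
  approve: Approver,
  run: () => void
): boolean {
  if (decision === "DENY") return false;
  if (decision === "REQUIRE_APPROVAL" && !approve(action)) return false;
  run();
  return true;
}

const executed: string[] = [];
const install: PendingAction = { type: "shell_exec", target: "npm install left-pad" };

// The human declines, so the command never runs.
gate("REQUIRE_APPROVAL", install, () => false, () => executed.push(install.target));

// The human approves, so the command runs.
gate("REQUIRE_APPROVAL", install, () => true, () => executed.push(install.target));

console.log(executed);
```

The point of the design is that the side effect lives behind `run`, so nothing happens until the decision, and where required the human, has cleared it.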

Simulation Mode: The --dry-run for Firewalls

Anyone who has misconfigured iptables on a remote server knows the feeling. One wrong rule and you are locked out. Firewall changes on production systems are high-stakes.

SafeClaw's simulation mode solves this. Run the policy engine without enforcement. Every action is logged as "would allow" or "would deny." Nothing is actually blocked.

# Simulation output
[SIM] file_write ~/projects/myapp/src/app.ts → WOULD ALLOW (rule 1)
[SIM] file_write ~/projects/myapp/.env → WOULD DENY (rule 3)
[SIM] shell_exec "npm test" → WOULD ALLOW (rule 5)
[SIM] shell_exec "curl http://example.com" → WOULD DENY (default)
[SIM] network api.openai.com → WOULD ALLOW (rule 10)

Review the output. Adjust your rules. When the policy looks right, switch to enforcement. No agent gets locked out. No actions get unexpectedly blocked in production.

This is what iptables -L -v should have been: a full dry-run before the rules go live.
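
The difference between simulation and enforcement comes down to what happens after the policy decision is made. The sketch below assumes a verdict has already been computed; `applyVerdict`, `Mode`, and `Verdict` are hypothetical names, and the log format for enforce mode is invented for the example. Only the `[SIM]` line format is taken from the output shown above.

```typescript
type Decision = "ALLOW" | "DENY";
type Mode = "simulate" | "enforce";

interface Verdict {
  decision: Decision;
  rule: number | null; // null means the default policy applied
}

// Returns true if the action may proceed.
// In simulate mode every action proceeds; the verdict is only logged.
function applyVerdict(mode: Mode, label: string, verdict: Verdict, log: string[]): boolean {
  const source = verdict.rule === null ? "default" : `rule ${verdict.rule}`;
  if (mode === "simulate") {
    log.push(`[SIM] ${label} → WOULD ${verdict.decision} (${source})`);
    return true; // nothing is blocked in simulation
  }
  log.push(`${label} → ${verdict.decision} (${source})`); // hypothetical enforce-mode format
  return verdict.decision === "ALLOW";
}

const log: string[] = [];
const envWrite = "file_write ~/projects/myapp/.env";
const denied: Verdict = { decision: "DENY", rule: 3 };

const passedInSim = applyVerdict("simulate", envWrite, denied, log);
const passedInEnforce = applyVerdict("enforce", envWrite, denied, log);

console.log(log.join("\n"));
console.log(passedInSim, passedInEnforce);
```

Because both modes share the same decision path and differ only in the final step, a policy that looks right in simulation should behave the same way once enforcement is switched on.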

The Audit Trail: Better Than syslog

Network firewalls log to syslog. The logs are text files. They can be edited, truncated, or deleted without leaving a trace. They are useful but not tamper-evident.

SafeClaw logs to a SHA-256 hash chain. Each entry includes:

The action attempted
The rule that matched
The decision (allow, deny, require approval)
A timestamp
A SHA-256 hash of the previous entry

Alter any entry and the hash chain breaks, so tampering is detectable and the entire history is cryptographically verifiable. This is a stronger audit guarantee than plain-text syslog files provide.
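
A hash-chained log with the fields listed above can be sketched as follows. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration; SafeClaw's on-disk format is not described here and may differ. The sketch uses Node's built-in `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout based on the fields listed above.
interface AuditEntry {
  action: string;
  rule: number | null;
  decision: string;
  timestamp: string;
  prevHash: string; // SHA-256 of the previous entry
}

// Assumed starting value for the first entry's prevHash.
const GENESIS = "0".repeat(64);

// Hash a fixed-order serialization of the entry, including its prevHash.
function hashEntry(e: AuditEntry): string {
  const canonical = JSON.stringify([e.action, e.rule, e.decision, e.timestamp, e.prevHash]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Hash of the newest entry; this value anchors the whole chain.
function headHash(chain: AuditEntry[]): string {
  return chain.length === 0 ? GENESIS : hashEntry(chain[chain.length - 1]);
}

function append(
  chain: AuditEntry[],
  action: string,
  rule: number | null,
  decision: string,
  timestamp: string
): void {
  chain.push({ action, rule, decision, timestamp, prevHash: headHash(chain) });
}

// Recompute every link and compare the result with a trusted head hash.
function verify(chain: AuditEntry[], trustedHead: string): boolean {
  let expected = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== expected) return false;
    expected = hashEntry(entry);
  }
  return expected === trustedHead;
}

const chain: AuditEntry[] = [];
append(chain, "file_write ~/projects/myapp/.env", 3, "DENY", "2025-10-02T12:00:00Z");
append(chain, "shell_exec npm test", 5, "ALLOW", "2025-10-02T12:00:01Z");
const trusted = headHash(chain);

console.log(verify(chain, trusted)); // chain is intact

chain[0].decision = "ALLOW"; // rewrite history
console.log(verify(chain, trusted)); // the link from entry 2 no longer matches
```

A chain like this makes tampering evident rather than impossible: someone who can rewrite every entry can also recompute every hash, which is why this sketch checks against a head hash that is assumed to be kept somewhere the attacker cannot reach.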

Why Agents Need Firewalls

Network firewalls exist because networks are hostile. Untrusted traffic arrives constantly. Without a firewall, every service on your machine is exposed.

AI agents operate in a similarly hostile trust model. The agent is powerful. Its behavior is unpredictable. Its actions can be destructive. Without a gating layer, every resource on your machine is exposed to the agent.

Clawdbot leaked 1.5 million API keys in under a month. That is what happens when agents operate without a firewall.

Getting Started

npx @authensor/safeclaw

A browser dashboard opens, and a setup wizard helps you build your first policy. Think of it as writing your first iptables rules, but with a GUI and simulation mode instead of SSH and prayer.

446 tests. TypeScript strict mode. Zero dependencies. 100% open source. Sub-millisecond evaluation. Works with Claude, OpenAI, and LangChain.

Free tier. 7-day renewable keys. No credit card required.

Your network has a firewall. Your AI agents should too.

SafeClaw is built on Authensor. Try it at safeclaw.onrender.com.
