AI Agent Firewalls Explained: How SafeClaw Works Like iptables for Agents
If you have ever configured a firewall, you already understand how SafeClaw works.
A network firewall sits between your system and the network. Every packet passes through it. The firewall evaluates each packet against a rule set. Accept, reject, or drop. No packet reaches your system without clearing the rules.
SafeClaw does the same thing for AI agent actions. It sits between your AI agent and your system. Every action passes through it. The engine evaluates each action against your policy. Allow, deny, or escalate. No action touches your system without clearing the rules.
The parallel is not metaphorical. The architecture is structurally identical.
The iptables Model
For those familiar with Linux networking, iptables is the canonical packet filtering framework. Here is how a basic iptables configuration works:
# Default policy: drop everything
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
# Allow established connections (in both directions, so replies can leave under OUTPUT DROP)
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow SSH
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow HTTPS
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Allow outbound DNS
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
# Allow outbound HTTPS
iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT
The structure is clear:
- Default policy: deny everything
- Explicit rules: allow specific traffic
- Ordered evaluation: rules are checked top to bottom
- First match wins: once a packet matches a rule, that rule's action is applied
- Logging: matched and dropped packets can be logged for audit
The SafeClaw Model
SafeClaw applies the same structure to AI agent actions:
# Default policy: deny everything (built-in, not configurable)
# No action executes without an explicit ALLOW rule
# Allow file writes to project source
file_write to ~/projects/myapp/src/** → ALLOW
# Allow file writes to test directory
file_write to ~/projects/myapp/tests/** → ALLOW
# Deny file writes to environment files
file_write to ~/projects/myapp/.env → DENY
# Deny file writes to git internals
file_write to ~/projects/myapp/.git/** → DENY
# Allow specific shell commands
shell_exec matching "npm test" → ALLOW
shell_exec matching "npm run build" → ALLOW
shell_exec matching "npx tsc *" → ALLOW
# Deny dangerous shell patterns
shell_exec containing "sudo" → DENY
shell_exec containing "rm -rf" → DENY
# Allow outbound to known APIs
network to api.openai.com → ALLOW
network to api.anthropic.com → ALLOW
# Deny cloud metadata endpoint
network to 169.254.169.254 → DENY
The same structure:
- Default policy: deny everything
- Explicit rules: allow specific actions
- Ordered evaluation: rules checked top to bottom
- First match wins: first matching rule determines the outcome
- Audit trail: every action and decision is logged with SHA-256 hash chaining
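To make the evaluation order concrete, here is a minimal sketch of a default-deny, first-match-wins evaluator. The `Rule`, `Action`, and `Verdict` types and the `evaluate` function are hypothetical names chosen for illustration; they are not SafeClaw's actual API, and the matchers are plain predicates rather than SafeClaw's pattern syntax.

```typescript
// Hypothetical types for illustration only; not SafeClaw's real API.
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";
type Category = "file_write" | "shell_exec" | "network";

interface Action {
  category: Category;
  target: string; // a path, a command line, or a hostname
}

interface Rule {
  category: Category;
  matches: (target: string) => boolean;
  decision: Decision;
}

interface Verdict {
  decision: Decision;
  rule: number | "default"; // 1-based position of the matching rule
}

// Walk the rules top to bottom; the first match decides.
// If nothing matches, fall through to the built-in default: DENY.
function evaluate(rules: Rule[], action: Action): Verdict {
  let position = 0;
  for (const rule of rules) {
    position += 1;
    if (rule.category === action.category && rule.matches(action.target)) {
      return { decision: rule.decision, rule: position };
    }
  }
  return { decision: "DENY", rule: "default" };
}

const rules: Rule[] = [
  { category: "shell_exec", matches: (t) => t.includes("sudo"), decision: "DENY" },
  { category: "shell_exec", matches: (t) => t === "npm test", decision: "ALLOW" },
  { category: "network", matches: (t) => t === "api.openai.com", decision: "ALLOW" },
];
```

Because the first match wins, order matters just as it does in iptables: a narrow DENY only takes effect if it sits above any broader ALLOW that would also match the same action.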
The Parallel, Side by Side
| Concept | Network Firewall (iptables) | Agent Firewall (SafeClaw) |
|---|---|---|
| What it gates | Network packets | Agent actions |
| Default policy | DROP (set with -P) | DENY (built-in) |
| Match criteria | IP, port, protocol | Action type, path, command, destination |
| Evaluation order | Top to bottom | Top to bottom |
| Match behavior | First match wins | First match wins |
| Actions | ACCEPT, DROP, REJECT, LOG | ALLOW, DENY, REQUIRE_APPROVAL |
| Logging | syslog (LOG target), NFLOG | SHA-256 hash chain |
| Evaluation location | Kernel (local) | Policy engine (local) |
| Latency | Microseconds | Sub-millisecond |
The mental model transfers directly. If you can write iptables rules, you can write SafeClaw policies.
Chains and Rule Categories
iptables organizes rules into chains: INPUT, OUTPUT, FORWARD. Each chain handles a different type of traffic.
SafeClaw organizes rules into action categories: file_write, shell_exec, network. Each category handles a different type of agent action.
file_write (the INPUT chain)
File write rules gate what reaches your filesystem, just as INPUT rules gate packets addressed to the local host.
# iptables: allow traffic to port 443
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# SafeClaw: allow writes to src directory
file_write to ~/projects/myapp/src/** → ALLOW
Both specify what is allowed to reach a protected resource.
shell_exec (the FORWARD chain)
Shell execution rules gate what commands pass through to your operating system, just as FORWARD rules gate traffic routed through your system to other hosts.
# iptables: allow forwarded traffic to internal server
iptables -A FORWARD -d 10.0.1.5 -p tcp --dport 80 -j ACCEPT
# SafeClaw: allow forwarded execution of npm commands
shell_exec matching "npm *" → ALLOW
Both control what passes through a gateway.
network (the OUTPUT chain)
Network rules gate what leaves your system, just as OUTPUT rules gate the traffic your system generates.
# iptables: allow outbound HTTPS
iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT
# SafeClaw: allow outbound to OpenAI
network to api.openai.com → ALLOW
Both control outbound communication.
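One way to picture the three categories is as a tagged union, where each category carries the field its rules match on. This is only a sketch: the `AgentAction` type, its field names, and the `describe` helper are assumptions for illustration, not SafeClaw's real data model.

```typescript
// Hypothetical model: one variant per action category.
type AgentAction =
  | { type: "file_write"; path: string }
  | { type: "shell_exec"; command: string }
  | { type: "network"; host: string };

// Render an action as a one-line description.
// The switch is exhaustive, so adding a new category without handling it
// here becomes a compile-time error under strict TypeScript.
function describe(action: AgentAction): string {
  switch (action.type) {
    case "file_write":
      return `file_write ${action.path}`;
    case "shell_exec":
      return `shell_exec "${action.command}"`;
    case "network":
      return `network ${action.host}`;
  }
}
```

Much as a packet is handled by exactly one of INPUT, OUTPUT, or FORWARD at a given point, each action here belongs to exactly one category, so a rule only ever needs to inspect the field that category defines.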
REQUIRE_APPROVAL: The Human-in-the-Loop Rule
iptables does not have a "pause and ask a human" option. Packets move too fast for that. But AI agent actions are different. They happen at human-relevant timescales.
SafeClaw adds a third action type: REQUIRE_APPROVAL. The action is paused. You see what the agent wants to do. You approve or deny.
# Pause and ask for shell commands that install packages
shell_exec containing "npm install" → REQUIRE_APPROVAL
# Pause and ask for network requests to unknown destinations
network to * → REQUIRE_APPROVAL
This is like having a firewall that can hold a suspicious packet and show it to you before deciding. In network security, this is impractical. In agent security, it is exactly what you want.
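The hold-and-ask flow can be sketched as a small gate that either runs an action, blocks it, or parks it until a human answers. The `ApprovalGate` class, its `submit` and `resolve` methods, and the ticket scheme are hypothetical, invented here to illustrate the idea rather than to describe how SafeClaw implements it.

```typescript
type Decision = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";
type Status = "executed" | "blocked" | "pending";

interface AgentAction {
  category: string;
  target: string;
}

// Hypothetical gate: holds REQUIRE_APPROVAL actions until a human decides.
class ApprovalGate {
  private readonly pending = new Map<number, AgentAction>();
  private nextId = 1;
  private readonly run: (action: AgentAction) => void;

  constructor(run: (action: AgentAction) => void) {
    this.run = run;
  }

  // Apply a policy decision. Held actions get a ticket and do not run yet.
  submit(action: AgentAction, decision: Decision): { status: Status; ticket?: number } {
    if (decision === "ALLOW") {
      this.run(action);
      return { status: "executed" };
    }
    if (decision === "DENY") {
      return { status: "blocked" };
    }
    const ticket = this.nextId++;
    this.pending.set(ticket, action);
    return { status: "pending", ticket };
  }

  // A human approves or denies a held action. Unknown tickets are rejected.
  resolve(ticket: number, approved: boolean): Status {
    const action = this.pending.get(ticket);
    if (action === undefined) {
      throw new Error(`unknown ticket ${ticket}`);
    }
    this.pending.delete(ticket);
    if (!approved) {
      return "blocked";
    }
    this.run(action);
    return "executed";
  }
}
```

The key property is that a held action has no side effects until `resolve` is called with an approval, and each ticket can be resolved only once.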
Simulation Mode: The --dry-run for Firewalls
Anyone who has misconfigured iptables on a remote server knows the feeling. One wrong rule and you are locked out. Firewall changes on production systems are high-stakes.
SafeClaw's simulation mode solves this. Run the policy engine without enforcement. Every action is logged as "would allow" or "would deny." Nothing is actually blocked.
# Simulation output
[SIM] file_write ~/projects/myapp/src/app.ts → WOULD ALLOW (rule 1)
[SIM] file_write ~/projects/myapp/.env → WOULD DENY (rule 3)
[SIM] shell_exec "npm test" → WOULD ALLOW (rule 5)
[SIM] shell_exec "curl http://example.com" → WOULD DENY (default)
[SIM] network api.openai.com → WOULD ALLOW (rule 10)
Review the output. Adjust your rules. When the policy looks right, switch to enforcement. No agent gets locked out. No actions get unexpectedly blocked in production.
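The difference between simulation and enforcement comes down to one branch. Here is a minimal sketch, assuming a hypothetical `applyDecision` helper and `Mode` type; the `[SIM]` line mirrors the output format shown above, while the enforce-mode log format is invented for illustration.

```typescript
type Decision = "ALLOW" | "DENY";
type Mode = "simulate" | "enforce";

interface Outcome {
  permitted: boolean; // whether the action is allowed to proceed
  logLine: string;
}

// In simulate mode nothing is blocked; the decision is only recorded.
// In enforce mode the decision is applied to the action.
function applyDecision(mode: Mode, description: string, decision: Decision, rule: string): Outcome {
  if (mode === "simulate") {
    return {
      permitted: true,
      logLine: `[SIM] ${description} → WOULD ${decision} (${rule})`,
    };
  }
  return {
    permitted: decision === "ALLOW",
    logLine: `${description} → ${decision} (${rule})`,
  };
}
```

The policy evaluation is identical in both modes; only the final step differs, which is why a policy that looks right in simulation should behave the same way once enforced.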
iptables offers nothing comparable. iptables -L -v lists rules and their packet counters, but only once the rules are already live.
The Audit Trail: Better Than syslog
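Network firewalls log to syslog. The logs are text files. They can be edited, truncated, or deleted. They are useful but not tamper-evident.
SafeClaw logs to a SHA-256 hash chain. Because every entry commits to the one before it, editing or removing an entry invalidates the hashes that follow, so tampering is detectable. Each entry includes: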
- The action attempted
- The rule that matched
- The decision (allow, deny, require approval)
- A timestamp
- A SHA-256 hash of the previous entry
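A hash-chained log of this shape can be sketched in a few lines with Node's built-in `crypto` module. The `LogEntry` field names, the all-zero genesis value, and the JSON serialization below are assumptions made for illustration; SafeClaw's actual entry format may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout based on the fields listed above.
interface LogEntry {
  action: string;
  rule: string;
  decision: string;
  timestamp: string;
  prevHash: string; // SHA-256 of the previous entry; GENESIS for the first
}

// Assumed placeholder for "no previous entry".
const GENESIS = "0".repeat(64);

// Hash an entry over a fixed field order so the result is deterministic.
function hashEntry(e: LogEntry): string {
  const payload = JSON.stringify([e.action, e.rule, e.decision, e.timestamp, e.prevHash]);
  return createHash("sha256").update(payload).digest("hex");
}

// Return a new log with one more entry linked to the current tail.
function append(log: LogEntry[], fields: Omit<LogEntry, "prevHash">): LogEntry[] {
  const last = log[log.length - 1];
  const prevHash = last === undefined ? GENESIS : hashEntry(last);
  return [...log, { ...fields, prevHash }];
}

// Recompute every link. Returns the index of the first entry whose
// prevHash does not match its predecessor, or -1 if the chain is intact.
function firstBrokenLink(log: LogEntry[]): number {
  let expected = GENESIS;
  let index = 0;
  for (const entry of log) {
    if (entry.prevHash !== expected) {
      return index;
    }
    expected = hashEntry(entry);
    index += 1;
  }
  return -1;
}
```

Changing any field of an earlier entry changes its hash, so the next entry's stored `prevHash` no longer matches. One caveat of any chain like this: nothing inside the log protects the final entry, so the latest hash has to be recorded somewhere outside the log for a change to the tail to be caught.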
Why Agents Need Firewalls
Network firewalls exist because networks are hostile. Untrusted traffic arrives constantly. Without a firewall, every service on your machine is exposed.
AI agents operate in a similarly hostile trust model. The agent is powerful. Its behavior is unpredictable. Its actions can be destructive. Without a gating layer, every resource on your machine is exposed to the agent.
Clawdbot leaked 1.5 million API keys in under a month. That is what happens when agents operate without a firewall.
Getting Started
npx @authensor/safeclaw
A browser dashboard opens, and a setup wizard helps you build your first policy. Think of it as writing your first iptables rules, but with a GUI and simulation mode instead of SSH and prayer.
446 tests. TypeScript strict mode. Zero dependencies. 100% open source. Sub-millisecond evaluation. Works with Claude, OpenAI, and LangChain.
Free tier. 7-day renewable keys. No credit card required.
Your network has a firewall. Your AI agents should too.
SafeClaw is built on Authensor. Try it at safeclaw.onrender.com.