2025-12-26 · Authensor

How to Protect API Keys from AI Agents: A Step-by-Step Guide

Environment variables won't save you. .env files get read. .gitignore is irrelevant. Secret managers add latency but not security.

If you're running an AI coding agent — Claude, OpenAI-based tools, LangChain agents, or anything similar — your API keys are exposed unless you control what the agent can actually do. Clawdbot leaked over 1.5 million API keys in under a month. The mechanism was simple: the agent read files containing keys, then included them in output and network requests. No exploit required. Just an agent doing what agents do, without guardrails.

This guide walks through the real steps to protect your API keys from AI agents. Not the advice from 2023 that assumes humans are the only ones reading your files. The actual steps that work when the reader is an autonomous agent with shell access.

Step 0: Understand Why Traditional Advice Fails

Before we fix the problem, you need to understand why the standard playbook doesn't work here.

"Use environment variables." The agent runs in your shell environment. It can call process.env.OPENAI_API_KEY in code or printenv in the shell. Environment variables are readable by any process in the session, including the agent.

"Add .env to .gitignore." The agent reads the file system directly. It doesn't clone your repo to read files — it reads them from disk. .gitignore prevents git commits. It doesn't prevent file reads.

"Use a secret manager." If your code calls secretsManager.getSecret('openai-key') and the agent can read your code or execute it, the agent gets the secret. You've added a layer of indirection, not a layer of security.

"Don't hardcode secrets." Correct, but insufficient. The agent reads configuration files, environment variables, and credential stores. It doesn't need the key to be hardcoded in source to find it.

The common thread: all traditional advice assumes a passive threat model where secrets leak through careless humans committing files to git. AI agents are active readers with shell access. Different threat model, different solution required.

Step 1: Inventory Your Secrets

Before you can protect anything, know what you're protecting. Audit your project for:

.env and .env.* files
config.yaml, config.json, settings.py, and similar configuration files
docker-compose.yml with embedded credentials
.npmrc, .pypirc, and package manager auth tokens
SSH keys in ~/.ssh/
Cloud credential files (~/.aws/credentials, ~/.config/gcloud/)
Terraform state files (often contain plaintext secrets)
Kubernetes secrets manifests
CI/CD configuration with inline secrets

Make a list. You'll reference it when writing policy rules.

Step 2: Install SafeClaw

npx @authensor/safeclaw

SafeClaw provides action-level gating for AI agents. It intercepts agent actions — file reads, file writes, shell execution, network requests — and evaluates them against policies you define.

The install takes seconds. It ships with a browser dashboard and setup wizard, so no CLI expertise is needed. Free tier available with renewable 7-day keys, no credit card required.

SafeClaw works with Claude and OpenAI out of the box, plus LangChain. The client is 100% open source with zero third-party dependencies.

Step 3: Enable Simulation Mode

Before enforcing any policies, turn on simulation mode. This logs every action the agent attempts and what the policy decision would be, without actually blocking anything.

This is critical. If you go straight to enforcement, you'll break the agent's workflow and spend hours debugging why it can't do legitimate tasks. Simulation mode lets you see the full picture first.

Run your agent through a typical workflow while simulation mode is active. Review the logs. You'll see:

Every file the agent tried to read (including your .env)
Every shell command it tried to execute (including any printenv calls)
Every network request it tried to make (including the destinations)

This data tells you exactly what rules you need to write.

Step 4: Write Deny Rules for Sensitive Files

Based on your secrets inventory from Step 1 and the simulation data from Step 3, write rules that block access to credential files.

SafeClaw rules match on action type, path patterns, command strings, network destinations, and agent identity. Evaluation is first-match-wins, top-to-bottom.

Start with file read restrictions:

# Block all .env files
DENY file_read path=*/.env

Block credential files
DENY file_read path=*/.pem
DENY file_read path=*/.key
DENY file_read path=*/credentials
DENY file_read path=*/secrets

Block cloud config
DENY file_read path=~/.aws/**
DENY file_read path=~/.config/gcloud/**
DENY file_read path=~/.ssh/**

Block package manager auth
DENY file_read path=**/.npmrc
DENY file_read path=**/.pypirc

Then allow what the agent needs:

# Allow reading source code
ALLOW file_read path=src/**
ALLOW file_read path=lib/**
ALLOW file_read path=test/**
ALLOW file_read path=package.json
ALLOW file_read path=tsconfig.json

Step 5: Restrict Shell Execution

Shell access is the most dangerous capability an agent has. A single shell command can read any file, exfiltrate any data, or modify any system configuration.

Define an explicit allowlist:

# Allow development commands
ALLOW shell_exec command="npm test*"
ALLOW shell_exec command="npm run build*"
ALLOW shell_exec command="npm run lint*"
ALLOW shell_exec command="tsc*"
ALLOW shell_exec command="git status"
ALLOW shell_exec command="git diff*"

Block everything else (deny-by-default handles this,
but explicit deny makes the intent clear)
DENY shell_exec command="*"

Notice what's missing from the allow list: curl, wget, printenv, env, cat (which could read credential files), ssh, and every other command that could be used for reconnaissance or exfiltration.

Step 6: Control Network Destinations

Even if the agent reads a key through some path you didn't anticipate, network controls prevent exfiltration.

# Allow specific, known-good destinations
ALLOW network destination="api.github.com"
ALLOW network destination="registry.npmjs.org"
ALLOW network destination="cdn.jsdelivr.net"

Block everything else
DENY network destination="*"

This is defense in depth. Even a compromised or misbehaving agent can't send data to an unauthorized destination.

Step 7: Verify with Simulation Mode

Keep simulation mode on and run your agent through another full workflow. Check the logs:

Do legitimate actions pass? If the agent can't read package.json or run npm test, your rules are too strict. Adjust.
Do sensitive reads get blocked? Your .env file reads should show as DENY decisions. Your credential file accesses should be blocked.
Are there unexpected network destinations? If the agent tries to call a legitimate API you forgot to allowlist, add it.

Iterate until the simulation logs show clean results: all legitimate work passes, all sensitive access is blocked.

Step 8: Switch to Enforcement Mode

Once your simulation results are clean, enable enforcement. Now the policies are active. The agent's actions are gated in real time.

Policy evaluation happens locally, sub-millisecond, with no network round trips. Your agent's performance isn't impacted. SafeClaw is backed by 446 automated tests running in TypeScript strict mode.

Step 9: Monitor the Audit Trail

SafeClaw maintains a tamper-proof audit trail using a SHA-256 hash chain. Every action attempt, every policy decision, every timestamp — all recorded and verifiable.

Review this trail regularly. Look for:

Repeated DENY decisions on the same resource (might indicate the agent is trying to work around restrictions)
New file paths or network destinations you haven't seen before
Changes in the agent's behavior patterns over time

The control plane only sees action metadata, never your keys or data. Your secrets stay local.

Step 10: Maintain Your Policies

As your project evolves, your policies need to evolve with it. New dependencies might require new network destinations. New tools might require new shell commands. New directories might contain sensitive data.

Make policy review part of your development workflow. When you add a new API integration, add the network destination to your allowlist. When you add a new credential file, add it to your deny list.

What This Actually Achieves

After completing these steps, your AI agent can:

Read source code in permitted directories
Run approved development commands
Access approved network destinations
Write files to permitted paths

Your AI agent cannot:

Read .env files or credential stores
Run arbitrary shell commands
Make network requests to unauthorized destinations
Access files outside the permitted scope

This is how permissions work in every other domain of computing. File system permissions. Database access control. Cloud IAM. Network firewalls. AI agents should be no different.

SafeClaw — built on the Authensor authorization framework — makes action-level gating the default, not the exception.

Get started now. Your API keys are not going to protect themselves.

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw