How to Protect API Keys from AI Agents: A Step-by-Step Guide
Environment variables won't save you. .env files get read. .gitignore is irrelevant. Secret managers add latency but not security.
If you're running an AI coding agent — Claude, OpenAI-based tools, LangChain agents, or anything similar — your API keys are exposed unless you control what the agent can actually do. Clawdbot leaked over 1.5 million API keys in under a month. The mechanism was simple: the agent read files containing keys, then included them in output and network requests. No exploit required. Just an agent doing what agents do, without guardrails.
This guide walks through the real steps to protect your API keys from AI agents. Not the advice from 2023 that assumes humans are the only ones reading your files. The actual steps that work when the reader is an autonomous agent with shell access.
Step 0: Understand Why Traditional Advice Fails
Before we fix the problem, you need to understand why the standard playbook doesn't work here.
"Use environment variables." The agent runs in your shell environment. It can call process.env.OPENAI_API_KEY in code or printenv in the shell. Environment variables are readable by any process in the session, including the agent.
"Add .env to .gitignore." The agent reads the file system directly. It doesn't clone your repo to read files — it reads them from disk. .gitignore prevents git commits. It doesn't prevent file reads.
"Use a secret manager." If your code calls secretsManager.getSecret('openai-key') and the agent can read your code or execute it, the agent gets the secret. You've added a layer of indirection, not a layer of security.
"Don't hardcode secrets." Correct, but insufficient. The agent reads configuration files, environment variables, and credential stores. It doesn't need the key to be hardcoded in source to find it.
The common thread: all traditional advice assumes a passive threat model where secrets leak through careless humans committing files to git. AI agents are active readers with shell access. Different threat model, different solution required.
Step 1: Inventory Your Secrets
Before you can protect anything, know what you're protecting. Audit your project for:
.envand.env.*filesconfig.yaml,config.json,settings.py, and similar configuration filesdocker-compose.ymlwith embedded credentials.npmrc,.pypirc, and package manager auth tokens- SSH keys in
~/.ssh/ - Cloud credential files (
~/.aws/credentials,~/.config/gcloud/) - Terraform state files (often contain plaintext secrets)
- Kubernetes secrets manifests
- CI/CD configuration with inline secrets
Step 2: Install SafeClaw
npx @authensor/safeclaw
SafeClaw provides action-level gating for AI agents. It intercepts agent actions — file reads, file writes, shell execution, network requests — and evaluates them against policies you define.
The install takes seconds. It ships with a browser dashboard and setup wizard, so no CLI expertise is needed. Free tier available with renewable 7-day keys, no credit card required.
SafeClaw works with Claude and OpenAI out of the box, plus LangChain. The client is 100% open source with zero third-party dependencies.
Step 3: Enable Simulation Mode
Before enforcing any policies, turn on simulation mode. This logs every action the agent attempts and what the policy decision would be, without actually blocking anything.
This is critical. If you go straight to enforcement, you'll break the agent's workflow and spend hours debugging why it can't do legitimate tasks. Simulation mode lets you see the full picture first.
Run your agent through a typical workflow while simulation mode is active. Review the logs. You'll see:
- Every file the agent tried to read (including your
.env) - Every shell command it tried to execute (including any
printenvcalls) - Every network request it tried to make (including the destinations)
Step 4: Write Deny Rules for Sensitive Files
Based on your secrets inventory from Step 1 and the simulation data from Step 3, write rules that block access to credential files.
SafeClaw rules match on action type, path patterns, command strings, network destinations, and agent identity. Evaluation is first-match-wins, top-to-bottom.
Start with file read restrictions:
# Block all .env files
DENY file_read path=*/.env
Block credential files
DENY file_read path=*/.pem
DENY file_read path=*/.key
DENY file_read path=*/credentials
DENY file_read path=*/secrets
Block cloud config
DENY file_read path=~/.aws/**
DENY file_read path=~/.config/gcloud/**
DENY file_read path=~/.ssh/**
Block package manager auth
DENY file_read path=**/.npmrc
DENY file_read path=**/.pypirc
Then allow what the agent needs:
# Allow reading source code
ALLOW file_read path=src/**
ALLOW file_read path=lib/**
ALLOW file_read path=test/**
ALLOW file_read path=package.json
ALLOW file_read path=tsconfig.json
Step 5: Restrict Shell Execution
Shell access is the most dangerous capability an agent has. A single shell command can read any file, exfiltrate any data, or modify any system configuration.
Define an explicit allowlist:
# Allow development commands
ALLOW shell_exec command="npm test*"
ALLOW shell_exec command="npm run build*"
ALLOW shell_exec command="npm run lint*"
ALLOW shell_exec command="tsc*"
ALLOW shell_exec command="git status"
ALLOW shell_exec command="git diff*"
Block everything else (deny-by-default handles this,
but explicit deny makes the intent clear)
DENY shell_exec command="*"
Notice what's missing from the allow list: curl, wget, printenv, env, cat (which could read credential files), ssh, and every other command that could be used for reconnaissance or exfiltration.
Step 6: Control Network Destinations
Even if the agent reads a key through some path you didn't anticipate, network controls prevent exfiltration.
# Allow specific, known-good destinations
ALLOW network destination="api.github.com"
ALLOW network destination="registry.npmjs.org"
ALLOW network destination="cdn.jsdelivr.net"
Block everything else
DENY network destination="*"
This is defense in depth. Even a compromised or misbehaving agent can't send data to an unauthorized destination.
Step 7: Verify with Simulation Mode
Keep simulation mode on and run your agent through another full workflow. Check the logs:
- Do legitimate actions pass? If the agent can't read
package.jsonor runnpm test, your rules are too strict. Adjust. - Do sensitive reads get blocked? Your
.envfile reads should show as DENY decisions. Your credential file accesses should be blocked. - Are there unexpected network destinations? If the agent tries to call a legitimate API you forgot to allowlist, add it.
Step 8: Switch to Enforcement Mode
Once your simulation results are clean, enable enforcement. Now the policies are active. The agent's actions are gated in real time.
Policy evaluation happens locally, sub-millisecond, with no network round trips. Your agent's performance isn't impacted. SafeClaw is backed by 446 automated tests running in TypeScript strict mode.
Step 9: Monitor the Audit Trail
SafeClaw maintains a tamper-proof audit trail using a SHA-256 hash chain. Every action attempt, every policy decision, every timestamp — all recorded and verifiable.
Review this trail regularly. Look for:
- Repeated DENY decisions on the same resource (might indicate the agent is trying to work around restrictions)
- New file paths or network destinations you haven't seen before
- Changes in the agent's behavior patterns over time
Step 10: Maintain Your Policies
As your project evolves, your policies need to evolve with it. New dependencies might require new network destinations. New tools might require new shell commands. New directories might contain sensitive data.
Make policy review part of your development workflow. When you add a new API integration, add the network destination to your allowlist. When you add a new credential file, add it to your deny list.
What This Actually Achieves
After completing these steps, your AI agent can:
- Read source code in permitted directories
- Run approved development commands
- Access approved network destinations
- Write files to permitted paths
- Read
.envfiles or credential stores - Run arbitrary shell commands
- Make network requests to unauthorized destinations
- Access files outside the permitted scope
SafeClaw — built on the Authensor authorization framework — makes action-level gating the default, not the exception.
Get started now. Your API keys are not going to protect themselves.
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw