AI Agent Incident Response: What to Do When Your Agent Goes Wrong
If your AI agent just deleted files, leaked credentials, made unauthorized API calls, or caused other damage, follow these five steps: stop the agent immediately, assess the damage, recover what you can, investigate what happened, and install prevention controls so it cannot happen again. This guide provides the specific commands and procedures for each step.
Step 1: STOP the Agent Immediately
Kill the agent process. Do not ask it to stop. Do not try to fix the problem with the agent still running — it may cause further damage while you investigate.
Kill by process name:
pkill -f "node.*agent"
pkill -f "python.*agent"
Kill by port (if the agent runs a server; replace 3000 with its port):
lsof -ti:3000 | xargs kill -9
Kill all Node processes (a last resort if you cannot identify the agent process; this also stops unrelated Node apps):
killall node
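Before moving on, confirm nothing survived. A quick check, assuming the agent's command line contains the word "agent" (adjust the pattern to match yours):
# List any surviving processes whose command line matches the pattern
pgrep -fl agent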
Revoke API keys the agent had access to. If the agent could read .env files or credential stores, assume those credentials are compromised. Rotate them now, before continuing to the next step. Go to each service's dashboard (OpenAI, AWS, Stripe, database providers) and regenerate keys.
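Some providers also support rotation from the command line. As a sketch for AWS, assuming an IAM user named agent-user and a placeholder key ID:
# Deactivate the compromised key first, then issue a replacement
aws iam update-access-key --access-key-id AKIAEXAMPLEKEY --status Inactive --user-name agent-user
aws iam create-access-key --user-name agent-user
# Delete the old key once nothing legitimate depends on it
aws iam delete-access-key --access-key-id AKIAEXAMPLEKEY --user-name agent-user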
Step 2: Assess the Damage
Determine what the agent did. Check these sources in order:
Check git status and recent changes:
git status
git log --oneline -20
git diff HEAD~5
If the agent made commits, git log shows them. If it modified tracked files without committing, git diff reveals the unstaged changes (add --staged to see staged ones).
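To inspect a suspicious commit in detail (the hash here is a placeholder):
git show --stat abc1234   # files touched and change sizes
git show abc1234          # full diff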
Check recently modified files:
find . -type f -mmin -30 -not -path './.git/*'
This lists every file modified in the last 30 minutes. Narrow the time window based on when you noticed the problem.
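If you need a wider window sorted newest-first, one variant (works with both GNU and BSD find):
find . -type f -mmin -120 -not -path './.git/*' -exec ls -lt {} +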
Check for deleted files:
git ls-files --deleted
If files were deleted from a git repository, this lists them.
Check SafeClaw audit trail (if installed):
If SafeClaw was running, visit safeclaw.onrender.com and review the audit log. Every action the agent attempted — allowed, denied, or escalated — is recorded with timestamps, action types, paths, and policy decisions. The SHA-256 hash chain ensures these records have not been tampered with.
Check shell history:
history | tail -50
If the agent executed commands through your shell, they may appear in history.
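Note that shells usually write history to disk only when a session exits, so a still-open session may not show the agent's commands yet. Reading the history files directly covers other sessions (paths assume bash or zsh defaults):
tail -50 ~/.bash_history ~/.zsh_history 2>/dev/null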
Check network connections (if data exfiltration is suspected):
netstat -an | grep ESTABLISHED
lsof -i -P
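lsof prints the owning PID next to each connection; look that process up before drawing conclusions (the PID here is a placeholder):
ps -p 52844 -o pid,ppid,lstart,command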
Step 3: Recover
Restore deleted or modified files from git (the two commands below are equivalent; git restore is the modern form):
git checkout -- path/to/file
git restore path/to/file
Restore to a specific commit:
git checkout abc1234 -- path/to/file
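If you are unsure which commit predates the incident, the file's history shows the candidates:
git log --oneline -- path/to/file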
Restore from Time Machine (macOS):
Navigate to the directory in Finder, enter Time Machine, and restore the files from before the incident.
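Time Machine also has a command-line interface if Finder is inconvenient; a sketch with placeholder paths:
tmutil listbackups
sudo tmutil restore -v "/Volumes/Backup/2026-01-01-120000/Macintosh HD/Users/you/project/file" ./file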
Restore from backups:
If you use automated backups (rsync, Backblaze, AWS S3), restore files from the most recent pre-incident snapshot.
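For rsync-style backups, restoring is a copy in the other direction. A sketch with hypothetical paths; dry-run first, then drop the -n:
rsync -avn /backups/project/2026-01-01/ ./project/
rsync -av /backups/project/2026-01-01/ ./project/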
If credentials were exposed:
- Rotate every key the agent could have accessed
- Check service logs for unauthorized usage of those keys (see the CloudTrail sketch after this list)
- If API keys were used by an attacker, contact the service provider to report unauthorized access
- Review billing dashboards for unexpected charges
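For AWS specifically, CloudTrail can list every API call made with a given access key. A sketch; the key ID and start time are placeholders:
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIAEXAMPLEKEY \
  --start-time 2026-01-01T00:00:00Z \
  --max-results 50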
The Clawdbot incident exposed 1.5 million API keys because credential files were readable and network access was unrestricted. Many of those keys were used for unauthorized API calls before they were rotated. Speed matters — rotate immediately.
Step 4: Investigate
Once the immediate damage is contained, determine the root cause.
What action caused the damage? Was it a file deletion, an unauthorized write, a destructive shell command, or a network exfiltration? Knowing the action type tells you which control was missing.
What triggered the action? Did the agent misinterpret an instruction? Was it prompt-injected by malicious content in a file it read? Did it hallucinate a step in its plan? Review the agent's conversation log or execution trace.
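If the agent keeps a transcript or execution log, searching it for the damaging action often pinpoints the trigger. A sketch, assuming a log at ./agent.log (the path and patterns are examples):
grep -nE "rm -rf|curl |wget |\.env|\.ssh" agent.log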
What permissions did the agent have? Could the agent access credential files? Did it have unrestricted shell access? Could it make arbitrary network requests? Map the agent's actual capabilities against what it needed.
Was there any safety layer in place? If no action-level gating was installed, the agent had the same permissions as the user account it ran under — full access to everything.
Step 5: Prevent Recurrence
Install action-level gating so the agent cannot repeat the incident.
Install SafeClaw:
npx @authensor/safeclaw
Get your free API key at safeclaw.onrender.com (7-day renewable, no credit card).
Create a deny-by-default policy that explicitly allows only the actions your agent needs:
version: "1.0"
default: deny
rules:
  # Allow reading source code only
  - action: file_read
    path: "./src/**"
    decision: allow
    reason: "Agent can read source files"
  # Allow writing to output directory only
  - action: file_write
    path: "./output/**"
    decision: allow
    reason: "Agent can write to output only"
  # Block credential files explicitly
  - action: file_read
    path: "**/.env"
    decision: deny
    reason: "Block credential file access"
  - action: file_read
    path: "**/.ssh/**"
    decision: deny
    reason: "Block SSH key access"
  # Allow only specific safe commands
  - action: shell_exec
    command: "npm test"
    decision: allow
    reason: "Agent can run tests"
  - action: shell_exec
    command: "npm run build"
    decision: allow
    reason: "Agent can run builds"
  # Require human approval for deployments
  - action: shell_exec
    command: "git push*"
    decision: require_approval
    reason: "Deployments need human review"
  # Block all network except known APIs
  - action: network
    domain: "api.openai.com"
    decision: allow
    reason: "Agent can call OpenAI"
  - action: network
    domain: "*"
    decision: deny
    reason: "Block all other outbound traffic"
Run simulation mode first to verify the policy handles your agent's workflows:
SAFECLAW_MODE=simulation npx @authensor/safeclaw
Then enforce:
SAFECLAW_MODE=enforce npx @authensor/safeclaw
SafeClaw evaluates every action in sub-millisecond time. The policy engine has 446 tests, runs in TypeScript strict mode, and has zero third-party dependencies. Every action decision is recorded in a tamper-proof audit trail (SHA-256 hash chain), giving you the investigation data you lacked during this incident.
After the Incident
Document what happened, what the impact was, and what controls you put in place. If the incident affected users or customers, follow your organization's disclosure procedures. If credentials were leaked, monitor for unauthorized usage for at least 30 days after rotation.
The goal is not zero incidents — it is ensuring that the same class of incident cannot happen twice. Deny-by-default gating ensures that any action not explicitly permitted is blocked, covering both the specific failure mode you experienced and the ones you have not yet imagined.
Cross-References
- API Key Exfiltration Threat
- How to Prevent AI Agents from Deleting Files
- How to Prevent AI Agents from Reading .env Files
- Tamper-Proof Audit Trail Specification
- How to Make Your AI Agent Safe
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw