API Rate Limiting for AI Agents: Preventing Runaway Costs
AI agents in loops can burn through API rate limits and budgets in minutes — a code generation agent retrying a failed build, a research agent fetching the same URL repeatedly, or a data agent making thousands of database queries. SafeClaw by Authensor enforces rate limits at the action level: every API call, shell command, and network request is counted against configurable sliding window limits, and excess calls are denied before execution. No calls leak through.
Quick Start
npx @authensor/safeclaw
Sliding Window Rate Limits
SafeClaw uses a sliding window algorithm to avoid the burst problem of fixed windows. Configure limits per action type:
version: "1.0"
description: "Rate-limited agent policy"
rules:
  # LLM API rate limits
  - action: llm.call
    model: "claude-opus-4"
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow
    reason: "Opus calls: max 20/min"
  - action: llm.call
    model: "gpt-4o"
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
    reason: "GPT-4o calls: max 30/min"
  # Network request rate limits
  - action: network.request
    domain: "api.github.com"
    rateLimit:
      maxRequests: 60
      window: "1 hour"
    effect: allow
    reason: "GitHub API: respect their rate limit"
  - action: network.request
    domain: "*.internal.company.com"
    rateLimit:
      maxRequests: 100
      window: "1 minute"
    effect: allow
    reason: "Internal API: 100 req/min"
  # Shell execution rate limits (catches loops)
  - action: shell.execute
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
    reason: "Shell commands: max 30/min"
  - action: "*"
    effect: deny
    reason: "Default deny"
Why Fixed Windows Fail
Fixed window rate limiting (e.g., "100 requests per minute, with the counter resetting on the minute") allows bursts at window boundaries:
Window 1 (12:00:00 - 12:00:59): the agent makes 0 requests for 59 seconds, then 100 requests at 12:00:59
Window 2 (12:01:00 - 12:01:59): the agent makes 100 requests at 12:01:00
Result: 200 requests in 2 seconds, straddling the window boundary
Sliding windows prevent this by counting requests in a rolling time period. At any given moment, the count includes all requests from the past N seconds.
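SafeClaw's internal implementation isn't shown here, but the core technique can be sketched in a few lines of Python. This is a minimal illustration of a sliding window counter, not SafeClaw's actual code; the class and method names are hypothetical:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Counts events in a rolling window and denies once the cap is hit."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()  # timestamps of allowed calls, oldest first

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False  # denied before execution; the call never runs
        self.timestamps.append(now)
        return True
```

Because the window rolls with the clock instead of resetting on a boundary, the 200-requests-in-2-seconds burst above is impossible: at any instant, at most `max_requests` calls from the past `window_seconds` have been allowed.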
Per-Action Rate Limits
Different action types carry different cost and risk profiles. Set appropriate limits for each:
rules:
  # File reads: high limit (cheap, low risk)
  - action: file.read
    rateLimit:
      maxRequests: 200
      window: "1 minute"
    effect: allow
  # File writes: moderate limit (medium risk)
  - action: file.write
    rateLimit:
      maxRequests: 50
      window: "1 minute"
    effect: allow
  # LLM calls: low limit (expensive)
  - action: llm.call
    rateLimit:
      maxRequests: 10
      window: "1 minute"
    effect: allow
  # Network requests: moderate limit
  - action: network.request
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
  # Shell execution: low limit (highest risk)
  - action: shell.execute
    rateLimit:
      maxRequests: 10
      window: "1 minute"
    effect: allow
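Under the hood, per-action limits amount to keeping one independent rolling window per action type. The sketch below illustrates the idea in Python with a hypothetical registry; the limit values mirror the policy above, but the data structures and names are illustrative, not SafeClaw's API:

```python
from collections import deque

# Per-action limits: (max requests, window in seconds).
ACTION_LIMITS = {
    "file.read": (200, 60.0),
    "file.write": (50, 60.0),
    "llm.call": (10, 60.0),
    "network.request": (30, 60.0),
    "shell.execute": (10, 60.0),
}

# One independent rolling window of timestamps per action type.
windows = {action: deque() for action in ACTION_LIMITS}

def check(action, now):
    """Return True if the action is within its limit; default-deny otherwise."""
    if action not in ACTION_LIMITS:
        return False  # unknown action types are denied outright
    max_requests, window_seconds = ACTION_LIMITS[action]
    ts = windows[action]
    while ts and now - ts[0] >= window_seconds:
        ts.popleft()
    if len(ts) >= max_requests:
        return False
    ts.append(now)
    return True
```

Because each window is independent, an agent that exhausts its `llm.call` budget can still read files; one runaway action type does not starve the others.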
Loop Detection
Agents stuck in retry loops are the most common cause of runaway costs. SafeClaw's rate limits catch them:
14:32:01 — llm.call claude-opus-4 → ALLOW (1/20 in window)
14:32:02 — llm.call claude-opus-4 → ALLOW (2/20)
14:32:03 — llm.call claude-opus-4 → ALLOW (3/20)
...
14:32:19 — llm.call claude-opus-4 → ALLOW (19/20)
14:32:20 — llm.call claude-opus-4 → ALLOW (20/20)
14:32:21 — llm.call claude-opus-4 → DENY (rate limit exceeded)
Without rate limits, this loop could make thousands of calls before anyone notices. With SafeClaw, the agent is stopped after 20 calls in the first minute.
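The trace above can be reproduced with a small simulation. This sketch (illustrative only, not SafeClaw code) models an agent retrying once per second against a 20-per-minute sliding window and counts how many attempts get through:

```python
from collections import deque

def simulate_retry_loop(max_requests=20, window_seconds=60.0, attempts=25):
    """Simulate an agent retrying once per second; count allows vs. denies."""
    timestamps = deque()
    allowed = denied = 0
    for second in range(attempts):
        now = float(second)
        # Evict timestamps older than the rolling window.
        while timestamps and now - timestamps[0] >= window_seconds:
            timestamps.popleft()
        if len(timestamps) < max_requests:
            timestamps.append(now)
            allowed += 1
        else:
            denied += 1
    return allowed, denied

# 25 one-per-second attempts against a 20/min limit: 20 allowed, 5 denied.
```

The loop keeps running, but every attempt past the cap is denied until old timestamps age out of the window, capping the damage at the configured rate.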
Tiered Rate Limits
Implement progressive throttling:
rules:
  # Tier 1: Normal rate
  - action: llm.call
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow
  # Tier 2: Hourly cap (catches sustained high usage)
  - action: llm.call
    rateLimit:
      maxRequests: 200
      window: "1 hour"
    effect: allow
  # Tier 3: Daily cap (absolute maximum)
  - action: llm.call
    rateLimit:
      maxRequests: 1000
      window: "1 day"
    effect: allow
An agent can burst to 20/min but cannot sustain more than 200/hour or 1000/day.
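Tiered limiting means a call must pass every tier before it is allowed. A minimal Python sketch of that check, assuming the tier values from the policy above (illustrative names, not SafeClaw internals):

```python
from collections import deque

# Tiers from the policy above: (max requests, window in seconds).
TIERS = [
    (20, 60.0),       # 20 per minute
    (200, 3600.0),    # 200 per hour
    (1000, 86400.0),  # 1000 per day
]

windows = [deque() for _ in TIERS]

def allow(now):
    # Check every tier first; record the call only if all tiers have room,
    # so a denial in one tier does not consume quota in another.
    for (max_requests, window_seconds), ts in zip(TIERS, windows):
        while ts and now - ts[0] >= window_seconds:
            ts.popleft()
        if len(ts) >= max_requests:
            return False
    for ts in windows:
        ts.append(now)
    return True
```

Checking all tiers before recording is the important design choice: a call rejected by the per-minute tier should not silently eat into the hourly or daily quota.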
Rate Limit Monitoring
Track rate limit utilization:
# Current rate limit status
npx @authensor/safeclaw ratelimit status
# Rate limit violations in the past 24 hours
npx @authensor/safeclaw audit export \
--filter reason="rate limit" \
--since "24h"
Example status output:
Rate Limit Status
──────────────────────────
llm.call (claude-opus-4): 12/20 per minute (60%)
llm.call (gpt-4o): 3/30 per minute (10%)
network.request: 45/100 per minute (45%)
shell.execute: 2/10 per minute (20%)
Combining Rate Limits with Budget Caps
Rate limits control request frequency; budget caps control cost. Use both:
rules:
  - action: llm.call
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow
budgets:
  - scope: global
    limit: "$50.00"
    period: daily
    action: deny
Rate limits catch loops; budgets catch expensive individual calls (e.g., a single call with a 200k-token context window).
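The two controls compose naturally: check the budget first (one expensive call can blow past it), then the rate window. A combined sketch in Python, with hypothetical names and a per-call cost estimate supplied by the caller (neither is part of SafeClaw's documented API):

```python
from collections import deque

class CostGuard:
    """Sketch: deny a call if it would exceed either the request-rate
    limit or the daily spend cap, whichever trips first."""

    def __init__(self, max_requests, window_seconds, daily_budget_usd):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.daily_budget_usd = daily_budget_usd
        self.timestamps = deque()
        self.spent_usd = 0.0  # reset externally at the start of each day

    def allow(self, now, estimated_cost_usd):
        # Budget check: catches a single expensive call (huge context).
        if self.spent_usd + estimated_cost_usd > self.daily_budget_usd:
            return False
        # Rate check: catches loops of cheap calls.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False
        self.timestamps.append(now)
        self.spent_usd += estimated_cost_usd
        return True
```

A denied call neither records a timestamp nor spends budget, so a burst of rejections cannot itself exhaust either quota.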
Why SafeClaw
- 446 tests cover sliding window computation, boundary conditions, and multi-tier limits
- Deny-by-default means agents have no API access without rate-limited allow rules
- Sub-millisecond evaluation ensures rate limit checks do not themselves become a bottleneck
- Hash-chained audit trail logs every rate-limited denial for investigation
- Works with Claude AND OpenAI — unified rate limiting across LLM providers
- MIT licensed — implement cost controls without paying for another tool
See Also
- Token Budgets for AI Agents: Controlling LLM Spend
- How to Control AI Agent Costs with Budget Policies
- AI Agent Incident Response: A Playbook for Engineering Teams
- Network Policies for AI Agents: Controlling Outbound Traffic
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw