API Rate Limiting for AI Agents: Preventing Runaway Costs
AI agents in loops can burn through API rate limits and budgets in minutes — a code generation agent retrying a failed build, a research agent fetching the same URL repeatedly, or a data agent making thousands of database queries. SafeClaw by Authensor enforces rate limits at the action level: every API call, shell command, and network request is counted against configurable sliding window limits, and excess calls are denied before execution. No calls leak through.
Quick Start
npx @authensor/safeclaw
Sliding Window Rate Limits
SafeClaw uses a sliding window algorithm to avoid the burst problem of fixed windows. Configure limits per action type:
version: "1.0"
description: "Rate-limited agent policy"
rules:
  # LLM API rate limits
  - action: llm.call
    model: "claude-opus-4"
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow
    reason: "Opus calls: max 20/min"
  - action: llm.call
    model: "gpt-4o"
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
    reason: "GPT-4o calls: max 30/min"
  # Network request rate limits
  - action: network.request
    domain: "api.github.com"
    rateLimit:
      maxRequests: 60
      window: "1 hour"
    effect: allow
    reason: "GitHub API: respect their rate limit"
  - action: network.request
    domain: "*.internal.company.com"
    rateLimit:
      maxRequests: 100
      window: "1 minute"
    effect: allow
    reason: "Internal API: 100 req/min"
  # Shell execution rate limits (catches loops)
  - action: shell.execute
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
    reason: "Shell commands: max 30/min"
  - action: "*"
    effect: deny
    reason: "Default deny"
Why Fixed Windows Fail
Fixed window rate limiting (e.g., "100 requests per minute, with the counter resetting on the minute") allows bursts at window boundaries:
Window 1 (12:00:00 - 12:00:59): the agent makes 0 requests for 59 seconds, then 100 requests at 12:00:59
Window 2 (12:01:00 - 12:01:59): the agent makes 100 requests at 12:01:00
Result: 200 requests in 2 seconds, straddling the window boundary
Sliding windows prevent this by counting requests in a rolling time period. At any given moment, the count includes all requests from the past N seconds.
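SafeClaw's internal implementation isn't shown here, but the core technique can be sketched in a few lines of Python. This is a minimal illustration of a sliding window counter, not SafeClaw's actual code; the class and method names are hypothetical:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Counts events in a rolling window and denies once the cap is hit."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()  # timestamps of allowed calls, oldest first

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False  # denied before execution; the call never runs
        self.timestamps.append(now)
        return True
```

Because the window rolls with the clock instead of resetting on a boundary, the 200-requests-in-2-seconds burst above is impossible: at any instant, at most `max_requests` calls from the past `window_seconds` have been allowed.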
Per-Action Rate Limits
Different action types carry different cost and risk profiles. Set appropriate limits for each:
rules:
  # File reads: high limit (cheap, low risk)
  - action: file.read
    rateLimit:
      maxRequests: 200
      window: "1 minute"
    effect: allow
  # File writes: moderate limit (medium risk)
  - action: file.write
    rateLimit:
      maxRequests: 50
      window: "1 minute"
    effect: allow
  # LLM calls: low limit (expensive)
  - action: llm.call
    rateLimit:
      maxRequests: 10
      window: "1 minute"
    effect: allow
  # Network requests: moderate limit
  - action: network.request
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
  # Shell execution: low limit (highest risk)
  - action: shell.execute
    rateLimit:
      maxRequests: 10
      window: "1 minute"
    effect: allow
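Under the hood, per-action limits amount to keeping one independent rolling window per action type. The sketch below illustrates the idea in Python with a hypothetical registry; the limit values mirror the policy above, but the data structures and names are illustrative, not SafeClaw's API:

```python
from collections import deque

# Per-action limits: (max requests, window in seconds).
ACTION_LIMITS = {
    "file.read": (200, 60.0),
    "file.write": (50, 60.0),
    "llm.call": (10, 60.0),
    "network.request": (30, 60.0),
    "shell.execute": (10, 60.0),
}

# One independent rolling window of timestamps per action type.
windows = {action: deque() for action in ACTION_LIMITS}

def check(action, now):
    """Return True if the action is within its limit; default-deny otherwise."""
    if action not in ACTION_LIMITS:
        return False  # unknown action types are denied outright
    max_requests, window_seconds = ACTION_LIMITS[action]
    ts = windows[action]
    while ts and now - ts[0] >= window_seconds:
        ts.popleft()
    if len(ts) >= max_requests:
        return False
    ts.append(now)
    return True
```

Because each window is independent, an agent that exhausts its `llm.call` budget can still read files; one runaway action type does not starve the others.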
Loop Detection
Agents stuck in retry loops are the most common cause of runaway costs. SafeClaw's rate limits catch them:
14:32:01 — llm.call claude-opus-4 → ALLOW (1/20 in window)
14:32:02 — llm.call claude-opus-4 → ALLOW (2/20)
14:32:03 — llm.call claude-opus-4 → ALLOW (3/20)
...
14:32:19 — llm.call claude-opus-4 → ALLOW (19/20)
14:32:20 — llm.call claude-opus-4 → ALLOW (20/20)
14:32:21 — llm.call claude-opus-4 → DENY (rate limit exceeded)
Without rate limits, this loop could make thousands of calls before anyone notices. With SafeClaw, the agent is stopped after 20 calls in the first minute.
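The trace above can be reproduced with a small simulation. This sketch (illustrative only, not SafeClaw code) models an agent retrying once per second against a 20-per-minute sliding window and counts how many attempts get through:

```python
from collections import deque

def simulate_retry_loop(max_requests=20, window_seconds=60.0, attempts=25):
    """Simulate an agent retrying once per second; count allows vs. denies."""
    timestamps = deque()
    allowed = denied = 0
    for second in range(attempts):
        now = float(second)
        # Evict timestamps older than the rolling window.
        while timestamps and now - timestamps[0] >= window_seconds:
            timestamps.popleft()
        if len(timestamps) < max_requests:
            timestamps.append(now)
            allowed += 1
        else:
            denied += 1
    return allowed, denied

# 25 one-per-second attempts against a 20/min limit: 20 allowed, 5 denied.
```

The loop keeps running, but every attempt past the cap is denied until old timestamps age out of the window, capping the damage at the configured rate.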
Tiered Rate Limits
Implement progressive throttling:
rules:
  # Tier 1: Normal rate
  - action: llm.call
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow
  # Tier 2: Hourly cap (catches sustained high usage)
  - action: llm.call
    rateLimit:
      maxRequests: 200
      window: "1 hour"
    effect: allow
  # Tier 3: Daily cap (absolute maximum)
  - action: llm.call
    rateLimit:
      maxRequests: 1000
      window: "1 day"
    effect: allow
An agent can burst to 20/min but cannot sustain more than 200/hour or 1000/day.
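Tiered limiting means a call must pass every tier before it is allowed. A minimal Python sketch of that check, assuming the tier values from the policy above (illustrative names, not SafeClaw internals):

```python
from collections import deque

# Tiers from the policy above: (max requests, window in seconds).
TIERS = [
    (20, 60.0),       # 20 per minute
    (200, 3600.0),    # 200 per hour
    (1000, 86400.0),  # 1000 per day
]

windows = [deque() for _ in TIERS]

def allow(now):
    # Check every tier first; record the call only if all tiers have room,
    # so a denial in one tier does not consume quota in another.
    for (max_requests, window_seconds), ts in zip(TIERS, windows):
        while ts and now - ts[0] >= window_seconds:
            ts.popleft()
        if len(ts) >= max_requests:
            return False
    for ts in windows:
        ts.append(now)
    return True
```

Checking all tiers before recording is the important design choice: a call rejected by the per-minute tier should not silently eat into the hourly or daily quota.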
Rate Limit Monitoring
Track rate limit utilization:
# Current rate limit status
npx @authensor/safeclaw ratelimit status
# Rate limit violations in the past 24 hours
npx @authensor/safeclaw audit export \
--filter reason="rate limit" \
--since "24h"
Example status output:
Rate Limit Status
──────────────────────────
llm.call (claude-opus-4): 12/20 per minute (60%)
llm.call (gpt-4o): 3/30 per minute (10%)
network.request: 45/100 per minute (45%)
shell.execute: 2/10 per minute (20%)
Combining Rate Limits with Budget Caps
Rate limits control request frequency; budget caps control cost. Use both:
rules:
  - action: llm.call
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow
budgets:
  - scope: global
    limit: "$50.00"
    period: daily
    action: deny
Rate limits catch loops; budgets catch expensive individual calls (e.g., a single call with a 200k-token context window).
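The two controls compose naturally: check the budget first (one expensive call can blow past it), then the rate window. A combined sketch in Python, with hypothetical names and a per-call cost estimate supplied by the caller (neither is part of SafeClaw's documented API):

```python
from collections import deque

class CostGuard:
    """Sketch: deny a call if it would exceed either the request-rate
    limit or the daily spend cap, whichever trips first."""

    def __init__(self, max_requests, window_seconds, daily_budget_usd):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.daily_budget_usd = daily_budget_usd
        self.timestamps = deque()
        self.spent_usd = 0.0  # reset externally at the start of each day

    def allow(self, now, estimated_cost_usd):
        # Budget check: catches a single expensive call (huge context).
        if self.spent_usd + estimated_cost_usd > self.daily_budget_usd:
            return False
        # Rate check: catches loops of cheap calls.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False
        self.timestamps.append(now)
        self.spent_usd += estimated_cost_usd
        return True
```

A denied call neither records a timestamp nor spends budget, so a burst of rejections cannot itself exhaust either quota.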
Why SafeClaw
- 446 tests cover sliding window computation, boundary conditions, and multi-tier limits
- Deny-by-default means agents have no API access without rate-limited allow rules
- Sub-millisecond evaluation ensures rate limit checks do not themselves become a bottleneck
- Hash-chained audit trail logs every rate-limited denial for investigation
- Works with Claude AND OpenAI — unified rate limiting across LLM providers
- MIT licensed — implement cost controls without paying for another tool
See Also
- Token Budgets for AI Agents: Controlling LLM Spend
- How to Control AI Agent Costs with Budget Policies
- AI Agent Incident Response: A Playbook for Engineering Teams
- Network Policies for AI Agents: Controlling Outbound Traffic
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw