2026-01-23 · Authensor

API Rate Limiting for AI Agents: Preventing Runaway Costs

AI agents in loops can burn through API rate limits and budgets in minutes — a code generation agent retrying a failed build, a research agent fetching the same URL repeatedly, or a data agent making thousands of database queries. SafeClaw by Authensor enforces rate limits at the action level: every API call, shell command, and network request is counted against configurable sliding window limits, and excess calls are denied before execution. No calls leak through.

Quick Start

npx @authensor/safeclaw

Sliding Window Rate Limits

SafeClaw uses a sliding window algorithm to avoid the burst problem of fixed windows. Configure limits per action type:

version: "1.0"
description: "Rate-limited agent policy"

rules:
  # LLM API rate limits
  - action: llm.call
    model: "claude-opus-4"
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow
    reason: "Opus calls: max 20/min"

  - action: llm.call
    model: "gpt-4o"
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
    reason: "GPT-4o calls: max 30/min"

  # Network request rate limits
  - action: network.request
    domain: "api.github.com"
    rateLimit:
      maxRequests: 60
      window: "1 hour"
    effect: allow
    reason: "GitHub API: respect their rate limit"

  - action: network.request
    domain: "*.internal.company.com"
    rateLimit:
      maxRequests: 100
      window: "1 minute"
    effect: allow
    reason: "Internal API: 100 req/min"

  # Shell execution rate limits (catches loops)
  - action: shell.execute
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow
    reason: "Shell commands: max 30/min"

  - action: "*"
    effect: deny
    reason: "Default deny"

Why Fixed Windows Fail

Fixed window rate limiting (e.g., "100 requests per minute starting at :00") allows burst attacks at window boundaries:

Window 1 (0:00 - 0:59): Agent makes 0 requests for 59 seconds, then 100 at 0:59
Window 2 (1:00 - 1:59): Agent makes 100 requests at 1:00
Result: 200 requests in roughly 2 seconds across the boundary, yet each fixed window stays within its 100-request limit

Sliding windows prevent this by counting requests in a rolling time period. At any given moment, the count includes all requests from the past N seconds.
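
A minimal sketch of the rolling count in TypeScript (illustrative only, not SafeClaw's implementation; the class and method names are assumptions):

// Sliding-window limiter sketch: remember recent request timestamps and
// count only those that fall inside the rolling window.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(private maxRequests: number, private windowMs: number) {}

  // Allow and record the request if the rolling count is under the limit;
  // otherwise deny without recording.
  tryAcquire(now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have fallen out of the rolling window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxRequests) {
      return false; // deny: maxRequests already seen in the past windowMs
    }
    this.timestamps.push(now);
    return true;
  }
}

// 20 requests per rolling minute, matching the claude-opus-4 rule above.
const opusLimiter = new SlidingWindowLimiter(20, 60_000);

Because the cutoff moves with every check, the 100-requests-at-0:59 plus 100-at-1:00 burst above falls inside one rolling minute and is denied once the limit is reached.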

Per-Action Rate Limits

Different action types carry different cost and risk profiles. Set appropriate limits for each:

rules:
  # File reads: high limit (cheap, low risk)
  - action: file.read
    rateLimit:
      maxRequests: 200
      window: "1 minute"
    effect: allow

  # File writes: moderate limit (medium risk)
  - action: file.write
    rateLimit:
      maxRequests: 50
      window: "1 minute"
    effect: allow

  # LLM calls: low limit (expensive)
  - action: llm.call
    rateLimit:
      maxRequests: 10
      window: "1 minute"
    effect: allow

  # Network requests: moderate limit
  - action: network.request
    rateLimit:
      maxRequests: 30
      window: "1 minute"
    effect: allow

  # Shell execution: low limit (highest risk)
  - action: shell.execute
    rateLimit:
      maxRequests: 10
      window: "1 minute"
    effect: allow

Loop Detection

Agents stuck in retry loops are the most common cause of runaway costs. SafeClaw's rate limits catch them:

14:32:01 — llm.call claude-opus-4 → ALLOW (1/20 in window)
14:32:02 — llm.call claude-opus-4 → ALLOW (2/20)
14:32:03 — llm.call claude-opus-4 → ALLOW (3/20)
...
14:32:19 — llm.call claude-opus-4 → ALLOW (19/20)
14:32:20 — llm.call claude-opus-4 → ALLOW (20/20)
14:32:21 — llm.call claude-opus-4 → DENY (rate limit exceeded)

Without rate limits, this loop could make thousands of calls before anyone notices. With SafeClaw, the agent is stopped after 20 calls in the first minute.
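
For illustration, here is how such a loop gets cut short when the limiter denies, reusing the SlidingWindowLimiter sketch from above; callModel is a hypothetical stand-in for the agent's LLM client, not SafeClaw's API:

// Stand-in for the real model call, declared only so the sketch compiles.
declare function callModel(prompt: string): Promise<string>;

// Hypothetical retry loop guarded by the sliding-window limiter above.
async function retryUntilBuildPasses(limiter: SlidingWindowLimiter): Promise<void> {
  for (let attempt = 1; attempt <= 1000; attempt++) {
    if (!limiter.tryAcquire()) {
      // With a 20/min limit, the loop stops on the 21st attempt inside a
      // rolling minute instead of burning through all 1000 iterations.
      throw new Error("llm.call denied: rate limit exceeded");
    }
    await callModel(`fix the failing build, attempt ${attempt}`);
  }
}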

Tiered Rate Limits

Implement progressive throttling:

rules:
  # Tier 1: Normal rate
  - action: llm.call
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow

  # Tier 2: Hourly cap (catches sustained high usage)
  - action: llm.call
    rateLimit:
      maxRequests: 200
      window: "1 hour"
    effect: allow

  # Tier 3: Daily cap (absolute maximum)
  - action: llm.call
    rateLimit:
      maxRequests: 1000
      window: "1 day"
    effect: allow

An agent can burst to 20/min but cannot sustain more than 200/hour or 1000/day.
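
All three tiers apply at once: a call needs headroom in every window, and any exhausted tier denies it. A sketch under that assumption, reusing the SlidingWindowLimiter from earlier (the accounting details are illustrative, not SafeClaw's):

// Tiered check: the call is allowed only if every rolling window still has
// headroom (20/min AND 200/hour AND 1000/day).
const tiers = [
  new SlidingWindowLimiter(20, 60_000),              // 20 per minute
  new SlidingWindowLimiter(200, 60 * 60_000),        // 200 per hour
  new SlidingWindowLimiter(1000, 24 * 60 * 60_000),  // 1000 per day
];

function allowLlmCall(): boolean {
  // Note: tryAcquire records the attempt in each tier it passes, so a call
  // denied by a later tier still counts against earlier ones. That only makes
  // the limiter stricter, which is fine for a sketch.
  return tiers.every((tier) => tier.tryAcquire());
}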

Rate Limit Monitoring

Track rate limit utilization:

# Current rate limit status
npx @authensor/safeclaw ratelimit status

# Rate limit violations in the past 24 hours
npx @authensor/safeclaw audit export \
  --filter reason="rate limit" \
  --since "24h"

Example status output:

Rate Limit Status
──────────────────────────
llm.call (claude-opus-4):  12/20 per minute (60%)
llm.call (gpt-4o):         3/30 per minute (10%)
network.request:           45/100 per minute (45%)
shell.execute:              2/10 per minute (20%)

Combining Rate Limits with Budget Caps

Rate limits control request frequency; budget caps control cost. Use both:

rules:
  - action: llm.call
    rateLimit:
      maxRequests: 20
      window: "1 minute"
    effect: allow

budgets:
  - scope: global
    limit: "$50.00"
    period: daily
    action: deny

Rate limits catch loops; budgets catch expensive individual calls (e.g., a single call with a 200k-token context window).
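
A sketch of the two checks side by side, using the SlidingWindowLimiter from earlier; the DailyBudget class and cost estimate are illustrative assumptions, and the daily reset is omitted for brevity:

// Illustrative budget tracker: deny once cumulative spend would exceed the cap.
class DailyBudget {
  private spentUsd = 0;

  constructor(private limitUsd: number) {}

  trySpend(costUsd: number): boolean {
    if (this.spentUsd + costUsd > this.limitUsd) {
      return false; // deny: this call would blow the daily budget
    }
    this.spentUsd += costUsd;
    return true;
  }
}

const perMinute = new SlidingWindowLimiter(20, 60_000); // catches loops
const budget = new DailyBudget(50.0);                   // catches expensive calls

function allowCall(estimatedCostUsd: number): boolean {
  // Either check alone is not enough: a loop of cheap calls slips past the
  // budget, and one 200k-token call slips past the rate limit.
  return perMinute.tryAcquire() && budget.trySpend(estimatedCostUsd);
}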

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw