2026-01-19 · Authensor

Token Budgets for AI Agents: Controlling LLM Spend

Token budgets cap the number of LLM tokens an AI agent can consume over defined periods, preventing cost overruns from verbose prompts, large context windows, and retry loops. SafeClaw by Authensor tracks token consumption per model, per agent, and per team — enforcing soft warnings and hard stops when budgets are approached or exceeded. When a hard stop triggers, the agent's LLM calls are denied until the next budget period resets.

Quick Start

npx @authensor/safeclaw

Understanding Token Costs

LLM pricing is per-token, and costs vary dramatically by model:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude Haiku | $0.25 | $1.25 |

An agent using Claude Opus 4 with a 100k-token context window costs approximately $1.50 per input call. Ten calls in a retry loop: $15. A hundred calls overnight: $150. Token budgets prevent this.
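
The arithmetic behind these numbers is worth internalizing. A minimal Python sketch, using the prices from the table above (hard-coded here for illustration; verify against current provider pricing):

# Cost per call from token counts, using the per-million-token prices in the
# table above (illustrative only).
PRICES = {  # model: (input USD per 1M tokens, output USD per 1M tokens)
    "claude-opus-4":   (15.00, 75.00),
    "claude-sonnet-4": (3.00, 15.00),
    "gpt-4o":          (2.50, 10.00),
    "claude-haiku":    (0.25, 1.25),
}

def call_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost of a single LLM call in USD."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price

print(call_cost("claude-opus-4", 100_000))         # 1.5   (one 100k-token input call)
print(100 * call_cost("claude-opus-4", 100_000))   # 150.0 (a hundred such calls overnight)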

Token Budget Configuration

version: "1.0"
description: "Token budget policy"

tokenBudgets:
  # Per-model daily limits
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny
    reason: "Opus daily token limit"

  - model: "claude-sonnet-4"
    maxTokens:
      input: 2000000
      output: 500000
    period: daily
    onExceed: deny
    reason: "Sonnet daily token limit"

  - model: "gpt-4o"
    maxTokens:
      input: 1000000
      output: 200000
    period: daily
    onExceed: deny
    reason: "GPT-4o daily token limit"

  # Global weekly cap (all models combined)
  - model: "*"
    maxTokens:
      input: 10000000
      output: 2000000
    period: weekly
    onExceed: deny
    reason: "Weekly global token cap"

  # Monthly organizational limit
  - model: "*"
    maxTokens:
      input: 30000000
      output: 6000000
    period: monthly
    onExceed: deny
    reason: "Monthly organizational token limit"

Soft Limits and Hard Stops

Implement a two-tier system that warns before blocking:

tokenBudgets:
  # Soft limit at 80% — logs warning, allows action
  - model: "claude-opus-4"
    maxTokens:
      input: 400000
      output: 80000
    period: daily
    onExceed: warn
    reason: "Opus at 80% daily budget — warning"

  # Hard stop at 100% — denies action
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny
    reason: "Opus daily budget exhausted — hard stop"

Per-Agent Token Allocation

Distribute token budgets across agents:

tokenBudgets:
  # Code review agent — moderate usage
  - agentId: "code-reviewer"
    model: "claude-opus-4"
    maxTokens:
      input: 200000
      output: 50000
    period: daily
    onExceed: deny

  # Research agent — high usage (large context)
  - agentId: "research-agent"
    model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny

  # Formatting agent — low usage (use cheaper model)
  - agentId: "formatter"
    model: "claude-haiku"
    maxTokens:
      input: 1000000
      output: 200000
    period: daily
    onExceed: deny
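
Per-agent rules scope a budget to a single agent as well as a model. A small illustrative matcher (hypothetical names, not SafeClaw internals) shows the intended semantics: a rule applies only when both its agentId, if present, and its model match the call.

# Hypothetical matcher for agent-scoped rules.
def rule_applies(rule: dict, agent_id: str, model: str) -> bool:
    if rule.get("agentId") and rule["agentId"] != agent_id:
        return False
    return rule.get("model") in ("*", model)

rule = {"agentId": "formatter", "model": "claude-haiku"}
print(rule_applies(rule, "formatter", "claude-haiku"))        # True
print(rule_applies(rule, "research-agent", "claude-haiku"))   # False: different agent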

Model Routing by Budget

When an expensive model's budget is exhausted, route to a cheaper alternative:

tokenBudgets:
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
    period: daily
    onExceed: downgrade
    downgradeModel: "claude-sonnet-4"
    reason: "Opus budget exceeded — routing to Sonnet"

- model: "claude-sonnet-4"
maxTokens:
input: 2000000
period: daily
onExceed: deny
reason: "Sonnet budget also exceeded — hard stop"

This keeps agents functional on cheaper models rather than stopping them entirely.
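
The downgrade chain can be thought of as a fallback search: follow downgradeModel links until a model with remaining budget is found, or deny if none is left. An illustrative sketch with hypothetical names and an in-memory view of remaining budget:

# Illustrative downgrade resolution (not SafeClaw's implementation).
def resolve_model(requested: str, rules: dict, remaining_input: dict) -> str | None:
    """Follow downgradeModel links until a model with budget left is found; None means hard stop."""
    model = requested
    while remaining_input.get(model, 0) <= 0:
        rule = rules.get(model, {})
        if rule.get("onExceed") != "downgrade":
            return None                          # no fallback left: deny
        model = rule["downgradeModel"]           # try the cheaper alternative
    return model

rules = {
    "claude-opus-4":   {"onExceed": "downgrade", "downgradeModel": "claude-sonnet-4"},
    "claude-sonnet-4": {"onExceed": "deny"},
}
print(resolve_model("claude-opus-4", rules,
                    {"claude-opus-4": 0, "claude-sonnet-4": 1_100_000}))   # claude-sonnet-4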

Token Budget Monitoring

Track consumption in real time:

# Current token usage
npx @authensor/safeclaw tokens status

# Historical token consumption
npx @authensor/safeclaw tokens report --period weekly --since "4 weeks"

Example output:

Token Budget Status (Daily — 2026-02-13)
──────────────────────────────────────────
claude-opus-4:
  Input:  312,450 / 500,000 (62.5%)
  Output:  67,230 / 100,000 (67.2%)
  Est. cost: $9.73

claude-sonnet-4:
  Input:  890,120 / 2,000,000 (44.5%)
  Output: 234,560 / 500,000 (46.9%)
  Est. cost: $6.19

Daily total: $15.92 / ~$20.00 estimated max
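
The Est. cost figures follow directly from the token counts and the prices in the pricing table; reusing the call_cost sketch from that section reproduces them:

# Reproducing the Est. cost lines with the illustrative call_cost helper.
print(round(call_cost("claude-opus-4", 312_450, 67_230), 2))      # 9.73
print(round(call_cost("claude-sonnet-4", 890_120, 234_560), 2))   # 6.19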

Alerts

Configure alerts for budget thresholds:

alerts:
  - trigger: tokenBudget.warn
    channel: slack
    message: "Token budget warning: {model} at {percent}% ({tokensUsed}/{tokensMax})"

  - trigger: tokenBudget.deny
    channel: pagerduty
    message: "Token budget HARD STOP: {model} exceeded {tokensMax} tokens"

Cost Projection

SafeClaw can project when budgets will be exhausted based on current consumption rate:

npx @authensor/safeclaw tokens project --period daily

Example output:
Projection (Daily — 2026-02-13)
────────────────────────────────
claude-opus-4:   Budget exhausted at ~16:45 (current rate: 45k tokens/hr)
claude-sonnet-4: Budget sufficient for today (current rate: 60k tokens/hr)
gpt-4o:          Budget sufficient for today (current rate: 12k tokens/hr)
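
The projection itself is simple rate math: tokens remaining in the period divided by the current hourly burn rate gives the time until the hard stop. For the Opus figures above:

# Rate math behind the projection (illustrative).
remaining = 500_000 - 312_450        # input tokens left in today's Opus budget
rate_per_hour = 45_000               # current consumption rate
hours_left = remaining / rate_per_hour
print(f"~{hours_left:.1f} hours until the Opus hard stop")   # ~4.2 hours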

Audit Trail for Token Decisions

Every token budget evaluation is logged:

{
  "timestamp": "2026-02-13T16:45:02.119Z",
  "action": "llm.call",
  "model": "claude-opus-4",
  "effect": "deny",
  "reason": "Opus daily token limit",
  "tokensRequested": { "input": 15000 },
  "tokensConsumed": { "input": 498000, "output": 95000 },
  "tokenBudget": { "input": 500000, "output": 100000 },
  "period": "daily"
}
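
Because each entry is structured JSON, the audit trail is easy to analyze offline. A small Python sketch that filters denied LLM calls from an exported log (the file name and JSON-lines layout are assumptions; adjust to however you export the audit trail):

import json

# Assumed export: one JSON object per line in safeclaw-audit.jsonl (hypothetical file name).
with open("safeclaw-audit.jsonl") as f:
    entries = [json.loads(line) for line in f if line.strip()]

denied = [e for e in entries if e.get("action") == "llm.call" and e.get("effect") == "deny"]
for e in denied:
    print(e["timestamp"], e["model"], e["reason"])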

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw