Token Budgets for AI Agents: Controlling LLM Spend
Token budgets cap the number of LLM tokens an AI agent can consume over defined periods, preventing cost overruns from verbose prompts, large context windows, and retry loops. SafeClaw by Authensor tracks token consumption per model, per agent, and per team — enforcing soft warnings and hard stops when budgets are approached or exceeded. When a hard stop triggers, the agent's LLM calls are denied until the next budget period resets.
Quick Start
npx @authensor/safeclaw
Understanding Token Costs
LLM pricing is per-token, and costs vary dramatically by model:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude Haiku | $0.25 | $1.25 |
An agent using Claude Opus 4 with a 100k-token context window costs approximately $1.50 per input call. Ten calls in a retry loop: $15. A hundred calls overnight: $150. Token budgets prevent this.
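The arithmetic above can be sanity-checked in a few lines (Python for illustration; input prices taken from the table):

```python
# Cost of the input side of one LLM call: tokens * price-per-token.
# Prices are the per-1M-token input rates from the table above.
INPUT_PRICE_PER_M = {"claude-opus-4": 15.00, "claude-sonnet-4": 3.00}

def input_cost(model: str, input_tokens: int) -> float:
    """Dollar cost of the input tokens for a single call."""
    return input_tokens * INPUT_PRICE_PER_M[model] / 1_000_000

one_call = input_cost("claude-opus-4", 100_000)   # 100k-token context
print(f"${one_call:.2f} per call, ${one_call * 10:.2f} for 10 retries, "
      f"${one_call * 100:.2f} for 100 calls overnight")
# -> $1.50 per call, $15.00 for 10 retries, $150.00 for 100 calls overnight
```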
Token Budget Configuration
version: "1.0"
description: "Token budget policy"

tokenBudgets:
  # Per-model daily limits
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny
    reason: "Opus daily token limit"

  - model: "claude-sonnet-4"
    maxTokens:
      input: 2000000
      output: 500000
    period: daily
    onExceed: deny
    reason: "Sonnet daily token limit"

  - model: "gpt-4o"
    maxTokens:
      input: 1000000
      output: 200000
    period: daily
    onExceed: deny
    reason: "GPT-4o daily token limit"

  # Global weekly cap (all models combined)
  - model: "*"
    maxTokens:
      input: 10000000
      output: 2000000
    period: weekly
    onExceed: deny
    reason: "Weekly global token cap"

  # Monthly organizational limit
  - model: "*"
    maxTokens:
      input: 30000000
      output: 6000000
    period: monthly
    onExceed: deny
    reason: "Monthly organizational token limit"
Soft Limits and Hard Stops
Implement a two-tier system that warns before blocking:
tokenBudgets:
  # Soft limit at 80% — logs warning, allows action
  - model: "claude-opus-4"
    maxTokens:
      input: 400000
      output: 80000
    period: daily
    onExceed: warn
    reason: "Opus at 80% daily budget — warning"

  # Hard stop at 100% — denies action
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny
    reason: "Opus daily budget exhausted — hard stop"
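The two-tier behavior above can be sketched as follows. This is an illustrative model of soft-warn/hard-deny evaluation, not SafeClaw's internal code; the `Rule` type and `evaluate` function are hypothetical:

```python
# Illustrative sketch of two-tier budget evaluation: rules are checked
# in order, a triggered hard stop short-circuits, and a triggered soft
# limit downgrades the result from "allow" to "warn".
from dataclasses import dataclass

@dataclass
class Rule:
    max_input: int
    on_exceed: str   # "warn" or "deny"

def evaluate(consumed_input: int, requested_input: int,
             rules: list[Rule]) -> str:
    """Return "allow", "warn", or "deny" for a prospective call."""
    projected = consumed_input + requested_input
    effect = "allow"
    for rule in rules:
        if projected > rule.max_input:
            if rule.on_exceed == "deny":
                return "deny"          # hard stop: call is blocked
            effect = "warn"            # soft limit: log and continue
    return effect

rules = [Rule(max_input=400_000, on_exceed="warn"),   # 80% soft limit
         Rule(max_input=500_000, on_exceed="deny")]   # 100% hard stop
print(evaluate(350_000, 10_000, rules))  # under both limits -> allow
print(evaluate(420_000, 10_000, rules))  # past soft limit only -> warn
print(evaluate(495_000, 10_000, rules))  # past hard stop -> deny
```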
Per-Agent Token Allocation
Distribute token budgets across agents:
tokenBudgets:
  # Code review agent — moderate usage
  - agentId: "code-reviewer"
    model: "claude-opus-4"
    maxTokens:
      input: 200000
      output: 50000
    period: daily
    onExceed: deny

  # Research agent — high usage (large context)
  - agentId: "research-agent"
    model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny

  # Formatting agent — low usage (use cheaper model)
  - agentId: "formatter"
    model: "claude-haiku"
    maxTokens:
      input: 1000000
      output: 200000
    period: daily
    onExceed: deny
Model Routing by Budget
When an expensive model's budget is exhausted, route to a cheaper alternative:
tokenBudgets:
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
    period: daily
    onExceed: downgrade
    downgradeModel: "claude-sonnet-4"
    reason: "Opus budget exceeded — routing to Sonnet"

  - model: "claude-sonnet-4"
    maxTokens:
      input: 2000000
    period: daily
    onExceed: deny
    reason: "Sonnet budget also exceeded — hard stop"
This keeps agents functional on cheaper models rather than stopping them entirely.
Token Budget Monitoring
Track consumption in real time:
# Current token usage
npx @authensor/safeclaw tokens status
# Historical token consumption
npx @authensor/safeclaw tokens report --period weekly --since "4 weeks"
Example output:
Token Budget Status (Daily — 2026-02-13)
──────────────────────────────────────────
claude-opus-4:
  Input:  312,450 / 500,000 (62.5%)
  Output: 67,230 / 100,000 (67.2%)
  Est. cost: $9.73
claude-sonnet-4:
  Input:  890,120 / 2,000,000 (44.5%)
  Output: 234,560 / 500,000 (46.9%)
  Est. cost: $6.19

Daily total: $15.92 / ~$20.00 estimated max
Alerts
Configure alerts for budget thresholds:
alerts:
  - trigger: tokenBudget.warn
    channel: slack
    message: "Token budget warning: {model} at {percent}% ({tokensUsed}/{tokensMax})"
  - trigger: tokenBudget.deny
    channel: pagerduty
    message: "Token budget HARD STOP: {model} exceeded {tokensMax} tokens"
Cost Projection
SafeClaw can project when budgets will be exhausted based on current consumption rate:
npx @authensor/safeclaw tokens project --period daily
Projection (Daily — 2026-02-13)
────────────────────────────────
claude-opus-4: Budget exhausted at ~16:45 (current rate: 45k tokens/hr)
claude-sonnet-4: Budget sufficient for today (current rate: 60k tokens/hr)
gpt-4o: Budget sufficient for today (current rate: 12k tokens/hr)
Audit Trail for Token Decisions
Every token budget evaluation is logged:
{
  "timestamp": "2026-02-13T16:45:02.119Z",
  "action": "llm.call",
  "model": "claude-opus-4",
  "effect": "deny",
  "reason": "Opus daily token limit",
  "tokensRequested": { "input": 15000 },
  "tokensConsumed": { "input": 498000, "output": 95000 },
  "tokenBudget": { "input": 500000, "output": 100000 },
  "period": "daily"
}
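Entries like this can be made tamper-evident by chaining hashes: each record stores a hash of its predecessor, so altering any record invalidates every later one. A minimal sketch of the idea (SafeClaw's actual record format and hashing scheme are not shown here):

```python
import hashlib
import json

# Minimal hash-chain sketch: each appended record embeds the previous
# record's hash, then its own hash over the full body.
def append(chain: list[dict], entry: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"prev": prev, **entry}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)

def verify(chain: list[dict]) -> bool:
    """Recompute every hash; any edit breaks the chain from that point on."""
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if body["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append(log, {"action": "llm.call", "model": "claude-opus-4", "effect": "deny"})
append(log, {"action": "llm.call", "model": "claude-sonnet-4", "effect": "allow"})
print(verify(log))          # -> True
log[0]["effect"] = "allow"  # tamper with the first record
print(verify(log))          # -> False
```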
Why SafeClaw
- 446 tests cover token counting, period resets, multi-model tracking, and downgrade logic
- Deny-by-default means agents cannot make LLM calls without an authorized token budget
- Sub-millisecond evaluation ensures token budget checks do not add latency to LLM calls
- Hash-chained audit trail provides tamper-proof token consumption records for finance teams
- Works with Claude AND OpenAI — unified token tracking across all providers
- MIT licensed — control LLM costs without paying for a cost management tool
See Also
- API Rate Limiting for AI Agents: Preventing Runaway Costs
- How to Control AI Agent Costs with Budget Policies
- AI Agent Incident Response: A Playbook for Engineering Teams
- Building an AI Governance Framework with SafeClaw
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw