Token Budgets for AI Agents: Controlling LLM Spend
Token budgets cap the number of LLM tokens an AI agent can consume over defined periods, preventing cost overruns from verbose prompts, large context windows, and retry loops. SafeClaw by Authensor tracks token consumption per model, per agent, and per team — enforcing soft warnings and hard stops when budgets are approached or exceeded. When a hard stop triggers, the agent's LLM calls are denied until the next budget period resets.
Quick Start
npx @authensor/safeclaw
Understanding Token Costs
LLM pricing is per-token, and costs vary dramatically by model:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude Haiku | $0.25 | $1.25 |
An agent using Claude Opus 4 with a 100k-token context window costs approximately $1.50 per input call. Ten calls in a retry loop: $15. A hundred calls overnight: $150. Token budgets prevent this.
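The arithmetic above can be sanity-checked in a few lines (Python for illustration; input prices taken from the table):

```python
# Cost of the input side of one LLM call: tokens * price-per-token.
# Prices are the per-1M-token input rates from the table above.
INPUT_PRICE_PER_M = {"claude-opus-4": 15.00, "claude-sonnet-4": 3.00}

def input_cost(model: str, input_tokens: int) -> float:
    """Dollar cost of the input tokens for a single call."""
    return input_tokens * INPUT_PRICE_PER_M[model] / 1_000_000

one_call = input_cost("claude-opus-4", 100_000)   # 100k-token context
print(f"${one_call:.2f} per call, ${one_call * 10:.2f} for 10 retries, "
      f"${one_call * 100:.2f} for 100 calls overnight")
# -> $1.50 per call, $15.00 for 10 retries, $150.00 for 100 calls overnight
```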
Token Budget Configuration
version: "1.0"
description: "Token budget policy"

tokenBudgets:
  # Per-model daily limits
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny
    reason: "Opus daily token limit"

  - model: "claude-sonnet-4"
    maxTokens:
      input: 2000000
      output: 500000
    period: daily
    onExceed: deny
    reason: "Sonnet daily token limit"

  - model: "gpt-4o"
    maxTokens:
      input: 1000000
      output: 200000
    period: daily
    onExceed: deny
    reason: "GPT-4o daily token limit"

  # Global weekly cap (all models combined)
  - model: "*"
    maxTokens:
      input: 10000000
      output: 2000000
    period: weekly
    onExceed: deny
    reason: "Weekly global token cap"

  # Monthly organizational limit
  - model: "*"
    maxTokens:
      input: 30000000
      output: 6000000
    period: monthly
    onExceed: deny
    reason: "Monthly organizational token limit"
Soft Limits and Hard Stops
Implement a two-tier system that warns before blocking:
tokenBudgets:
  # Soft limit at 80% — logs warning, allows action
  - model: "claude-opus-4"
    maxTokens:
      input: 400000
      output: 80000
    period: daily
    onExceed: warn
    reason: "Opus at 80% daily budget — warning"

  # Hard stop at 100% — denies action
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny
    reason: "Opus daily budget exhausted — hard stop"
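The two-tier behavior above can be sketched as follows. This is an illustrative model of soft-warn/hard-deny evaluation, not SafeClaw's internal code; the `Rule` type and `evaluate` function are hypothetical:

```python
# Illustrative sketch of two-tier budget evaluation: rules are checked
# in order, a triggered hard stop short-circuits, and a triggered soft
# limit downgrades the result from "allow" to "warn".
from dataclasses import dataclass

@dataclass
class Rule:
    max_input: int
    on_exceed: str   # "warn" or "deny"

def evaluate(consumed_input: int, requested_input: int,
             rules: list[Rule]) -> str:
    """Return "allow", "warn", or "deny" for a prospective call."""
    projected = consumed_input + requested_input
    effect = "allow"
    for rule in rules:
        if projected > rule.max_input:
            if rule.on_exceed == "deny":
                return "deny"          # hard stop: call is blocked
            effect = "warn"            # soft limit: log and continue
    return effect

rules = [Rule(max_input=400_000, on_exceed="warn"),   # 80% soft limit
         Rule(max_input=500_000, on_exceed="deny")]   # 100% hard stop
print(evaluate(350_000, 10_000, rules))  # under both limits -> allow
print(evaluate(420_000, 10_000, rules))  # past soft limit only -> warn
print(evaluate(495_000, 10_000, rules))  # past hard stop -> deny
```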
Per-Agent Token Allocation
Distribute token budgets across agents:
tokenBudgets:
  # Code review agent — moderate usage
  - agentId: "code-reviewer"
    model: "claude-opus-4"
    maxTokens:
      input: 200000
      output: 50000
    period: daily
    onExceed: deny

  # Research agent — high usage (large context)
  - agentId: "research-agent"
    model: "claude-opus-4"
    maxTokens:
      input: 500000
      output: 100000
    period: daily
    onExceed: deny

  # Formatting agent — low usage (use cheaper model)
  - agentId: "formatter"
    model: "claude-haiku"
    maxTokens:
      input: 1000000
      output: 200000
    period: daily
    onExceed: deny
Model Routing by Budget
When an expensive model's budget is exhausted, route to a cheaper alternative:
tokenBudgets:
  - model: "claude-opus-4"
    maxTokens:
      input: 500000
    period: daily
    onExceed: downgrade
    downgradeModel: "claude-sonnet-4"
    reason: "Opus budget exceeded — routing to Sonnet"

  - model: "claude-sonnet-4"
    maxTokens:
      input: 2000000
    period: daily
    onExceed: deny
    reason: "Sonnet budget also exceeded — hard stop"
This keeps agents functional on cheaper models rather than stopping them entirely.
Token Budget Monitoring
Track consumption in real time:
# Current token usage
npx @authensor/safeclaw tokens status
# Historical token consumption
npx @authensor/safeclaw tokens report --period weekly --since "4 weeks"
Example output:
Token Budget Status (Daily — 2026-02-13)
──────────────────────────────────────────
claude-opus-4:
  Input:  312,450 / 500,000 (62.5%)
  Output: 67,230 / 100,000 (67.2%)
  Est. cost: $9.73
claude-sonnet-4:
  Input:  890,120 / 2,000,000 (44.5%)
  Output: 234,560 / 500,000 (46.9%)
  Est. cost: $6.19

Daily total: $15.92 / ~$20.00 estimated max
Alerts
Configure alerts for budget thresholds:
alerts:
  - trigger: tokenBudget.warn
    channel: slack
    message: "Token budget warning: {model} at {percent}% ({tokensUsed}/{tokensMax})"
  - trigger: tokenBudget.deny
    channel: pagerduty
    message: "Token budget HARD STOP: {model} exceeded {tokensMax} tokens"
Cost Projection
SafeClaw can project when budgets will be exhausted based on current consumption rate:
npx @authensor/safeclaw tokens project --period daily
Projection (Daily — 2026-02-13)
────────────────────────────────
claude-opus-4: Budget exhausted at ~16:45 (current rate: 45k tokens/hr)
claude-sonnet-4: Budget sufficient for today (current rate: 60k tokens/hr)
gpt-4o: Budget sufficient for today (current rate: 12k tokens/hr)
Audit Trail for Token Decisions
Every token budget evaluation is logged:
{
  "timestamp": "2026-02-13T16:45:02.119Z",
  "action": "llm.call",
  "model": "claude-opus-4",
  "effect": "deny",
  "reason": "Opus daily token limit",
  "tokensRequested": { "input": 15000 },
  "tokensConsumed": { "input": 498000, "output": 95000 },
  "tokenBudget": { "input": 500000, "output": 100000 },
  "period": "daily"
}
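Entries like this can be made tamper-evident by chaining hashes: each record stores a hash of its predecessor, so altering any record invalidates every later one. A minimal sketch of the idea (SafeClaw's actual record format and hashing scheme are not shown here):

```python
import hashlib
import json

# Minimal hash-chain sketch: each appended record embeds the previous
# record's hash, then its own hash over the full body.
def append(chain: list[dict], entry: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"prev": prev, **entry}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)

def verify(chain: list[dict]) -> bool:
    """Recompute every hash; any edit breaks the chain from that point on."""
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if body["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append(log, {"action": "llm.call", "model": "claude-opus-4", "effect": "deny"})
append(log, {"action": "llm.call", "model": "claude-sonnet-4", "effect": "allow"})
print(verify(log))          # -> True
log[0]["effect"] = "allow"  # tamper with the first record
print(verify(log))          # -> False
```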
Why SafeClaw
- 446 tests cover token counting, period resets, multi-model tracking, and downgrade logic
- Deny-by-default means agents cannot make LLM calls without an authorized token budget
- Sub-millisecond evaluation ensures token budget checks do not add latency to LLM calls
- Hash-chained audit trail provides tamper-proof token consumption records for finance teams
- Works with Claude AND OpenAI — unified token tracking across all providers
- MIT licensed — control LLM costs without paying for a cost management tool
See Also
- API Rate Limiting for AI Agents: Preventing Runaway Costs
- How to Control AI Agent Costs with Budget Policies
- AI Agent Incident Response: A Playbook for Engineering Teams
- Building an AI Governance Framework with SafeClaw
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw