2026-01-29 · Authensor

How to Prevent AI Agents from Running Up Cloud Costs

To prevent AI agents from running up cloud costs by creating expensive resources, use SafeClaw action-level gating to block shell_exec and network actions that provision cloud infrastructure. SafeClaw denies aws, gcloud, az, and terraform apply commands before execution. Install with npx @authensor/safeclaw.

The Risk

Many cloud resources bill by the second. An AI agent that runs aws ec2 run-instances --instance-type p4d.24xlarge --count 10 just launched ten GPU instances at roughly $32.77/hour each on demand: $327.70 per hour, about $7,865 per day. terraform apply on a misconfigured plan can provision dozens of resources in seconds. gcloud compute instances create with the wrong machine type can generate thousands of dollars in costs before you notice.
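
The arithmetic is easy to reproduce for your own instance types. A minimal sketch, assuming the $32.77 hourly figure quoted above (actual rates vary by region and pricing model); fleet_cost is a hypothetical helper for illustration, not part of SafeClaw:

```python
from decimal import Decimal


def fleet_cost(hourly_rate: Decimal, count: int, hours: int) -> Decimal:
    """Cost of running `count` identical instances for `hours` hours."""
    return hourly_rate * count * hours


# Assumed on-demand rate for one p4d.24xlarge, taken from the article text.
P4D_HOURLY = Decimal("32.77")

if __name__ == "__main__":
    per_hour = fleet_cost(P4D_HOURLY, count=10, hours=1)
    per_day = fleet_cost(P4D_HOURLY, count=10, hours=24)
    print(f"per hour: ${per_hour:,.2f}")
    print(f"per day:  ${per_day:,.2f}")
```

Decimal is used instead of float so the cents stay exact when the rate is multiplied out.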

AI agents create cloud resources when asked to "set up the staging environment," "deploy to production," "scale the service," or "run the benchmark on a bigger machine." The agent picks instance types based on its training data, which may not reflect your budget or your organization's instance policies. A "reasonable" instance for the agent's training context could be a $20/hour machine that runs indefinitely because the agent didn't set an auto-shutdown.

The danger compounds because cloud resources persist. Unlike a shell command that finishes, a provisioned instance runs (and bills) until explicitly terminated. If the agent creates resources in a region you don't monitor, or uses a service you don't have billing alerts for, the charges accumulate silently.

Cloud billing alerts help you react to cost spikes but don't prevent them. By the time the alert fires, you've already incurred the charges. Most cloud providers don't offer real-time spending caps that hard-stop resource creation.

The One-Minute Fix

Step 1: Install SafeClaw.

npx @authensor/safeclaw

Step 2: Get your free API key at safeclaw.onrender.com (7-day renewable, no credit card).

Step 3: Add this policy rule:

- action: shell_exec
  pattern: "aws\\s+(ec2|ecs|eks|lambda|rds|s3)|gcloud\\s+compute|az\\s+(vm|aks|sql)|terraform\\s+apply"
  effect: deny
  reason: "Cloud resource provisioning blocked"

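This rule is deliberately coarse. It blocks every command for the listed AWS and Azure services, every gcloud compute command, and terraform apply, including read-only calls such as aws ec2 describe-instances. The full policy below narrows the block to provisioning commands and allows read-only operations.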
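
Before installing the rule, it can help to try its pattern against a few commands. A rough sketch in Python, assuming SafeClaw applies the pattern as an unanchored regular-expression search (the article does not specify the matching mode); is_blocked is a hypothetical helper, not a SafeClaw API:

```python
import re

# The pattern from the rule above, with YAML's doubled backslashes unescaped.
QUICK_RULE = re.compile(
    r"aws\s+(ec2|ecs|eks|lambda|rds|s3)"
    r"|gcloud\s+compute"
    r"|az\s+(vm|aks|sql)"
    r"|terraform\s+apply"
)


def is_blocked(command: str) -> bool:
    """True if the quick rule's pattern matches anywhere in the command."""
    return QUICK_RULE.search(command) is not None


if __name__ == "__main__":
    for cmd in [
        "aws ec2 run-instances --instance-type p4d.24xlarge --count 10",
        "terraform apply -auto-approve",
        "aws ec2 describe-instances",
        "terraform plan",
        "gcloud sql instances create db1",
    ]:
        print(f"{'BLOCKED' if is_blocked(cmd) else 'no match':8}  {cmd}")
```

Under that assumption, the pattern matches the read-only aws ec2 describe-instances and does not match gcloud sql instances create, which is why the full policy lists more services and pairs them with specific verbs.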

Full Policy

name: block-cloud-provisioning
version: "1.0"
defaultEffect: deny
rules:
  # Block AWS resource creation
  - action: shell_exec
    pattern: "aws\\s+(ec2|ecs|eks|lambda|rds|s3|sagemaker|emr|redshift|elasticache)\\s+(run|create|put|start|launch)"
    effect: deny
    reason: "AWS resource creation blocked"

  # Block GCP resource creation
  - action: shell_exec
    pattern: "gcloud\\s+(compute|container|sql|functions|run|ai)\\s+(create|deploy|instances create)"
    effect: deny
    reason: "GCP resource creation blocked"

  # Block Azure resource creation
  - action: shell_exec
    pattern: "az\\s+(vm|aks|sql|functionapp|webapp|container)\\s+(create|deploy|start)"
    effect: deny
    reason: "Azure resource creation blocked"

  # Block infrastructure-as-code apply
  - action: shell_exec
    pattern: "terraform\\s+(apply|destroy)|pulumi\\s+(up|destroy)|cdk\\s+deploy"
    effect: deny
    reason: "Infrastructure-as-code execution blocked"

  # Block Docker resource creation on cloud
  - action: shell_exec
    pattern: "docker\\s+(service|stack)\\s+(create|deploy)|docker-compose.*up"
    effect: deny
    reason: "Container orchestration deployment blocked"

  # Block cloud API network calls
  - action: network
    pattern: "ec2\\.amazonaws|compute\\.googleapis|management\\.azure"
    effect: deny
    reason: "Cloud management API access blocked"

  # Allow read-only cloud operations
  - action: shell_exec
    pattern: "aws\\s+\\w+\\s+(describe|list|get)|gcloud\\s+\\w+(\\s+[\\w-]+)?\\s+(list|describe)|az\\s+\\w+\\s+(list|show)|terraform\\s+(plan|show|state list)"
    effect: allow
    reason: "Read-only cloud operations permitted"
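
How the rules and defaultEffect interact can be sketched in a few lines of Python. This is an illustration only: it assumes rules are checked in file order with the first match winning and that patterns are unanchored regex searches, neither of which the article confirms. The Rule class and evaluate function are hypothetical, and the rule list is a trimmed copy of the policy above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str
    pattern: str
    effect: str
    reason: str


DEFAULT_EFFECT = "deny"  # mirrors defaultEffect in the policy

# Trimmed copy of the policy rules (AWS, GCP, IaC denies plus part of the allow rule).
RULES = [
    Rule("shell_exec",
         r"aws\s+(ec2|ecs|eks|lambda|rds|s3|sagemaker|emr|redshift|elasticache)"
         r"\s+(run|create|put|start|launch)",
         "deny", "AWS resource creation blocked"),
    Rule("shell_exec",
         r"gcloud\s+(compute|container|sql|functions|run|ai)"
         r"\s+(create|deploy|instances create)",
         "deny", "GCP resource creation blocked"),
    Rule("shell_exec",
         r"terraform\s+(apply|destroy)|pulumi\s+(up|destroy)|cdk\s+deploy",
         "deny", "Infrastructure-as-code execution blocked"),
    Rule("shell_exec",
         r"aws\s+\w+\s+(describe|list|get)|terraform\s+(plan|show|state list)",
         "allow", "Read-only cloud operations permitted"),
]


def evaluate(action: str, command: str) -> tuple[str, str]:
    """Return (effect, reason) for the first matching rule, else the default."""
    for rule in RULES:
        if rule.action == action and re.search(rule.pattern, command):
            return rule.effect, rule.reason
    return DEFAULT_EFFECT, "No rule matched; default effect applied"


if __name__ == "__main__":
    for cmd in [
        "aws ec2 run-instances --instance-type p4d.24xlarge --count 5",
        "aws ec2 describe-instances --filters 'Name=tag:env,Values=staging'",
        "terraform plan",
        "doctl compute droplet create demo",
    ]:
        effect, reason = evaluate("shell_exec", cmd)
        print(f"{effect.upper():5}  {reason}  <-  {cmd}")
```

The last command matches no rule at all, so it falls through to the default and is denied. That fall-through is what lets the policy cover CLIs it never names.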

What Gets Blocked

These action requests are DENIED:

{
  "action": "shell_exec",
  "command": "aws ec2 run-instances --instance-type p4d.24xlarge --count 5",
  "agent": "infra-agent",
  "result": "DENIED — AWS resource creation blocked"
}

{
  "action": "shell_exec",
  "command": "terraform apply -auto-approve",
  "agent": "deploy-bot",
  "result": "DENIED — Infrastructure-as-code execution blocked"
}

{
  "action": "shell_exec",
  "command": "gcloud compute instances create ml-worker --machine-type=a2-highgpu-8g",
  "agent": "ml-assistant",
  "result": "DENIED — GCP resource creation blocked"
}

What Still Works

These safe actions are ALLOWED:

{
  "action": "shell_exec",
  "command": "aws ec2 describe-instances --filters 'Name=tag:env,Values=staging'",
  "agent": "infra-agent",
  "result": "ALLOWED — Read-only cloud operations permitted"
}

{
  "action": "shell_exec",
  "command": "terraform plan",
  "agent": "deploy-bot",
  "result": "ALLOWED — Read-only cloud operations permitted"
}

Your agent can still list resources, describe configurations, check status, and run terraform plan to preview changes. It just can't create, modify, or destroy resources.

Why Other Approaches Don't Work

Cloud billing alerts notify you after costs are incurred. They don't prevent resource creation. A GPU cluster running for 4 hours before you see the alert has already cost hundreds of dollars.

IAM policies (AWS) and service account permissions (GCP/Azure) can restrict what the agent can do, but most developers hand the agent their personal credentials, which usually carry broad permissions. Creating a dedicated restricted IAM role per agent per task is operationally heavy and easy to misconfigure.

Cloud budgets with auto-stop exist on some providers but have delays (AWS budgets update every 8-12 hours). Real-time enforcement doesn't exist at the cloud provider level.

Prompt instructions ("don't create expensive resources") tend to be overridden when the agent decides that resource creation is necessary to complete the task. "Deploy this to staging" requires creating resources, so the agent will do what it thinks the task demands.

SafeClaw blocks the command before it reaches the cloud CLI or API. Sub-millisecond evaluation. Deny-by-default means even cloud CLIs you didn't anticipate are blocked. Every denied action is logged in a tamper-evident audit trail (SHA-256 hash chain). 446 tests, TypeScript strict mode, zero third-party dependencies. Use simulation mode to test your policy against real agent workflows before enforcing it.
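
For readers unfamiliar with hash chains, the idea behind such an audit trail can be sketched briefly. This is a generic illustration, not SafeClaw's actual log format, which the article does not show; the field names and the append_entry and verify_chain helpers are hypothetical. Each entry stores the hash of the previous entry, so editing any earlier record changes every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append a record, linking it to the hash of the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"action": "shell_exec",
                       "command": "terraform apply -auto-approve",
                       "result": "DENIED"})
    append_entry(log, {"action": "shell_exec",
                       "command": "terraform plan",
                       "result": "ALLOWED"})
    print("intact:", verify_chain(log))
    log[0]["record"]["result"] = "ALLOWED"  # simulate tampering with history
    print("after tampering:", verify_chain(log))
```

A chain like this makes tampering detectable but does not prevent it, so the latest hash still needs to be stored somewhere the agent cannot write.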

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw