How to Safely Run AutoGen Agents
To safely run AutoGen agents, add SafeClaw action-level gating. Install with npx @authensor/safeclaw and define a deny-by-default policy that controls which code blocks get executed, which files agents can access, and which network requests are permitted. AutoGen by Microsoft enables multi-agent conversations where agents generate and execute code, call functions, and interact with external systems — all driven by conversational back-and-forth between agents that runs without human input.
What AutoGen Agents Can Do (And Why That's Risky)
AutoGen's architecture is built around conversational agents that generate and execute code. The risks are specific:
- Code execution via `CodeExecutor` — AutoGen extracts code blocks from agent messages and runs them. `LocalCommandLineCodeExecutor` runs code directly on your machine with full OS access. `DockerCommandLineCodeExecutor` runs in a container but still has network access and mounted volumes.
- Multi-agent conversations without human checkpoints — `ConversableAgent` instances talk to each other in loops. An `AssistantAgent` generates code, and a `UserProxyAgent` (with `human_input_mode="NEVER"`) executes it automatically. Conversations can run for dozens of turns.
- Function registration — you register Python functions that agents can call. These functions run in your process with your credentials and filesystem access.
- Nested conversations — AutoGen supports nested chat patterns where agents spawn sub-conversations with other agents, creating execution chains that are hard to trace.
- File I/O through generated code — agents write Python code that reads and writes files. The code itself is generated at runtime, so you cannot predict at design time which file paths it will access.
- Network requests through generated code — agents frequently generate code that imports `requests` or `urllib` and makes HTTP calls to arbitrary URLs. These requests happen inside code execution, not through a dedicated network tool.
The `UserProxyAgent` auto-executes whatever code the `AssistantAgent` generates. There is no per-action policy layer between code generation and code execution; the sketch below shows this unguarded default.
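To see the gap SafeClaw closes, here is a minimal sketch of that unguarded default, in the same TypeScript style as the setup code below. `AssistantAgent`, `llm_config`, and `initiate_chat` mirror AutoGen's Python API and stand in for whatever your binding provides; none of this is SafeClaw code.

```typescript
// A minimal sketch of the UNGUARDED default described above.
// AssistantAgent, llm_config, and initiate_chat mirror AutoGen's Python
// API; treat the exact names as assumptions about your binding.
const assistant = new AssistantAgent({
  name: "coder",
  llm_config: { model: "gpt-4" },
});

const userProxy = new UserProxyAgent({
  name: "executor",
  human_input_mode: "NEVER", // never pauses for a human
  code_execution_config: {
    // extracted code blocks run directly on this machine
    executor: new LocalCommandLineCodeExecutor({ work_dir: "./workspace" }),
  },
});

// Every code block the assistant emits is executed as-is: file reads,
// shell commands, and network calls all pass through unchecked.
await userProxy.initiate_chat(assistant, {
  message: "Clean up old files in the data directory",
});
```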
Step-by-Step Setup
Step 1: Install SafeClaw
```bash
npx @authensor/safeclaw
```
Select SDK Wrapper as the integration type.
Step 2: Get Your API Key
Visit safeclaw.onrender.com. Free-tier keys renew every 7 days; no credit card required. Use the dashboard wizard to generate an initial policy.
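The Step 3 code reads the key from `process.env.SAFECLAW_API_KEY`, so export it before running your agents:

```bash
# Store the key from the dashboard in the variable the SDK expects
export SAFECLAW_API_KEY="<your-key>"
```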
Step 3: Wrap the Code Executor
SafeClaw intercepts code execution before it reaches the OS:
```typescript
import { SafeClaw } from "@authensor/safeclaw";
// LocalCommandLineCodeExecutor and UserProxyAgent come from your AutoGen
// binding; the snake_case options mirror AutoGen's Python API.

const safeclaw = new SafeClaw({
  apiKey: process.env.SAFECLAW_API_KEY,
  policy: "./safeclaw.policy.yaml",
});

// Wrap the code executor so SafeClaw evaluates each code block
const originalExecutor = new LocalCommandLineCodeExecutor({
  work_dir: "./workspace",
});

const guardedExecutor = safeclaw.wrapExecutor(originalExecutor, {
  // SafeClaw analyzes each code block for:
  // - file_read / file_write operations
  // - shell_exec commands (subprocess, os.system)
  // - network requests (requests, urllib, socket)
  analyzeCode: true,
});

const userProxy = new UserProxyAgent({
  name: "executor",
  human_input_mode: "NEVER",
  code_execution_config: {
    executor: guardedExecutor,
  },
});
```
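If you run the Docker executor mentioned earlier instead, the same wrapper applies. A sketch, assuming `DockerCommandLineCodeExecutor` accepts the same options in your binding:

```typescript
// Containers limit filesystem blast radius, but network access and mounted
// volumes still need policy gating, so wrap the Docker executor the same way.
const dockerExecutor = safeclaw.wrapExecutor(
  new DockerCommandLineCodeExecutor({ work_dir: "./workspace" }),
  { analyzeCode: true },
);
```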
Step 4: Wrap Registered Functions
```typescript
// Instead of registering the raw function, wrap it with safeclaw.guard().
// The first argument is the action type the call is evaluated as; here
// "network", since the database query leaves the process over the wire.
const guardedFunction = safeclaw.guard("network", async (params) => {
  return await runDatabaseQuery(params.query);
});

assistant.register_function({
  name: "query_database",
  func: guardedFunction,
});
```
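The string passed to `guard` names the action type the call is evaluated as, so keep it aligned with an action defined in your policy; here the database query is gated as `network` because it leaves the process over the wire. Register one guarded wrapper per function you expose to agents.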
Step 5: Define Your Policy
```yaml
version: 1
default: deny
rules:
  - action: file_read
    path: "${PROJECT_DIR}/workspace/**"
    effect: allow
  - action: file_read
    path: "${PROJECT_DIR}/data/**"
    effect: allow
  - action: file_read
    path: "**/.env"
    effect: deny
  - action: file_read
    path: "**/.*/**"
    effect: deny
  - action: file_write
    path: "${PROJECT_DIR}/workspace/**"
    effect: allow
  - action: file_write
    path: "${PROJECT_DIR}/output/**"
    effect: allow
  - action: shell_exec
    command: "python*"
    effect: allow
  - action: shell_exec
    command: "pip*"
    effect: deny
  - action: shell_exec
    command: "rm*"
    effect: deny
  - action: shell_exec
    command: "curl*"
    effect: deny
  - action: shell_exec
    command: "wget*"
    effect: deny
  - action: network
    host: "api.openai.com"
    effect: allow
  - action: network
    host: "localhost"
    effect: allow
  - action: network
    host: "*"
    effect: deny
```
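Anything that matches no rule falls through to `default: deny`. For example, `python workspace/analyze.py` is allowed by the `python*` rule, while `pip install cryptography` is denied by `pip*`; both verdicts appear in the log examples below.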
Step 6: Simulate Before Enforcing
```bash
npx @authensor/safeclaw simulate --policy safeclaw.policy.yaml
```
Run a typical AutoGen conversation. Review every verdict in the log. Tighten rules where needed.
What Gets Blocked, What Gets Through
ALLOWED — Agent writes analysis code to workspace:

```json
{ "action": "file_write", "path": "/project/workspace/analyze.py", "verdict": "ALLOW" }
```

DENIED — Generated code reads SSH keys:

```json
{ "action": "file_read", "path": "/home/user/.ssh/id_rsa", "verdict": "DENY", "reason": "path matches **/.*/** deny rule" }
```

ALLOWED — Agent runs generated Python script:

```json
{ "action": "shell_exec", "command": "python workspace/analyze.py", "verdict": "ALLOW" }
```

DENIED — Generated code installs a package:

```json
{ "action": "shell_exec", "command": "pip install cryptography", "verdict": "DENY", "reason": "pip* matches deny rule" }
```

DENIED — Generated code POSTs to external server:

```json
{ "action": "network", "host": "webhook.site", "verdict": "DENY", "reason": "host not in allowlist, default deny" }
```
Without SafeClaw vs With SafeClaw
| Scenario | Without SafeClaw | With SafeClaw |
|---|---|---|
| AssistantAgent generates code that reads `/etc/passwd` | Code executes, file contents returned to agent | Blocked — path outside allowed workspace and data directories |
| Generated code runs `os.system("rm -rf /")` | Command executes with process permissions | Blocked — `rm*` matches deny rule |
| Agent code imports `requests` and POSTs to unknown URL | HTTP request sent with whatever data the agent collected | Blocked — host not in network allowlist |
| Agent writes output file to `workspace/results.csv` | File written normally | Allowed — `workspace/**` is in write allowlist |
| Multi-turn conversation runs 30 code blocks | All 30 execute without review | Each of the 30 blocks is individually evaluated against policy |
Every evaluation is logged to SafeClaw's tamper-evident audit trail (a SHA-256 hash chain). The control plane sees only action metadata — never your code, data, or API keys. SafeClaw runs with zero third-party dependencies, evaluates in sub-millisecond time, and is validated by 446 tests under TypeScript strict mode. The client is 100% open source, MIT licensed.
Cross-References
- What is SafeClaw? — Deny-by-default action gating architecture
- How to Safely Run CrewAI Agents — Another multi-agent framework with per-agent policies
- How to Safely Run LangChain Agents — AutoGen can use LangChain tools
- How to Safely Use OpenAI Agents — OpenAI powers many AutoGen configurations
- How to Safely Run Autonomous Coding Agents — General patterns for autonomous code execution safety
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
```bash
$ npx @authensor/safeclaw
```