How to Safely Use OpenAI Agents
To safely use OpenAI Agents, add SafeClaw action-level gating. Install with npx @authensor/safeclaw and define a deny-by-default policy that controls which tools the agent can invoke, which files it can access through Code Interpreter, and which function calls it can make through the Assistants API or Agents SDK. OpenAI's agent ecosystem — spanning the Assistants API, the Agents SDK, and Code Interpreter — can execute arbitrary Python code, read uploaded files, call external functions you define, and chain these operations across conversation turns.
What OpenAI Agents Can Do (And Why That's Risky)
OpenAI provides multiple agent surfaces. Each has distinct capabilities:
- Code Interpreter — executes arbitrary Python in a sandboxed container. It can read files you upload, generate files, perform computations, install Python packages with pip, and write output files. The sandbox has network access limitations, but code execution itself is unrestricted.
- Function Calling (Assistants API) — the model generates structured JSON arguments for functions you define. Your application is responsible for executing these functions. If your function handlers touch the filesystem, database, or external APIs, the model controls what parameters they receive (see the sketch after this list).
- Agents SDK — OpenAI's framework for building multi-step agents with tools, handoffs, and guardrails. Agents call tools you register, and the SDK orchestrates execution. Each tool call is a potential action on your infrastructure.
- File Search — agents can search through uploaded files using vector retrieval. While read-only, this surfaces sensitive content from documents you attach to the assistant.
- Multi-turn autonomous execution — agents loop through tool calls and reasoning steps without returning to the user, meaning a single prompt can trigger dozens of function calls.
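The Function Calling surface is worth making concrete, because the division of labor is easy to misread: the model only produces JSON arguments, and your application does the work. A minimal sketch with a hypothetical delete_record function and handler (not part of any OpenAI or SafeClaw API):
import OpenAI from "openai";

// Shape of a function tool call the Assistants API hands back in required_action:
const toolCall = {
  id: "call_abc123",
  type: "function",
  function: {
    name: "delete_record",                    // a function you registered
    arguments: '{"table":"users","id":"*"}',  // arguments the model chose
  },
};

// Hypothetical handler: the model never runs anything itself.
// Whatever risk exists lives entirely in code like this, which is why gating happens here.
async function deleteRecord(args: { table: string; id: string }) {
  console.log(`would delete ${args.id} from ${args.table}`);
}

await deleteRecord(JSON.parse(toolCall.function.arguments));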
Step-by-Step Setup
Step 1: Install SafeClaw
npx @authensor/safeclaw
Select SDK Wrapper as the integration type. For OpenAI agents, SafeClaw wraps your tool/function handlers with policy evaluation.
Step 2: Get Your API Key
Visit safeclaw.onrender.com to create a free-tier key. Free keys renew every 7 days; no credit card is required. The dashboard wizard helps you generate an initial policy.
Step 3: Wrap Your Tool Handlers (Agents SDK)
import { SafeClaw } from "@authensor/safeclaw";
import { Agent, tool } from "@openai/agents";
import { promises as fs } from "node:fs";
import { exec } from "node:child_process";
import { promisify } from "node:util";
import { z } from "zod";

const execAsync = promisify(exec);

const safeclaw = new SafeClaw({
  apiKey: process.env.SAFECLAW_API_KEY,
  policy: "./safeclaw.policy.yaml",
});

const writeFile = tool({
  name: "write_file",
  description: "Write content to a file",
  parameters: z.object({ path: z.string(), content: z.string() }),
  execute: safeclaw.guard("file_write", async (params) => {
    // Only runs if the policy allows this specific path
    await fs.writeFile(params.path, params.content);
    return { success: true };
  }),
});

const runCommand = tool({
  name: "run_command",
  description: "Execute a shell command",
  parameters: z.object({ command: z.string() }),
  execute: safeclaw.guard("shell_exec", async (params) => {
    // Only runs if the command matches an allow rule (e.g. python*)
    return await execAsync(params.command);
  }),
});

const agent = new Agent({
  name: "coding-assistant",
  instructions: "Help with coding tasks in this project.",
  tools: [writeFile, runCommand],
});
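To see the gating in action, run the agent with the SDK's run helper. A minimal sketch, assuming the agent defined above and a project laid out like the policy in Step 5 (data/ for inputs, output/ for results):
import { run } from "@openai/agents";

// Every tool invocation the agent makes is evaluated by safeclaw.guard before it executes.
const result = await run(
  agent,
  "Analyze data/input.csv and write a summary to output/report.csv"
);
console.log(result.finalOutput);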
Step 4: Wrap Function Handlers (Assistants API)
import { SafeClaw } from "@authensor/safeclaw";

const safeclaw = new SafeClaw({
  apiKey: process.env.SAFECLAW_API_KEY,
  policy: "./safeclaw.policy.yaml",
});

// Before executing any function call from the Assistants API:
async function handleFunctionCall(call) {
  const verdict = await safeclaw.evaluate({
    // mapFunctionToAction and executeFunctionCall are your own helpers: one maps
    // function names to policy actions, the other runs the real handler.
    action: mapFunctionToAction(call.function.name),
    params: JSON.parse(call.function.arguments),
  });
  if (verdict.effect === "deny") {
    return { error: `Action denied: ${verdict.reason}` };
  }
  return await executeFunctionCall(call);
}
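Both helpers referenced above are yours to write. Here is a minimal sketch of one way to wire them into the Assistants API tool-output loop using the openai Node SDK's beta Threads API; the mapping table, the submitGatedOutputs name, and the run/threadId plumbing are illustrative assumptions, not SafeClaw requirements:
import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical mapping from registered function names to SafeClaw policy actions.
// Unmapped names resolve to "unknown", which falls through to the policy's default: deny.
function mapFunctionToAction(name: string): string {
  const map: Record<string, string> = {
    write_file: "file_write",
    read_file: "file_read",
    run_command: "shell_exec",
  };
  return map[name] ?? "unknown";
}

// When a run pauses with status "requires_action", gate each tool call before executing it,
// then submit the (possibly denied) outputs back to the run.
async function submitGatedOutputs(threadId: string, run: any) {
  const toolCalls = run.required_action.submit_tool_outputs.tool_calls;
  const toolOutputs: { tool_call_id: string; output: string }[] = [];
  for (const call of toolCalls) {
    const output = await handleFunctionCall(call); // the handler from this step
    toolOutputs.push({ tool_call_id: call.id, output: JSON.stringify(output) });
  }
  return openai.beta.threads.runs.submitToolOutputs(threadId, run.id, {
    tool_outputs: toolOutputs,
  });
}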
Step 5: Define Your Policy
version: 1
default: deny
rules:
  - action: file_read
    path: "${PROJECT_DIR}/data/**"
    effect: allow
  - action: file_read
    path: "*/.env"
    effect: deny
  - action: file_write
    path: "${PROJECT_DIR}/output/**"
    effect: allow
  - action: file_write
    path: "${PROJECT_DIR}/src/**"
    effect: deny
  - action: shell_exec
    command: "python*"
    effect: allow
  - action: shell_exec
    command: "pip install*"
    effect: deny
  - action: shell_exec
    command: "rm*"
    effect: deny
  - action: network
    host: "api.openai.com"
    effect: allow
  - action: network
    host: "*.internal.company.com"
    effect: deny
  - action: network
    host: "*"
    effect: deny
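Before wiring the policy into an agent, you can spot-check individual rules from a script by calling evaluate directly, the same call the Step 4 handler uses. A minimal sketch using the safeclaw instance from Step 3; the param shapes (path, command) mirror the rule fields above and are assumptions about how your actions are named:
// Spot-check the policy: these two calls mirror the verdict examples below.
const writeVerdict = await safeclaw.evaluate({
  action: "file_write",
  params: { path: `${process.env.PROJECT_DIR}/output/report.csv` },
});
console.log(writeVerdict.effect); // expected: "allow" (output/** is allowlisted)

const pipVerdict = await safeclaw.evaluate({
  action: "shell_exec",
  params: { command: "pip install obscure-package" },
});
console.log(pipVerdict.effect, pipVerdict.reason); // expected: "deny" (matches pip install*)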
Step 6: Simulate Before Enforcing
npx @authensor/safeclaw simulate --policy safeclaw.policy.yaml
Review logged verdicts, refine rules, then enforce.
What Gets Blocked, What Gets Through
ALLOWED — Agent writes analysis output:
{ "action": "file_write", "path": "/project/output/report.csv", "verdict": "ALLOW" }
DENIED — Agent tries to modify source code:
{ "action": "file_write", "path": "/project/src/auth/login.ts", "verdict": "DENY", "reason": "path matches src/** deny rule" }
ALLOWED — Agent runs a Python script:
{ "action": "shell_exec", "command": "python analyze.py --input data.csv", "verdict": "ALLOW" }
DENIED — Agent installs an unknown package:
{ "action": "shell_exec", "command": "pip install obscure-package", "verdict": "DENY", "reason": "pip install* matches deny rule" }
DENIED — Agent calls internal API:
{ "action": "network", "host": "db.internal.company.com", "verdict": "DENY", "reason": "host matches *.internal.company.com deny rule" }
Without SafeClaw vs With SafeClaw
| Scenario | Without SafeClaw | With SafeClaw |
|---|---|---|
| Function call writes to /etc/ via misinterpreted path | File written to system directory | Blocked — path outside allowed output/** |
| Agent's tool handler calls internal microservice | Request sent with ambient credentials | Blocked — internal hosts denied by network rule |
| Agent generates report to output/ directory | Report written normally | Allowed — output/** is in write allowlist |
| Agent tries pip install in tool handler | Package installed, install scripts execute | Blocked — pip install* matches deny rule |
| Agent reads uploaded data file | Data read for analysis | Allowed — data/** is in read allowlist |
Every action evaluation is recorded in SafeClaw's tamper-evident audit trail (SHA-256 hash chain). The control plane receives only action metadata — never your OpenAI API key, function arguments, or file contents. SafeClaw runs with zero third-party dependencies, evaluates in sub-millisecond time, and is backed by 446 tests under TypeScript strict mode. The client is 100% open source, MIT licensed.
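The hash chain behind that audit trail is a simple construction: each record commits to the hash of the record before it, so altering or deleting an earlier entry invalidates every hash that follows. A conceptual sketch, not SafeClaw's actual record format:
import { createHash } from "node:crypto";

// Conceptual hash chain: each entry includes the previous entry's hash,
// so any edit to an earlier record breaks verification of every later one.
interface AuditEntry {
  action: string;
  verdict: string;
  timestamp: string;
  prevHash: string;
  hash: string;
}

function appendEntry(chain: AuditEntry[], action: string, verdict: string): AuditEntry {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "0".repeat(64);
  const timestamp = new Date().toISOString();
  const hash = createHash("sha256")
    .update(`${prevHash}|${action}|${verdict}|${timestamp}`)
    .digest("hex");
  const entry = { action, verdict, timestamp, prevHash, hash };
  chain.push(entry);
  return entry;
}

function verifyChain(chain: AuditEntry[]): boolean {
  return chain.every((entry, i) => {
    const prevHash = i === 0 ? "0".repeat(64) : chain[i - 1].hash;
    const expected = createHash("sha256")
      .update(`${prevHash}|${entry.action}|${entry.verdict}|${entry.timestamp}`)
      .digest("hex");
    return entry.prevHash === prevHash && entry.hash === expected;
  });
}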
Cross-References
- What is SafeClaw? — Core architecture of action-level gating
- How to Safely Run LangChain Agents — LangChain also uses SDK wrapper integration
- How to Safely Run CrewAI Agents — Multi-agent safety patterns for OpenAI-powered crews
- SafeClaw Policy Reference — Full policy syntax and examples
- How to Safely Run Autonomous Coding Agents — General autonomous agent safety
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw