How to Safely Use OpenAI Agents
To safely use OpenAI Agents, add SafeClaw action-level gating. Install with npx @authensor/safeclaw and define a deny-by-default policy that controls which tools the agent can invoke, which files it can access through Code Interpreter, and which function calls it can make through the Assistants API or Agents SDK. OpenAI's agent ecosystem — spanning the Assistants API, the Agents SDK, and Code Interpreter — can execute arbitrary Python code, read uploaded files, call external functions you define, and chain these operations across conversation turns.
What OpenAI Agents Can Do (And Why That's Risky)
OpenAI provides multiple agent surfaces. Each has distinct capabilities:
- Code Interpreter — executes arbitrary Python in a sandboxed container. It can read files you upload, generate files, perform computations, install Python packages with pip, and write output files. The sandbox has network access limitations, but code execution itself is unrestricted.
- Function Calling (Assistants API) — the model generates structured JSON arguments for functions you define. Your application is responsible for executing these functions. If your function handlers touch the filesystem, database, or external APIs, the model controls what parameters they receive (see the sketch after this list).
- Agents SDK — OpenAI's framework for building multi-step agents with tools, handoffs, and guardrails. Agents call tools you register, and the SDK orchestrates execution. Each tool call is a potential action on your infrastructure.
- File Search — agents can search through uploaded files using vector retrieval. While read-only, this surfaces sensitive content from documents you attach to the assistant.
- Multi-turn autonomous execution — agents loop through tool calls and reasoning steps without returning to the user, meaning a single prompt can trigger dozens of function calls.
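The Function Calling surface is worth making concrete, because the division of labor is easy to misread: the model only produces JSON arguments, and your application does the work. A minimal sketch with a hypothetical delete_record function and handler (not part of any OpenAI or SafeClaw API):
import OpenAI from "openai";

// Shape of a function tool call the Assistants API hands back in required_action:
const toolCall = {
  id: "call_abc123",
  type: "function",
  function: {
    name: "delete_record",                    // a function you registered
    arguments: '{"table":"users","id":"*"}',  // arguments the model chose
  },
};

// Hypothetical handler: the model never runs anything itself.
// Whatever risk exists lives entirely in code like this, which is why gating happens here.
async function deleteRecord(args: { table: string; id: string }) {
  console.log(`would delete ${args.id} from ${args.table}`);
}

await deleteRecord(JSON.parse(toolCall.function.arguments));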
Step-by-Step Setup
Step 1: Install SafeClaw
npx @authensor/safeclaw
Select SDK Wrapper as the integration type. For OpenAI agents, SafeClaw wraps your tool/function handlers with policy evaluation.
Step 2: Get Your API Key
Visit safeclaw.onrender.com to create a free-tier key. Free keys renew every 7 days; no credit card is required. The dashboard wizard helps you generate an initial policy.
Step 3: Wrap Your Tool Handlers (Agents SDK)
import { SafeClaw } from "@authensor/safeclaw";
import { Agent, tool } from "@openai/agents";
import { promises as fs } from "node:fs";
import { exec } from "node:child_process";
import { promisify } from "node:util";
import { z } from "zod";

const execAsync = promisify(exec);

const safeclaw = new SafeClaw({
  apiKey: process.env.SAFECLAW_API_KEY,
  policy: "./safeclaw.policy.yaml",
});

const writeFile = tool({
  name: "write_file",
  description: "Write content to a file",
  parameters: z.object({ path: z.string(), content: z.string() }),
  execute: safeclaw.guard("file_write", async (params) => {
    // Only runs if the policy allows this specific path
    await fs.writeFile(params.path, params.content);
    return { success: true };
  }),
});

const runCommand = tool({
  name: "run_command",
  description: "Execute a shell command",
  parameters: z.object({ command: z.string() }),
  execute: safeclaw.guard("shell_exec", async (params) => {
    // Only runs if the command matches an allow rule (e.g. python*)
    return await execAsync(params.command);
  }),
});

const agent = new Agent({
  name: "coding-assistant",
  instructions: "Help with coding tasks in this project.",
  tools: [writeFile, runCommand],
});
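To see the gating in action, run the agent with the SDK's run helper. A minimal sketch, assuming the agent defined above and a project laid out like the policy in Step 5 (data/ for inputs, output/ for results):
import { run } from "@openai/agents";

// Every tool invocation the agent makes is evaluated by safeclaw.guard before it executes.
const result = await run(
  agent,
  "Analyze data/input.csv and write a summary to output/report.csv"
);
console.log(result.finalOutput);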
Step 4: Wrap Function Handlers (Assistants API)
import { SafeClaw } from "@authensor/safeclaw";

const safeclaw = new SafeClaw({
  apiKey: process.env.SAFECLAW_API_KEY,
  policy: "./safeclaw.policy.yaml",
});

// Before executing any function call from the Assistants API:
async function handleFunctionCall(call) {
  const verdict = await safeclaw.evaluate({
    // mapFunctionToAction and executeFunctionCall are your own helpers: one maps
    // function names to policy actions, the other runs the real handler.
    action: mapFunctionToAction(call.function.name),
    params: JSON.parse(call.function.arguments),
  });
  if (verdict.effect === "deny") {
    return { error: `Action denied: ${verdict.reason}` };
  }
  return await executeFunctionCall(call);
}
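Both helpers referenced above are yours to write. Here is a minimal sketch of one way to wire them into the Assistants API tool-output loop using the openai Node SDK's beta Threads API; the mapping table, the submitGatedOutputs name, and the run/threadId plumbing are illustrative assumptions, not SafeClaw requirements:
import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical mapping from registered function names to SafeClaw policy actions.
// Unmapped names resolve to "unknown", which falls through to the policy's default: deny.
function mapFunctionToAction(name: string): string {
  const map: Record<string, string> = {
    write_file: "file_write",
    read_file: "file_read",
    run_command: "shell_exec",
  };
  return map[name] ?? "unknown";
}

// When a run pauses with status "requires_action", gate each tool call before executing it,
// then submit the (possibly denied) outputs back to the run.
async function submitGatedOutputs(threadId: string, run: any) {
  const toolCalls = run.required_action.submit_tool_outputs.tool_calls;
  const toolOutputs: { tool_call_id: string; output: string }[] = [];
  for (const call of toolCalls) {
    const output = await handleFunctionCall(call); // the handler from this step
    toolOutputs.push({ tool_call_id: call.id, output: JSON.stringify(output) });
  }
  return openai.beta.threads.runs.submitToolOutputs(threadId, run.id, {
    tool_outputs: toolOutputs,
  });
}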
Step 5: Define Your Policy
version: 1
default: deny
rules:
  - action: file_read
    path: "${PROJECT_DIR}/data/**"
    effect: allow
  - action: file_read
    path: "*/.env"
    effect: deny
  - action: file_write
    path: "${PROJECT_DIR}/output/**"
    effect: allow
  - action: file_write
    path: "${PROJECT_DIR}/src/**"
    effect: deny
  - action: shell_exec
    command: "python*"
    effect: allow
  - action: shell_exec
    command: "pip install*"
    effect: deny
  - action: shell_exec
    command: "rm*"
    effect: deny
  - action: network
    host: "api.openai.com"
    effect: allow
  - action: network
    host: "*.internal.company.com"
    effect: deny
  - action: network
    host: "*"
    effect: deny
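Before wiring the policy into an agent, you can spot-check individual rules from a script by calling evaluate directly, the same call the Step 4 handler uses. A minimal sketch using the safeclaw instance from Step 3; the param shapes (path, command) mirror the rule fields above and are assumptions about how your actions are named:
// Spot-check the policy: these two calls mirror the verdict examples below.
const writeVerdict = await safeclaw.evaluate({
  action: "file_write",
  params: { path: `${process.env.PROJECT_DIR}/output/report.csv` },
});
console.log(writeVerdict.effect); // expected: "allow" (output/** is allowlisted)

const pipVerdict = await safeclaw.evaluate({
  action: "shell_exec",
  params: { command: "pip install obscure-package" },
});
console.log(pipVerdict.effect, pipVerdict.reason); // expected: "deny" (matches pip install*)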
Step 6: Simulate Before Enforcing
npx @authensor/safeclaw simulate --policy safeclaw.policy.yaml
Review logged verdicts, refine rules, then enforce.
What Gets Blocked, What Gets Through
ALLOWED — Agent writes analysis output:
{ "action": "file_write", "path": "/project/output/report.csv", "verdict": "ALLOW" }
DENIED — Agent tries to modify source code:
{ "action": "file_write", "path": "/project/src/auth/login.ts", "verdict": "DENY", "reason": "path matches src/** deny rule" }
ALLOWED — Agent runs a Python script:
{ "action": "shell_exec", "command": "python analyze.py --input data.csv", "verdict": "ALLOW" }
DENIED — Agent installs an unknown package:
{ "action": "shell_exec", "command": "pip install obscure-package", "verdict": "DENY", "reason": "pip install* matches deny rule" }
DENIED — Agent calls internal API:
{ "action": "network", "host": "db.internal.company.com", "verdict": "DENY", "reason": "host matches *.internal.company.com deny rule" }
Without SafeClaw vs With SafeClaw
| Scenario | Without SafeClaw | With SafeClaw |
|---|---|---|
| Function call writes to /etc/ via misinterpreted path | File written to system directory | Blocked — path outside allowed output/** |
| Agent's tool handler calls internal microservice | Request sent with ambient credentials | Blocked — internal hosts denied by network rule |
| Agent generates report to output/ directory | Report written normally | Allowed — output/** is in write allowlist |
| Agent tries pip install in tool handler | Package installed, install scripts execute | Blocked — pip install* matches deny rule |
| Agent reads uploaded data file | Data read for analysis | Allowed — data/** is in read allowlist |
Every action evaluation is recorded in SafeClaw's tamper-evident audit trail (SHA-256 hash chain). The control plane receives only action metadata — never your OpenAI API key, function arguments, or file contents. SafeClaw runs with zero third-party dependencies, evaluates in sub-millisecond time, and is backed by 446 tests under TypeScript strict mode. The client is 100% open source, MIT licensed.
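The hash chain behind that audit trail is a simple construction: each record commits to the hash of the record before it, so altering or deleting an earlier entry invalidates every hash that follows. A conceptual sketch, not SafeClaw's actual record format:
import { createHash } from "node:crypto";

// Conceptual hash chain: each entry includes the previous entry's hash,
// so any edit to an earlier record breaks verification of every later one.
interface AuditEntry {
  action: string;
  verdict: string;
  timestamp: string;
  prevHash: string;
  hash: string;
}

function appendEntry(chain: AuditEntry[], action: string, verdict: string): AuditEntry {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "0".repeat(64);
  const timestamp = new Date().toISOString();
  const hash = createHash("sha256")
    .update(`${prevHash}|${action}|${verdict}|${timestamp}`)
    .digest("hex");
  const entry = { action, verdict, timestamp, prevHash, hash };
  chain.push(entry);
  return entry;
}

function verifyChain(chain: AuditEntry[]): boolean {
  return chain.every((entry, i) => {
    const prevHash = i === 0 ? "0".repeat(64) : chain[i - 1].hash;
    const expected = createHash("sha256")
      .update(`${prevHash}|${entry.action}|${entry.verdict}|${entry.timestamp}`)
      .digest("hex");
    return entry.prevHash === prevHash && entry.hash === expected;
  });
}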
Cross-References
- What is SafeClaw? — Core architecture of action-level gating
- How to Safely Run LangChain Agents — LangChain also uses SDK wrapper integration
- How to Safely Run CrewAI Agents — Multi-agent safety patterns for OpenAI-powered crews
- SafeClaw Policy Reference — Full policy syntax and examples
- How to Safely Run Autonomous Coding Agents — General autonomous agent safety
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw