How to Secure Your OpenAI GPT Agent
SafeClaw by Authensor gates every OpenAI tool_calls response before your application executes it, applying deny-by-default YAML policies that block unauthorized actions. Whether you're using GPT-4o, GPT-4.1, or o3, SafeClaw evaluates each function call against your policy file in sub-millisecond time and logs every decision to a hash-chained audit trail.
How OpenAI Tool Calling Works
OpenAI's Chat Completions API uses a tools parameter where you define available functions. When the model decides to call a function, it returns a tool_calls array in the assistant message, each containing a function name and JSON arguments. Your application then executes those functions and sends results back. The vulnerability is clear: between the model's decision and your execution, nothing validates whether that call should be allowed.
GPT Response → tool_calls[] → [SafeClaw Policy Check] → Execute or Deny
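For reference, the assistant message your application receives looks like this; the id and the SQL argument are illustrative, but the shape is what the Chat Completions API returns:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "query_database",
        "arguments": "{\"sql\": \"SELECT * FROM users WHERE created_at > NOW() - INTERVAL '7 day'\"}"
      }
    }
  ]
}

Nothing in this structure says whether running that SQL is acceptable; that decision is the gap SafeClaw fills.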
Quick Start
npx @authensor/safeclaw
This generates a safeclaw.yaml in your project root. SafeClaw's policies map directly onto OpenAI's function call structure: each policy's tool name matches the function name, and constraints apply to the parsed arguments object.
Step 1: Define Policies for OpenAI Functions
# safeclaw.yaml
version: 1
default: deny
policies:
  - name: "openai-database-access"
    description: "Control database query functions"
    actions:
      - tool: "query_database"
        effect: allow
        constraints:
          operation: "SELECT"
      - tool: "query_database"
        effect: deny
        constraints:
          operation: "DROP|DELETE|TRUNCATE"
  - name: "openai-file-policy"
    description: "Restrict file operations"
    actions:
      - tool: "write_file"
        effect: allow
        constraints:
          path_pattern: "output/**"
      - tool: "read_file"
        effect: allow
        constraints:
          path_pattern: "data/**"
  - name: "openai-api-calls"
    description: "Control external API access"
    actions:
      - tool: "call_api"
        effect: allow
        constraints:
          url_pattern: "https://api.internal.company.com/**"
      - tool: "call_api"
        effect: deny
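Because the default is deny, any function not explicitly allowed is blocked, including functions the model hallucinates. A quick sanity check against this file, using the evaluate API shown in Step 2 and assuming constraint keys such as operation are matched against fields of the parsed arguments object:

import { SafeClaw } from "@authensor/safeclaw";

const safeclaw = new SafeClaw("./safeclaw.yaml");

// Matches the allow rule in "openai-database-access"
// (assumes the function exposes an `operation` argument for the constraint to check).
console.log(safeclaw.evaluate("query_database", { operation: "SELECT", sql: "SELECT * FROM users" }));

// "send_email" appears in no policy, so deny-by-default blocks it.
console.log(safeclaw.evaluate("send_email", { to: "someone@example.com" }));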
Step 2: Integrate with the OpenAI SDK
import OpenAI from "openai";
import { SafeClaw } from "@authensor/safeclaw";

const openai = new OpenAI();
const safeclaw = new SafeClaw("./safeclaw.yaml");

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  tools: [
    {
      type: "function",
      function: {
        name: "query_database",
        description: "Run a SQL query",
        parameters: { type: "object", properties: { sql: { type: "string" } } },
      },
    },
  ],
  messages: [{ role: "user", content: "Show me all users who signed up this week" }],
});

const message = response.choices[0].message;

if (message.tool_calls) {
  for (const toolCall of message.tool_calls) {
    const args = JSON.parse(toolCall.function.arguments);
    // Gate the call before anything executes.
    const decision = safeclaw.evaluate(toolCall.function.name, args);
    if (decision.allowed) {
      // executeTool is your own dispatcher; a sketch follows below.
      const result = await executeTool(toolCall.function.name, args);
      // Push tool result back to messages
    } else {
      console.log(`Blocked: ${toolCall.function.name} — ${decision.reason}`);
    }
  }
}
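executeTool is not part of SafeClaw or the OpenAI SDK; it is your own dispatcher, and SafeClaw only decides whether it may run. A minimal illustrative sketch, with the handler body stubbed out:

// Hypothetical dispatcher: routes allowed function calls to your own implementations.
async function executeTool(name: string, args: Record<string, unknown>) {
  switch (name) {
    case "query_database":
      // Hand the SQL to your real database client here; stubbed for illustration.
      return { rows: [], query: String(args.sql) };
    default:
      throw new Error(`No handler registered for tool "${name}"`);
  }
}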
Step 3: Handle Parallel Tool Calls
OpenAI frequently returns multiple tool_calls in a single response. SafeClaw evaluates each independently:
const toolResults = await Promise.all(
  message.tool_calls.map(async (toolCall) => {
    const args = JSON.parse(toolCall.function.arguments);
    const decision = safeclaw.evaluate(toolCall.function.name, args);
    return {
      tool_call_id: toolCall.id,
      role: "tool" as const,
      content: decision.allowed
        ? JSON.stringify(await executeTool(toolCall.function.name, args))
        : JSON.stringify({ error: `Denied: ${decision.reason}` }),
    };
  })
);
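Every tool_call_id must get a matching role: "tool" message, which is why a denied call still returns a JSON error instead of being skipped; the model can then explain the refusal or try another approach. A sketch of the follow-up request, assuming the messages and tools arrays from the original request are still in scope:

// Return the assistant message plus every tool result (allowed or denied) to the model.
const followUp = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [...messages, message, ...toolResults],
  tools,
});
console.log(followUp.choices[0].message.content);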
Step 4: Structured Output Safety
OpenAI's structured outputs (strict function schemas, or response_format for message content) constrain the shape of what the model emits, but they do not decide whether a call should run. SafeClaw adds that check: it can validate function arguments against required fields and allowed values, catching malformed or injection-style arguments before execution.
policies:
  - name: "argument-validation"
    actions:
      - tool: "update_user"
        effect: allow
        constraints:
          required_fields: ["user_id", "field", "value"]
          field_whitelist: ["name", "email", "preferences"]
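Under this policy, a call that targets a field outside the whitelist is rejected before it reaches your data. For example, again assuming the evaluate API from Step 2 and that field_whitelist is matched against the field argument:

// Allowed: every required field is present and "email" is whitelisted.
safeclaw.evaluate("update_user", { user_id: 42, field: "email", value: "new@example.com" });

// Denied: "role" is not in field_whitelist, so this privilege-escalation attempt is blocked.
safeclaw.evaluate("update_user", { user_id: 42, field: "role", value: "admin" });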
Why SafeClaw
- 446 tests covering policy evaluation, edge cases, and audit integrity
- Deny-by-default — if a function isn't explicitly allowed, it's blocked
- Sub-millisecond evaluation — no perceptible latency in your OpenAI tool loop
- Hash-chained audit log — tamper-evident record of every function call evaluated (the sketch after this list shows the general idea)
- Works with Claude AND OpenAI — same policy file, swap LLM providers freely
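Hash chaining is what makes the log tamper-evident: each entry commits to a hash of the previous entry, so rewriting any past decision invalidates every hash after it. A generic illustration of the idea, not SafeClaw's actual log format:

import { createHash } from "node:crypto";

// Generic hash chain: each entry stores the previous entry's hash,
// so editing any record breaks the chain from that point on.
interface AuditEntry {
  tool: string;
  allowed: boolean;
  prevHash: string;
  hash: string;
}

function appendEntry(log: AuditEntry[], tool: string, allowed: boolean): AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : "genesis";
  const hash = createHash("sha256").update(prevHash + tool + String(allowed)).digest("hex");
  return [...log, { tool, allowed, prevHash, hash }];
}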
Related Pages
- How to Add Safety Gating to OpenAI Assistants API
- How to Secure Your Claude Agent with SafeClaw
- How to Secure Vercel AI SDK Tool Calls
- How to Add Safety Gating to LangChain Agents
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw