How to Add Safety Rails to Mistral AI Agents
SafeClaw by Authensor gates every Mistral AI tool call through deny-by-default policies before your application executes it. Mistral's models — including Mistral Large, Codestral, and Mistral Small — support native function calling, and SafeClaw intercepts each tool_calls response to enforce your YAML-defined safety rules with sub-millisecond latency.
How Mistral Tool Calling Works
Mistral's chat completion API accepts a tools parameter with function definitions. When the model decides to call a function, it returns a response with tool_calls containing the function name and arguments as a JSON string. Mistral also supports parallel tool calls, where multiple functions are invoked in a single response. The gap between the model's tool call decision and your execution is where SafeClaw enforces policy.
Mistral Response → tool_calls[] → [SafeClaw Policy Check] → Execute or Deny
Mistral's tool calling format is OpenAI-compatible, which means SafeClaw's evaluation works identically — no adapter or translation layer needed.
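Concretely, a tool-calling response carries roughly this shape (an illustrative sketch: the id and argument values are made up, and the field names follow the camelCase convention of Mistral's TypeScript SDK used below):
// Illustrative response shape; values are made up for this example.
const exampleResponse = {
  choices: [
    {
      message: {
        role: "assistant",
        content: "",
        toolCalls: [
          {
            id: "call_abc123", // hypothetical call id
            type: "function",
            function: {
              name: "query_db",
              // Arguments arrive as a JSON string and must be parsed
              arguments: '{"sql":"SELECT COUNT(*) FROM orders"}',
            },
          },
        ],
      },
      finishReason: "tool_calls",
    },
  ],
};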
Quick Start
npx @authensor/safeclaw
This creates a safeclaw.yaml in your project root. Since Mistral uses the OpenAI-compatible format, your existing SafeClaw policies work without modification.
Step 1: Define Mistral-Specific Policies
# safeclaw.yaml
version: 1
default: deny
policies:
  - name: "mistral-code-tools"
    description: "Control Codestral code execution"
    actions:
      - tool: "execute_code"
        effect: allow
        constraints:
          language: "python|javascript"
          timeout_ms: 15000
      - tool: "execute_code"
        effect: deny
        constraints:
          language: "bash|shell"
  - name: "mistral-retrieval"
    description: "Allow controlled document retrieval"
    actions:
      - tool: "search_documents"
        effect: allow
        constraints:
          index: "internal_docs"
      - tool: "search_documents"
        effect: deny
        constraints:
          index: "external_*"
  - name: "mistral-data-ops"
    description: "Control database operations"
    actions:
      - tool: "query_db"
        effect: allow
        constraints:
          operation: "SELECT"
          tables: "users|orders|products"
      - tool: "mutate_db"
        effect: deny
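You can sanity-check these rules directly with the evaluate call used in Step 2. This is a sketch: the expected outcomes assume SafeClaw matches these constraints against the call's arguments, and send_email is a hypothetical unregistered tool.
import { SafeClaw } from "@authensor/safeclaw";

const safeclaw = new SafeClaw("./safeclaw.yaml");

// Matches the "mistral-code-tools" allow rule
console.log(safeclaw.evaluate("execute_code", { language: "python", timeout_ms: 5000 }));

// Matches the explicit deny rule for shell execution
console.log(safeclaw.evaluate("execute_code", { language: "bash" }));

// Unregistered tool: blocked by "default: deny"
console.log(safeclaw.evaluate("send_email", { to: "someone@example.com" }));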
Step 2: Integrate with Mistral's SDK
import { Mistral } from "@mistralai/mistralai";
import { SafeClaw } from "@authensor/safeclaw";

const mistral = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });
const safeclaw = new SafeClaw("./safeclaw.yaml");

const response = await mistral.chat.complete({
  model: "mistral-large-latest",
  tools: [
    {
      type: "function",
      function: {
        name: "query_db",
        description: "Query the database",
        parameters: {
          type: "object",
          properties: { sql: { type: "string" } },
        },
      },
    },
  ],
  messages: [{ role: "user", content: "How many orders shipped this week?" }],
});
const message = response.choices[0].message;
if (message.toolCalls) {
  for (const toolCall of message.toolCalls) {
    const args = JSON.parse(toolCall.function.arguments);
    const decision = safeclaw.evaluate(toolCall.function.name, args);
    if (decision.allowed) {
      const result = await executeTool(toolCall.function.name, args);
      // Return the tool result to continue the conversation
    } else {
      console.log(`Blocked: ${toolCall.function.name}: ${decision.reason}`);
    }
  }
}
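The executeTool helper above is your own application code, not part of SafeClaw or the Mistral SDK. A minimal hypothetical dispatcher might look like this (the db client and its query method are assumptions for illustration):
// `db` stands in for your database client (assumed, not a real export).
declare const db: { query(sql: string): Promise<unknown> };

// Hypothetical dispatcher: routes an approved tool call to your implementation.
async function executeTool(name: string, args: Record<string, unknown>) {
  switch (name) {
    case "query_db":
      return db.query(args.sql as string);
    default:
      throw new Error(`No implementation registered for tool: ${name}`);
  }
}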
Step 3: Handle Mistral's Parallel Function Calls
Mistral models frequently return multiple tool calls in a single response. SafeClaw evaluates each independently, and the audit log records the full parallel batch:
const toolMessages = [];
for (const toolCall of message.toolCalls) {
  const args = JSON.parse(toolCall.function.arguments);
  const decision = safeclaw.evaluate(toolCall.function.name, args);
  toolMessages.push({
    role: "tool",
    name: toolCall.function.name,
    content: decision.allowed
      ? JSON.stringify(await executeTool(toolCall.function.name, args))
      : JSON.stringify({ error: `Denied: ${decision.reason}` }),
    toolCallId: toolCall.id,
  });
}

// Continue the conversation with all tool results
// (`tools` and `messages` are the same arrays passed in Step 2)
const followUp = await mistral.chat.complete({
  model: "mistral-large-latest",
  tools,
  messages: [...messages, message, ...toolMessages],
});
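Putting Steps 2 and 3 together, a gated agent loop keeps calling the model until it answers without requesting tools. This is an illustrative sketch built from the pieces above; the turn cap is our own guard, not a SafeClaw feature:
// Sketch: run the model to completion, gating every tool call.
// Uses `mistral`, `safeclaw`, and `executeTool` from the steps above.
async function runGatedAgent(messages, tools) {
  for (let turn = 0; turn < 10; turn++) { // cap turns to avoid infinite tool loops
    const response = await mistral.chat.complete({
      model: "mistral-large-latest",
      tools,
      messages,
    });
    const message = response.choices[0].message;
    if (!message.toolCalls?.length) return message.content; // final answer

    messages.push(message);
    for (const toolCall of message.toolCalls) {
      const args = JSON.parse(toolCall.function.arguments);
      const decision = safeclaw.evaluate(toolCall.function.name, args);
      messages.push({
        role: "tool",
        name: toolCall.function.name,
        content: decision.allowed
          ? JSON.stringify(await executeTool(toolCall.function.name, args))
          : JSON.stringify({ error: `Denied: ${decision.reason}` }),
        toolCallId: toolCall.id,
      });
    }
  }
  throw new Error("Agent exceeded maximum turns");
}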
Step 4: Codestral-Specific Code Safety
When using Codestral for code generation and execution, add targeted policies for code safety:
policies:
  - name: "codestral-safety"
    description: "Safety rails for Codestral code generation"
    actions:
      - tool: "run_generated_code"
        effect: allow
        constraints:
          sandbox: true
          max_memory_mb: 256
          blocked_modules: ["os", "subprocess", "socket", "ctypes"]
      - tool: "write_generated_file"
        effect: allow
        constraints:
          path_pattern: "generated/**"
          max_size_bytes: 100000
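These constraints are enforced the same way as in Step 1: at evaluation time, before any generated code runs. A quick check might look like this (a sketch; the argument names, such as path, are assumptions about how your tools pass parameters, not a fixed SafeClaw schema):
// Denied under the policy above: the path falls outside "generated/**"
const decision = safeclaw.evaluate("write_generated_file", {
  path: "/etc/passwd", // illustrative argument name and value
  contents: "...",
});
// Expect decision.allowed === false; the denial is recorded in the audit log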
Why SafeClaw
- 446 tests covering policy evaluation, edge cases, and audit integrity
- Deny-by-default — unregistered tools are automatically blocked
- Sub-millisecond evaluation — no perceptible latency added to Mistral's response pipeline
- Hash-chained audit log — cryptographically linked records of every tool call
- Works with Claude AND OpenAI — and Mistral, since it uses the OpenAI-compatible format
Related Pages
- How to Secure Your OpenAI GPT Agent
- How to Secure Llama-Based AI Agents
- How to Add Safety Gating to LangChain Agents
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw