2025-11-17 · Authensor

How to Add Safety Rails to Mistral AI Agents

SafeClaw by Authensor gates every Mistral AI tool call through deny-by-default policies before your application executes it. Mistral's models, including Mistral Large, Codestral, and Mistral Small, support native function calling, and SafeClaw intercepts each tool_calls response to enforce your YAML-defined safety rules with sub-millisecond latency.

How Mistral Tool Calling Works

Mistral's chat completion API accepts a tools parameter with function definitions. When the model decides to call a function, it returns a response with tool_calls containing the function name and arguments as a JSON string. Mistral also supports parallel tool calls, where multiple functions are invoked in a single response. The gap between the model's tool call decision and your execution is where SafeClaw enforces policy.

Mistral Response → tool_calls[] → [SafeClaw Policy Check] → Execute or Deny

Mistral's tool calling format is OpenAI-compatible, which means SafeClaw's evaluation works identically — no adapter or translation layer needed.
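
For reference, a tool-call response surfaces through the TypeScript SDK in roughly this shape (values are illustrative and exact field names can vary by SDK version; note that arguments arrives as a JSON-encoded string):

// Illustrative shape only, not real API output
const exampleResponse = {
  choices: [
    {
      message: {
        role: "assistant",
        content: "",
        toolCalls: [
          {
            id: "call_abc123", // illustrative ID
            type: "function",
            function: {
              name: "query_db",
              arguments: '{"sql":"SELECT COUNT(*) FROM orders"}', // JSON string
            },
          },
        ],
      },
      finishReason: "tool_calls",
    },
  ],
};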

Quick Start

npx @authensor/safeclaw

This creates a safeclaw.yaml in your project root. Because Mistral uses the OpenAI-compatible tool format, your existing SafeClaw policies work without modification.

Step 1: Define Mistral-Specific Policies

# safeclaw.yaml
version: 1
default: deny

policies:
  - name: "mistral-code-tools"
    description: "Control Codestral code execution"
    actions:
      - tool: "execute_code"
        effect: allow
        constraints:
          language: "python|javascript"
          timeout_ms: 15000
      - tool: "execute_code"
        effect: deny
        constraints:
          language: "bash|shell"

  - name: "mistral-retrieval"
    description: "Allow controlled document retrieval"
    actions:
      - tool: "search_documents"
        effect: allow
        constraints:
          index: "internal_docs"
      - tool: "search_documents"
        effect: deny
        constraints:
          index: "external_*"

  - name: "mistral-data-ops"
    description: "Control database operations"
    actions:
      - tool: "query_db"
        effect: allow
        constraints:
          operation: "SELECT"
          tables: "users|orders|products"
      - tool: "mutate_db"
        effect: deny
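
With the deny-by-default root, anything not explicitly allowed is blocked. A quick sketch of how these rules play out, using the same evaluate(name, args) call shown in the integration steps below (the pipe-delimited constraint values are assumed to match alternatives, as the examples in this file suggest):

import { SafeClaw } from "@authensor/safeclaw";

const safeclaw = new SafeClaw("./safeclaw.yaml");

// Allowed: python matches the execute_code allow rule
console.log(safeclaw.evaluate("execute_code", { language: "python" }).allowed); // true

// Denied: bash matches the explicit deny rule
console.log(safeclaw.evaluate("execute_code", { language: "bash" }).allowed); // false

// Denied: no rule matches this tool, so the deny-by-default root applies
console.log(safeclaw.evaluate("send_email", { to: "ops@example.com" }).allowed); // false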

Step 2: Integrate with Mistral's SDK

import { Mistral } from "@mistralai/mistralai";
import { SafeClaw } from "@authensor/safeclaw";

const mistral = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });
const safeclaw = new SafeClaw("./safeclaw.yaml");

const response = await mistral.chat.complete({
  model: "mistral-large-latest",
  tools: [
    {
      type: "function",
      function: {
        name: "query_db",
        description: "Query the database",
        parameters: {
          type: "object",
          properties: { sql: { type: "string" } },
        },
      },
    },
  ],
  messages: [{ role: "user", content: "How many orders shipped this week?" }],
});

const message = response.choices[0].message;

if (message.toolCalls) {
  for (const toolCall of message.toolCalls) {
    const args = JSON.parse(toolCall.function.arguments);
    const decision = safeclaw.evaluate(toolCall.function.name, args);

    if (decision.allowed) {
      const result = await executeTool(toolCall.function.name, args);
      // Return tool result to continue conversation
    } else {
      console.log(`Blocked: ${toolCall.function.name} (${decision.reason})`);
    }
  }
}
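
Note that executeTool is your own dispatch function, not a SafeClaw or Mistral SDK export. A minimal hypothetical sketch:

// Hypothetical dispatcher: maps approved tool names to your implementations.
// Keeping all execution behind one function means the policy check above is
// the only path to a side effect.
async function executeTool(name: string, args: Record<string, unknown>) {
  switch (name) {
    case "query_db":
      return queryDb(args.sql as string); // placeholder for your data layer
    default:
      throw new Error(`No implementation registered for tool: ${name}`);
  }
}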

Step 3: Handle Mistral's Parallel Function Calls

Mistral models frequently return multiple tool calls in a single response. SafeClaw evaluates each independently, and the audit log records the full parallel batch:

const toolMessages = [];

for (const toolCall of message.toolCalls) {
  const args = JSON.parse(toolCall.function.arguments);
  const decision = safeclaw.evaluate(toolCall.function.name, args);

  toolMessages.push({
    role: "tool",
    name: toolCall.function.name,
    content: decision.allowed
      ? JSON.stringify(await executeTool(toolCall.function.name, args))
      : JSON.stringify({ error: `Denied: ${decision.reason}` }),
    toolCallId: toolCall.id,
  });
}

// Continue the conversation with all tool results; tools and messages
// are the same arrays passed in the original request
const followUp = await mistral.chat.complete({
  model: "mistral-large-latest",
  tools: tools,
  messages: [...messages, message, ...toolMessages],
});
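
Putting Steps 2 and 3 together, the full pattern is a loop: keep calling the model until it answers in plain text instead of requesting tools, and gate every round through the same policy check. A sketch reusing the mistral and safeclaw instances from Step 2 and the hypothetical executeTool dispatcher above:

// Agent loop sketch: every round of tool calls passes through SafeClaw
// before anything executes; messages grows across rounds.
async function runAgent(messages: any[], tools: any[]): Promise<string> {
  while (true) {
    const res = await mistral.chat.complete({
      model: "mistral-large-latest",
      tools,
      messages,
    });
    const msg = res.choices[0].message;

    // No tool calls means the model produced its final answer
    if (!msg.toolCalls?.length) return msg.content as string;

    messages.push(msg);
    for (const toolCall of msg.toolCalls) {
      const args = JSON.parse(toolCall.function.arguments);
      const decision = safeclaw.evaluate(toolCall.function.name, args);
      messages.push({
        role: "tool",
        name: toolCall.function.name,
        content: decision.allowed
          ? JSON.stringify(await executeTool(toolCall.function.name, args))
          : JSON.stringify({ error: `Denied: ${decision.reason}` }),
        toolCallId: toolCall.id,
      });
    }
  }
}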

Step 4: Codestral-Specific Code Safety

When using Codestral for code generation and execution, add targeted policies for code safety:

policies:
  - name: "codestral-safety"
    description: "Safety rails for Codestral code generation"
    actions:
      - tool: "run_generated_code"
        effect: allow
        constraints:
          sandbox: true
          max_memory_mb: 256
          blocked_modules: ["os", "subprocess", "socket", "ctypes"]
      - tool: "write_generated_file"
        effect: allow
        constraints:
          path_pattern: "generated/**"
          max_size_bytes: 100000
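
On the application side, enforcement is the same evaluate call as in Step 2. A sketch, assuming your runner reports its sandbox settings in the arguments it submits for evaluation; runGeneratedCode is a hypothetical name for your own sandboxed executor:

const generatedCode = "print('hello')"; // e.g. code returned by Codestral

const decision = safeclaw.evaluate("run_generated_code", {
  code: generatedCode,
  sandbox: true,        // runner is sandboxed
  max_memory_mb: 256,   // within the policy ceiling
});

if (decision.allowed) {
  await runGeneratedCode(generatedCode); // hypothetical sandboxed runner
} else {
  console.log(`Codestral execution blocked: ${decision.reason}`);
}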

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw