Simulation Mode
Simulation mode is an operational state in which a policy engine evaluates every action and logs its verdict — "would allow" or "would deny" — without actually enforcing the decision, allowing administrators to observe policy behavior before committing to enforcement.
In Detail
Deploying a security policy for AI agents carries a practical risk: if the policy is too restrictive, it blocks legitimate agent actions and disrupts workflows. If it is too permissive, it fails to prevent harmful actions. Simulation mode — sometimes called dry-run mode — addresses this problem by decoupling evaluation from enforcement.
In simulation mode, the policy engine operates identically to enforcement mode. Every action the AI agent attempts is intercepted, matched against the policy rules, and assigned a verdict. The difference is that the verdict is not applied: a deny verdict does not block the action, and an allow verdict changes nothing about how it proceeds. Instead, every verdict is logged with a label indicating what would have happened under enforcement.
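The decoupling of evaluation from enforcement can be sketched as follows. This is a minimal illustration, not SafeClaw's actual implementation; the types, rule shape, and log format are all hypothetical.

```typescript
// Hypothetical sketch: a policy engine whose behavior is controlled by a
// mode flag. Evaluation is identical in both modes; only enforcement differs.
type Verdict = "allow" | "deny";
type Mode = "simulate" | "enforce";

interface Rule {
  pattern: RegExp; // matches the action's target
  verdict: Verdict;
}

interface Decision {
  verdict: Verdict;
  enforced: boolean; // false in simulation mode: the action proceeds regardless
  logLine: string;
}

function evaluate(
  action: { type: string; target: string },
  rules: Rule[],
  mode: Mode
): Decision {
  // First-match-wins: the first rule whose pattern matches decides.
  const matched = rules.find((r) => r.pattern.test(action.target));
  const verdict: Verdict = matched ? matched.verdict : "deny"; // deny-by-default
  const enforced = mode === "enforce";
  // Simulated verdicts are labeled "would ..." to distinguish them in the log.
  const logLine = enforced
    ? `${verdict}: ${action.type} ${action.target}`
    : `would ${verdict}: ${action.type} ${action.target}`;
  return { verdict, enforced, logLine };
}

const rules: Rule[] = [{ pattern: /^\.\/src\//, verdict: "allow" }];
const d = evaluate({ type: "file_read", target: "./package.json" }, rules, "simulate");
// d.logLine === "would deny: file_read ./package.json" — logged, not blocked.
```

Note that switching from simulation to enforcement changes only the `enforced` flag's effect, never the verdict itself, which is what makes simulation results trustworthy predictors of enforcement behavior.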
The Policy Tuning Workflow
Simulation mode enables a structured workflow for developing and refining policies:
1. Draft. The administrator writes an initial set of policy rules based on their understanding of the agent's expected behavior.
2. Simulate. The agent operates with the draft policy in simulation mode. All actions proceed normally, but every policy evaluation is logged.
3. Analyze. The administrator reviews the simulation log. They identify actions logged as "would deny" that should have been permitted (false positives) and actions logged as "would allow" that should have been blocked (false negatives).
4. Refine. The administrator adjusts the policy rules — adding permissions for legitimate actions, tightening rules for risky ones, reordering rules in the first-match-wins sequence.
5. Re-simulate. The revised policy is tested in another simulation cycle. Steps 3 through 5 repeat until the administrator is confident the policy is correctly calibrated.
6. Enforce. The administrator switches from simulation mode to enforcement mode. The policy now actively gates agent actions.
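The Analyze step above can be sketched in code. This is an illustrative helper, with a hypothetical log-entry shape; it finds false positives by intersecting "would deny" entries with a set of actions the administrator knows to be legitimate.

```typescript
// Hypothetical sketch of the Analyze step: given a simulation log and a set
// of actions known to be legitimate, list the false positives ("would deny"
// verdicts on legitimate actions) that the next Refine pass should address.
interface SimLogEntry {
  action: string; // e.g. "file_read ./package.json"
  verdict: "allow" | "deny";
  simulated: boolean; // true when the verdict was logged but not enforced
}

function falsePositives(log: SimLogEntry[], legitimate: Set<string>): string[] {
  return log
    .filter((e) => e.simulated && e.verdict === "deny" && legitimate.has(e.action))
    .map((e) => e.action);
}

const log: SimLogEntry[] = [
  { action: "file_read ./src/index.ts", verdict: "allow", simulated: true },
  { action: "file_read ./package.json", verdict: "deny", simulated: true },
];
const fps = falsePositives(log, new Set(["file_read ./package.json"]));
// fps === ["file_read ./package.json"] — a candidate for a new allow rule.
```

The symmetric query (false negatives: "would allow" on actions that should be blocked) works the same way with the filter conditions inverted.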
Logging in Simulation Mode
Simulation mode logs must clearly distinguish simulated decisions from enforced ones. Each log entry typically includes:
- The action that was evaluated (type, target, arguments)
- The rule that matched (or the indication that no rule matched)
- The verdict that would have been applied (allow, deny, require_approval)
- A flag indicating the mode was simulation
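A simulation log entry carrying those fields might look like the following. The field names and layout here are illustrative, not SafeClaw's actual log schema.

```json
{
  "action": { "type": "file_read", "target": "./package.json", "args": {} },
  "matched_rule": null,
  "verdict": "deny",
  "mode": "simulation"
}
```

With `"matched_rule": null`, this entry records a deny-by-default outcome: no rule matched, so the verdict fell through to deny, and the `"mode"` flag marks it as simulated rather than enforced.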
Examples
- An administrator deploys a new policy that permits file_read on ./src/** and denies all other actions. In simulation mode, the agent reads ./src/index.ts (logged as "would allow") and also reads ./package.json (logged as "would deny"). The administrator realizes the policy needs a rule permitting reads on project root configuration files.
- A security team is evaluating whether to restrict shell_exec actions to a specific set of commands. They enable simulation mode and observe a week of agent activity. The simulation log reveals that agents frequently run prettier --write, which was not in the initial allowlist. The team adds it before switching to enforcement.
- An administrator wants to understand the impact of changing rule order in a first-match-wins policy. They create two policy variants, run each in simulation mode, and compare the logs to see how different orderings affect verdicts.
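The third example's rule-ordering comparison can be made concrete with a short sketch. The rules and target below are hypothetical; the point is that under first-match-wins, the same rules in a different order can yield opposite verdicts for the same action, and simulation logs surface exactly where.

```typescript
// Hypothetical sketch: evaluate one action against two orderings of the
// same first-match-wins rules and compare the resulting verdicts.
type Verdict = "allow" | "deny";
interface Rule {
  pattern: RegExp;
  verdict: Verdict;
}

function verdictFor(target: string, rules: Rule[]): Verdict {
  const matched = rules.find((r) => r.pattern.test(target));
  return matched ? matched.verdict : "deny"; // deny-by-default
}

const allowSrc: Rule = { pattern: /^\.\/src\//, verdict: "allow" };
const denySecrets: Rule = { pattern: /secrets/, verdict: "deny" };

const target = "./src/secrets.ts"; // matches both rules
const variantA = verdictFor(target, [denySecrets, allowSrc]); // "deny" wins
const variantB = verdictFor(target, [allowSrc, denySecrets]); // "allow" wins
```

Diffing the two variants' simulation logs action by action pinpoints every target where rule order changes the outcome, which is far safer than discovering the difference under enforcement.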
Related Concepts
- Policy Engine — The component that evaluates actions in both simulation and enforcement modes.
- Action-Level Gating — The enforcement mechanism that simulation mode temporarily disengages.
- First-Match-Wins — The evaluation model whose behavior simulation mode helps administrators understand.
- Deny-by-Default — The default posture whose impact simulation mode reveals before enforcement.
- Tamper-Proof Audit Trail — The log that records simulation verdicts alongside enforced ones.
In SafeClaw
SafeClaw, by Authensor, includes simulation mode as a built-in operational state. When simulation mode is enabled, SafeClaw's policy engine evaluates every agent action — file_write, file_read, shell_exec, network — and logs the verdict without enforcing it. Actions proceed regardless of the policy decision, but the full evaluation is recorded in SafeClaw's tamper-proof audit trail.
Administrators can use the SafeClaw browser dashboard to review simulation logs, identify policy gaps, and iteratively refine rules. Once satisfied, they switch to enforcement mode through the dashboard or configuration, and SafeClaw begins actively gating actions.
This workflow is particularly valuable during initial SafeClaw deployment. Administrators can install SafeClaw via npx @authensor/safeclaw, configure an initial policy using the setup wizard, and run in simulation mode to observe how the policy interacts with their agents (Claude, OpenAI, or LangChain) before any legitimate actions are blocked. SafeClaw evaluates policies locally with sub-millisecond latency, so simulation mode introduces no meaningful performance overhead. The free tier is available with 7-day renewable keys and no credit card required.
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw