AI Agent Safety Maturity Model: Where Is Your Organization?
The AI Agent Safety Maturity Model is a five-level framework for assessing how well your organization controls the actions of autonomous AI systems. Most organizations in 2026 are at Level 1 or Level 2 — they have AI agents running with minimal or no safety controls. This model helps you identify where you are, understand the risks at your current level, and plan a concrete path to the level you need.
How to Use This Model
Read through the five levels below. Identify which level most closely matches your current state. Then look at the criteria for the next level up — those criteria become your action items. The goal is not necessarily to reach Level 5 immediately. The goal is to move up from wherever you are, because each level represents a meaningful reduction in risk.
Level 1: Uncontrolled
Description
AI agents are running with no safety controls. Agents have unrestricted access to files, commands, and network. No one has a complete inventory of which agents are running or what they can do.Characteristics
- Agents have default OS-level permissions (read/write to any file the user can access)
- No restrictions on shell command execution
- No restrictions on outbound network requests
- No audit trail of agent actions
- No policy documentation
- No assigned ownership of agent safety
Risk Profile
Maximum exposure. Every risk documented in the AI agent risk literature is active: credential exfiltration, file deletion, supply chain contamination, lateral movement, and compliance failure. The Clawdbot incident (1.5 million leaked API keys) occurred at Level 1.What to Do
Install SafeClaw in simulation mode:npx @authensor/safeclaw. This immediately gives you visibility into what your agents are doing without blocking anything. You move to Level 2 by knowing what is happening.
Level 2: Visible
Description
The organization knows what AI agents are running and has basic visibility into their actions, but has not yet implemented enforcement controls.Characteristics
- Inventory of AI agents and tools exists (which agents, which frameworks, which teams)
- Agents are monitored through logs or simulation mode
- Agent actions are recorded but not restricted
- Some ad-hoc guidelines exist ("don't give agents access to production")
- No formal policy enforcement
- No tamper-proof audit trail
Risk Profile
Reduced from Level 1 because the organization can detect problems after they happen. However, detection is reactive — damage occurs before it is identified. Monitoring without enforcement is a notification system, not a prevention system.What to Do
Define deny-by-default policies for each agent based on the action data you have collected in simulation mode. Move your highest-risk agents (those with shell_exec or network access) to enforcement mode first. You move to Level 3 when policies are actively enforced.Level 3: Controlled
Description
Action-level gating is in place for critical agents. Deny-by-default policies are enforced. A tamper-proof audit trail records all decisions.Characteristics
- Action-level gating enforced on agents with shell_exec, network, and sensitive file access
- Deny-by-default policies defined per agent or per team
- Tamper-proof audit trail using SHA-256 hash chains
- Simulation mode used for policy testing before enforcement
- Policy ownership assigned to specific individuals
- Incident response plan includes agent-related scenarios
Risk Profile
Significantly reduced. The most dangerous actions — unauthorized shell commands, network exfiltration, credential access — are blocked at the policy layer. Prompt injection attacks are neutralized because action-level gating evaluates actions, not instructions. The remaining risk is in agents or action types not yet covered by policies.What to Do
Extend gating to all agents, including lower-risk ones (file_read, file_write within project directories). Begin quarterly policy reviews. Integrate audit trail data into your compliance reporting. You move to Level 4 when all agents are governed and policies are regularly reviewed.SafeClaw at Level 3
SafeClaw provides everything needed for Level 3: deny-by-default policies, sub-millisecond action-level gating, SHA-256 tamper-proof audit trail, and simulation mode for safe policy testing. The browser dashboard at safeclaw.onrender.com gives teams a centralized view of all agent activity and policy decisions.Level 4: Governed
Description
All AI agents are under policy control. Policies are reviewed regularly. Audit data is integrated into organizational compliance and governance processes.Characteristics
- Every AI agent has an enforced deny-by-default policy
- All four action types (file_write, file_read, shell_exec, network) are governed for every agent
- Quarterly policy reviews with documented changes
- Audit trail data feeds into SOC 2, HIPAA, GDPR, or other compliance frameworks
- New agent deployments require safety review and policy definition before launch
- Human-in-the-loop workflows defined for high-risk actions
- Cross-team visibility through shared dashboards
- Agent safety included in employee onboarding and training
Risk Profile
Low. Residual risk comes from policy gaps (actions that are allowed but should not be) and novel attack vectors not yet anticipated. These are addressed through regular reviews and adversarial testing.What to Do
Implement automated policy testing — regularly run adversarial scenarios against your policies to verify they hold. Integrate agent safety metrics into your security KPIs. Begin cross-organizational benchmarking. You move to Level 5 when safety testing is automated and continuous.Level 5: Continuously Validated
Description
Agent safety controls are not just in place — they are continuously tested, validated, and improved. The organization treats agent safety as a living system, not a static configuration.Characteristics
- Automated adversarial testing runs against policies on a scheduled basis
- Policy-as-code: safety policies are version-controlled and reviewed like application code
- Automated alerts for policy anomalies (unusual patterns of denied actions, new action types)
- Red team exercises include AI agent attack scenarios
- Audit trail analytics identify trends and predict policy gaps
- Agent safety metrics reported to leadership alongside other security metrics
- Zero-trust architecture applied to agent access: continuous verification, not one-time authorization
- Safety controls tested in CI/CD before agent deployments reach production
- Cross-organizational contribution to agent safety standards and best practices
Risk Profile
Minimal. Residual risk is limited to truly novel attack vectors. The organization's continuous validation process means that new risks are identified and mitigated quickly. This level represents the current state of the art in AI agent governance.Maturity Assessment Quick Reference
| Criteria | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|
| Agent inventory | No | Yes | Yes | Yes | Yes |
| Action visibility | No | Yes | Yes | Yes | Yes |
| Policy enforcement | No | No | Critical agents | All agents | All agents |
| Deny-by-default | No | No | Yes | Yes | Yes |
| Tamper-proof audit | No | No | Yes | Yes | Yes |
| Policy reviews | No | No | Ad-hoc | Quarterly | Continuous |
| Compliance integration | No | No | No | Yes | Yes |
| Adversarial testing | No | No | No | Manual | Automated |
| Policy-as-code | No | No | No | No | Yes |
| Continuous validation | No | No | No | No | Yes |
The Most Common Gap: Level 1 to Level 3
Most organizations reading this are at Level 1 (uncontrolled) or Level 2 (visible). The single most impactful move is to jump from Level 1 directly to Level 3 (controlled). This is possible because modern tooling makes it fast.
With SafeClaw:
- Install —
npx @authensor/safeclaw(60 seconds) - Simulate — Run in simulation mode for 24-48 hours to see what your agents do
- Define policies — Use the dashboard at safeclaw.onrender.com to create deny-by-default policies based on observed behavior
- Enforce — Switch to enforcement mode
- Audit — Review the tamper-proof audit trail
Where Should Your Organization Be?
The answer depends on your risk profile:
- Individual developers or small teams with no sensitive data access: Level 2 (visible) is a reasonable minimum. Know what your agents are doing.
- Teams with access to credentials, production systems, or customer data: Level 3 (controlled) is the minimum. Action-level gating is mandatory.
- Organizations in regulated industries (finance, healthcare, government): Level 4 (governed) should be the target, with a roadmap to Level 5.
- Organizations building and selling AI agent products: Level 5 (continuously validated) is the standard your customers will expect.
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw