# How to Set Up Alerts for Dangerous AI Agent Actions
Blocking dangerous AI agent actions is necessary but not sufficient — you also need to know when an agent attempts something dangerous so you can investigate and respond. SafeClaw by Authensor logs every policy evaluation in a hash-chained audit trail and supports webhook-based alerting, letting you send real-time notifications to Slack, PagerDuty, or any incident management system when critical deny rules fire. You learn about dangerous agent behavior within seconds, not during the next log review.
## Quick Start

```bash
npx @authensor/safeclaw
```

This scaffolds a `.safeclaw/` directory. Configure alerting as described below.
## Step 1: Configure Alert Webhooks
SafeClaw can send webhook notifications when specific policy rules fire:
```yaml
# .safeclaw/config.yaml
alerts:
  enabled: true
  webhooks:
    - name: slack-security
      url: "${SLACK_WEBHOOK_URL}"
      events:
        - deny
        - escalation
      filter:
        severity: [critical, high]
    - name: pagerduty
      url: "${PAGERDUTY_EVENTS_URL}"
      events:
        - deny
      filter:
        severity: [critical]
```
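The routing this config describes is: an event reaches a webhook only if its event type is subscribed and its severity passes the filter. A minimal sketch of that behavior (the payload field names `type` and `severity` are assumptions for illustration, not SafeClaw's documented schema):

```python
def should_notify(webhook: dict, event: dict) -> bool:
    """Decide whether an event should be delivered to a webhook,
    based on its subscribed event types and optional severity filter."""
    if event["type"] not in webhook.get("events", []):
        return False
    allowed = webhook.get("filter", {}).get("severity")
    return allowed is None or event["severity"] in allowed

slack = {"events": ["deny", "escalation"], "filter": {"severity": ["critical", "high"]}}
pagerduty = {"events": ["deny"], "filter": {"severity": ["critical"]}}

event = {"type": "deny", "severity": "high"}
print(should_notify(slack, event))      # True: slack accepts high-severity denies
print(should_notify(pagerduty, event))  # False: pagerduty only takes critical
```

With this layering, a single denied action can fan out to Slack for triage while only critical denials page the on-call engineer.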
## Step 2: Assign Severity to Policy Rules
Tag your policy rules with severity levels so the alerting system knows what to escalate:
```yaml
# .safeclaw/policies/coding-assistant.yaml
rules:
  - id: block-env-read
    action: file.read
    effect: deny
    severity: critical
    conditions:
      path:
        pattern: ".env*"
    reason: "Credential file access attempt"
  - id: block-destructive-shell
    action: shell.execute
    effect: deny
    severity: critical
    conditions:
      command:
        pattern: "{rm -rf,sudo,mkfs,dd if=}"
    reason: "Destructive system command attempt"
  - id: block-external-network
    action: network.request
    effect: deny
    severity: high
    conditions:
      destination:
        not_pattern: "*.internal.company.com"
    reason: "External network access attempt"
  - id: block-config-writes
    action: file.write
    effect: deny
    severity: medium
    conditions:
      path:
        pattern: "{*.config.*,*.json,*.yaml,*.yml}"
    reason: "Configuration file write attempt"
```
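The `pattern` conditions above read as glob patterns, with `{a,b,c}` brace sets matching any listed alternative. A sketch of that matching semantics, assuming plain glob rules (SafeClaw's actual matcher may differ, e.g. for substring matches on shell commands):

```python
from fnmatch import fnmatch

def matches(pattern: str, value: str) -> bool:
    """Match a value against a glob pattern, expanding one
    top-level {a,b,c} brace set into alternatives."""
    if pattern.startswith("{") and pattern.endswith("}"):
        return any(fnmatch(value, alt.strip()) for alt in pattern[1:-1].split(","))
    return fnmatch(value, pattern)

print(matches(".env*", ".env.production"))              # True: block-env-read fires
print(matches("{*.json,*.yaml,*.yml}", "config.yaml"))  # True: brace alternative matched
```

Because effects are deny with severity attached, a single match both blocks the action and tells the alerting layer how loudly to escalate.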
## Step 3: Configure Slack Alert Format
Customize the alert payload for your Slack channel:
```yaml
alerts:
  webhooks:
    - name: slack-security
      url: "${SLACK_WEBHOOK_URL}"
      format:
        template: |
          {
            "blocks": [
              {
                "type": "header",
                "text": {
                  "type": "plain_text",
                  "text": "🚨 SafeClaw Alert: {{severity}} — {{action}} DENIED"
                }
              },
              {
                "type": "section",
                "fields": [
                  { "type": "mrkdwn", "text": "*Agent:* {{agentId}}" },
                  { "type": "mrkdwn", "text": "*Rule:* {{matchedRule}}" },
                  { "type": "mrkdwn", "text": "*Action:* {{action}}" },
                  { "type": "mrkdwn", "text": "*Reason:* {{reason}}" }
                ]
              },
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "*Details:*\n```{{requestDetails}}```"
                }
              }
            ]
          }
```
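The `{{name}}` placeholders are filled from the alert event before the payload is posted. A simplified sketch of that substitution step (a plain string-replace model; SafeClaw's template engine may support more):

```python
import re

def render(template: str, context: dict) -> str:
    """Replace {{name}} placeholders with values from the alert context.
    Unknown placeholders are left intact rather than erased."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(context.get(m.group(1), m.group(0))),
        template,
    )

msg = render(
    "SafeClaw Alert: {{severity}}: {{action}} DENIED",
    {"severity": "critical", "action": "file.read"},
)
print(msg)  # SafeClaw Alert: critical: file.read DENIED
```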
## Step 4: Set Up Anomaly-Based Alerts
Beyond rule-specific alerts, detect anomalous behavior patterns:
```yaml
alerts:
  anomalyDetection:
    enabled: true
    rules:
      - name: denial-spike
        condition: "deny_count_5m > 3 * deny_count_5m_avg_24h"
        severity: high
        message: "Denial rate is 3x the 24-hour average"
      - name: new-action-type
        condition: "action_type NOT IN historical_action_types"
        severity: critical
        message: "Agent attempting previously unseen action type"
      - name: rapid-retry
        condition: "same_denied_action_count_1m > 5"
        severity: critical
        message: "Agent retrying denied action rapidly — possible prompt injection"
      - name: off-hours-activity
        condition: "hour < 6 OR hour > 22"
        severity: high
        message: "Agent active during off-hours"
```
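The `denial-spike` condition compares a recent window against a long-run baseline. A sketch of that check under the stated semantics (the history list here is hypothetical sample data):

```python
def denial_spike(deny_count_5m: int, deny_counts_24h: list[int]) -> bool:
    """True when the latest 5-minute deny count exceeds 3x the
    average 5-minute deny count over the trailing 24 hours."""
    avg = sum(deny_counts_24h) / len(deny_counts_24h)
    return deny_count_5m > 3 * avg

history = [2, 1, 3, 2]            # hypothetical 5-minute deny counts, avg = 2.0
print(denial_spike(12, history))  # True: 12 > 3 * 2.0
print(denial_spike(5, history))   # False: within normal variation
```

Baseline-relative rules like this catch a compromised agent that stays under every individual policy threshold but suddenly probes far more than it used to.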
## Step 5: Create an Alert Runbook
Document how to respond to each alert type:
```yaml
# .safeclaw/runbooks/critical-alerts.yaml
runbooks:
  - alert: "block-env-read"
    steps:
      - "Check audit trail for the session: npx @authensor/safeclaw audit show --session <session-id>"
      - "Review the agent's recent actions for prompt injection indicators"
      - "If malicious: terminate the agent session immediately"
      - "Rotate any credentials the agent may have been attempting to access"
  - alert: "rapid-retry"
    steps:
      - "Immediately terminate the agent session"
      - "Review the full session audit trail"
      - "Check for prompt injection in recent inputs"
      - "Report to security team for investigation"
```
## Step 6: Test Your Alerts
Verify alerts fire correctly using SafeClaw's simulation mode:
```bash
npx @authensor/safeclaw simulate --action file.read --path ".env" --test-alerts
```
This triggers the policy evaluation and sends a test alert to your configured webhooks, confirming the full alert pipeline works end-to-end.
## Why SafeClaw
- 446 tests ensuring alerting logic and webhook delivery are reliable
- Deny-by-default — dangerous actions are blocked AND alerted simultaneously
- Sub-millisecond evaluation — alerts fire within milliseconds of the action attempt
- Hash-chained audit trail — alert evidence is tamper-proof for incident investigation
- Works with Claude AND OpenAI — one alerting configuration covers all AI providers
## Cross-References

- How to Monitor AI Agent Actions in Production
- Incident Response for AI Agents
- How to Log Every AI Agent Action for Compliance
- Fail-Closed Design Pattern
## Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

```bash
npx @authensor/safeclaw
```