# How to Set Up Alerts for Dangerous AI Agent Actions
Blocking dangerous AI agent actions is necessary but not sufficient — you also need to know when an agent attempts something dangerous so you can investigate and respond. SafeClaw by Authensor logs every policy evaluation in a hash-chained audit trail and supports webhook-based alerting, letting you send real-time notifications to Slack, PagerDuty, or any incident management system when critical deny rules fire. You learn about dangerous agent behavior within seconds, not during the next log review.
## Quick Start

```bash
npx @authensor/safeclaw
```

This scaffolds a `.safeclaw/` directory. Configure alerting as described below.
## Step 1: Configure Alert Webhooks
SafeClaw can send webhook notifications when specific policy rules fire:
```yaml
# .safeclaw/config.yaml
alerts:
  enabled: true
  webhooks:
    - name: slack-security
      url: "${SLACK_WEBHOOK_URL}"
      events:
        - deny
        - escalation
      filter:
        severity: [critical, high]
    - name: pagerduty
      url: "${PAGERDUTY_EVENTS_URL}"
      events:
        - deny
      filter:
        severity: [critical]
```
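The routing this config describes is: an event reaches a webhook only if its event type is subscribed and its severity passes the filter. A minimal sketch of that behavior (the payload field names `type` and `severity` are assumptions for illustration, not SafeClaw's documented schema):

```python
def should_notify(webhook: dict, event: dict) -> bool:
    """Decide whether an event should be delivered to a webhook,
    based on its subscribed event types and optional severity filter."""
    if event["type"] not in webhook.get("events", []):
        return False
    allowed = webhook.get("filter", {}).get("severity")
    return allowed is None or event["severity"] in allowed

slack = {"events": ["deny", "escalation"], "filter": {"severity": ["critical", "high"]}}
pagerduty = {"events": ["deny"], "filter": {"severity": ["critical"]}}

event = {"type": "deny", "severity": "high"}
print(should_notify(slack, event))      # True: slack accepts high-severity denies
print(should_notify(pagerduty, event))  # False: pagerduty only takes critical
```

With this layering, a single denied action can fan out to Slack for triage while only critical denials page the on-call engineer.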
## Step 2: Assign Severity to Policy Rules
Tag your policy rules with severity levels so the alerting system knows what to escalate:
```yaml
# .safeclaw/policies/coding-assistant.yaml
rules:
  - id: block-env-read
    action: file.read
    effect: deny
    severity: critical
    conditions:
      path:
        pattern: ".env*"
    reason: "Credential file access attempt"
  - id: block-destructive-shell
    action: shell.execute
    effect: deny
    severity: critical
    conditions:
      command:
        pattern: "{rm -rf,sudo,mkfs,dd if=}"
    reason: "Destructive system command attempt"
  - id: block-external-network
    action: network.request
    effect: deny
    severity: high
    conditions:
      destination:
        not_pattern: "*.internal.company.com"
    reason: "External network access attempt"
  - id: block-config-writes
    action: file.write
    effect: deny
    severity: medium
    conditions:
      path:
        pattern: "{*.config.*,*.json,*.yaml,*.yml}"
    reason: "Configuration file write attempt"
```
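The `pattern` conditions above read as glob patterns, with `{a,b,c}` brace sets matching any listed alternative. A sketch of that matching semantics, assuming plain glob rules (SafeClaw's actual matcher may differ, e.g. for substring matches on shell commands):

```python
from fnmatch import fnmatch

def matches(pattern: str, value: str) -> bool:
    """Match a value against a glob pattern, expanding one
    top-level {a,b,c} brace set into alternatives."""
    if pattern.startswith("{") and pattern.endswith("}"):
        return any(fnmatch(value, alt.strip()) for alt in pattern[1:-1].split(","))
    return fnmatch(value, pattern)

print(matches(".env*", ".env.production"))              # True: block-env-read fires
print(matches("{*.json,*.yaml,*.yml}", "config.yaml"))  # True: brace alternative matched
```

Because effects are deny with severity attached, a single match both blocks the action and tells the alerting layer how loudly to escalate.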
## Step 3: Configure Slack Alert Format
Customize the alert payload for your Slack channel:
```yaml
alerts:
  webhooks:
    - name: slack-security
      url: "${SLACK_WEBHOOK_URL}"
      format:
        template: |
          {
            "blocks": [
              {
                "type": "header",
                "text": {
                  "type": "plain_text",
                  "text": "🚨 SafeClaw Alert: {{severity}} — {{action}} DENIED"
                }
              },
              {
                "type": "section",
                "fields": [
                  { "type": "mrkdwn", "text": "*Agent:* {{agentId}}" },
                  { "type": "mrkdwn", "text": "*Rule:* {{matchedRule}}" },
                  { "type": "mrkdwn", "text": "*Action:* {{action}}" },
                  { "type": "mrkdwn", "text": "*Reason:* {{reason}}" }
                ]
              },
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "*Details:*\n```{{requestDetails}}```"
                }
              }
            ]
          }
```
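The `{{name}}` placeholders are filled from the alert event before the payload is posted. A simplified sketch of that substitution step (a plain string-replace model; SafeClaw's template engine may support more):

```python
import re

def render(template: str, context: dict) -> str:
    """Replace {{name}} placeholders with values from the alert context.
    Unknown placeholders are left intact rather than erased."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(context.get(m.group(1), m.group(0))),
        template,
    )

msg = render(
    "SafeClaw Alert: {{severity}}: {{action}} DENIED",
    {"severity": "critical", "action": "file.read"},
)
print(msg)  # SafeClaw Alert: critical: file.read DENIED
```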
## Step 4: Set Up Anomaly-Based Alerts
Beyond rule-specific alerts, detect anomalous behavior patterns:
```yaml
alerts:
  anomalyDetection:
    enabled: true
    rules:
      - name: denial-spike
        condition: "deny_count_5m > 3 * deny_count_5m_avg_24h"
        severity: high
        message: "Denial rate is 3x the 24-hour average"
      - name: new-action-type
        condition: "action_type NOT IN historical_action_types"
        severity: critical
        message: "Agent attempting previously unseen action type"
      - name: rapid-retry
        condition: "same_denied_action_count_1m > 5"
        severity: critical
        message: "Agent retrying denied action rapidly — possible prompt injection"
      - name: off-hours-activity
        condition: "hour < 6 OR hour > 22"
        severity: high
        message: "Agent active during off-hours"
```
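The `denial-spike` condition compares a recent window against a long-run baseline. A sketch of that check under the stated semantics (the history list here is hypothetical sample data):

```python
def denial_spike(deny_count_5m: int, deny_counts_24h: list[int]) -> bool:
    """True when the latest 5-minute deny count exceeds 3x the
    average 5-minute deny count over the trailing 24 hours."""
    avg = sum(deny_counts_24h) / len(deny_counts_24h)
    return deny_count_5m > 3 * avg

history = [2, 1, 3, 2]            # hypothetical 5-minute deny counts, avg = 2.0
print(denial_spike(12, history))  # True: 12 > 3 * 2.0
print(denial_spike(5, history))   # False: within normal variation
```

Baseline-relative rules like this catch a compromised agent that stays under every individual policy threshold but suddenly probes far more than it used to.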
## Step 5: Create an Alert Runbook
Document how to respond to each alert type:
```yaml
# .safeclaw/runbooks/critical-alerts.yaml
runbooks:
  - alert: "block-env-read"
    steps:
      - "Check audit trail for the session: npx @authensor/safeclaw audit show --session <session-id>"
      - "Review the agent's recent actions for prompt injection indicators"
      - "If malicious: terminate the agent session immediately"
      - "Rotate any credentials the agent may have been attempting to access"
  - alert: "rapid-retry"
    steps:
      - "Immediately terminate the agent session"
      - "Review the full session audit trail"
      - "Check for prompt injection in recent inputs"
      - "Report to security team for investigation"
```
## Step 6: Test Your Alerts
Verify alerts fire correctly using SafeClaw's simulation mode:
```bash
npx @authensor/safeclaw simulate --action file.read --path ".env" --test-alerts
```
This triggers the policy evaluation and sends a test alert to your configured webhooks, confirming the full alert pipeline works end-to-end.
## Why SafeClaw
- 446 tests ensuring alerting logic and webhook delivery are reliable
- Deny-by-default — dangerous actions are blocked AND alerted simultaneously
- Sub-millisecond evaluation — alerts fire within milliseconds of the action attempt
- Hash-chained audit trail — alert evidence is tamper-proof for incident investigation
- Works with Claude AND OpenAI — one alerting configuration covers all AI providers
## Cross-References

- How to Monitor AI Agent Actions in Production
- Incident Response for AI Agents
- How to Log Every AI Agent Action for Compliance
- Fail-Closed Design Pattern
## Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

```bash
npx @authensor/safeclaw
```