2025-11-10 · Authensor

What Are Risk Signals in AI Agent Tool Calls?

Risk signals are observable indicators within AI agent tool call requests that suggest the requested action may be dangerous, unauthorized, or the result of adversarial manipulation. Common risk signals include accessing credential files, executing destructive shell commands, making requests to unknown external domains, and attempting operations outside the agent's designated workspace. SafeClaw by Authensor uses risk signals as the foundation for policy rule matching: deny-by-default action gating identifies and blocks risky tool calls before execution, for agents built with Claude, OpenAI, or any supported framework.

Categories of Risk Signals

Risk signals in AI agent tool calls fall into several categories:

Path-Based Signals

The file path targeted by a read or write operation can itself be a risk signal:

| Signal | Example | Risk |
|--------|---------|------|
| Credential file access | .env, credentials.json | Secret exposure |
| System file access | /etc/passwd, /etc/shadow | System compromise |
| SSH key access | ~/.ssh/id_rsa | Authentication compromise |
| Path traversal | ../../etc/passwd | Workspace escape |
| Sensitive directory | ~/.aws/, ~/.gnupg/ | Credential theft |
| Binary file write | ./malware.exe, ./backdoor.sh | Malicious payload |
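
These path signals can be checked before a file operation ever runs. The sketch below is only an illustration of the idea, not SafeClaw's own matcher; the flagPathSignals helper, its pattern list, and the workspace argument are hypothetical.

// Illustrative sketch: a hypothetical standalone path-signal check, not SafeClaw's internal matcher.
import * as path from "node:path";
import * as os from "node:os";

const SENSITIVE_PATTERNS: { pattern: RegExp; signal: string }[] = [
  { pattern: /(^|\/)\.env(\.|$)/, signal: "credential file access" },
  { pattern: /credentials\.json$/, signal: "credential file access" },
  { pattern: /^\/etc\/(passwd|shadow)$/, signal: "system file access" },
  { pattern: /\/\.ssh\//, signal: "SSH key access" },
  { pattern: /\/\.(aws|gnupg)\//, signal: "sensitive directory access" },
  { pattern: /\.(exe|sh)$/, signal: "binary or script write target" },
];

function flagPathSignals(requestedPath: string, workspaceRoot: string): string[] {
  const signals: string[] = [];
  // Expand ~ and resolve relative segments so traversal strings point at their real target.
  const expanded = requestedPath.replace(/^~(?=\/|$)/, os.homedir());
  const resolved = path.resolve(workspaceRoot, expanded);

  // Workspace-escape signal: the resolved path leaves the designated workspace.
  if (!resolved.startsWith(path.resolve(workspaceRoot) + path.sep)) {
    signals.push("path escapes workspace");
  }
  for (const { pattern, signal } of SENSITIVE_PATTERNS) {
    if (pattern.test(resolved)) signals.push(signal);
  }
  return signals;
}

// Example: the traversal resolves outside the workspace, so the escape signal fires.
console.log(flagPathSignals("../../etc/passwd", "/home/agent/project"));

Resolving the path before matching is what turns a traversal string like ../../etc/passwd into a detectable workspace escape rather than an innocent-looking relative path.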

Command-Based Signals

The shell command an agent requests can be a risk signal:

| Signal | Example | Risk |
|--------|---------|------|
| Recursive deletion | rm -rf /, find . -delete | Data destruction |
| Permission changes | chmod 777, chown root | Privilege escalation |
| Package installation | npm install, pip install | Supply chain attack |
| Network tools | curl, wget, nc | Data exfiltration |
| Git operations | git push --force, git reset --hard | History destruction |
| Process control | kill, pkill, nohup | Service disruption |
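
Command signals reduce to pattern matching over the requested command string. A minimal sketch under that assumption follows; flagCommandSignals and its regex denylist are illustrative, not SafeClaw's rule syntax.

// Illustrative sketch: a hypothetical standalone command-signal check, not SafeClaw's policy engine.
const COMMAND_SIGNALS: { pattern: RegExp; signal: string }[] = [
  { pattern: /\brm\s+(-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\b/i, signal: "recursive deletion" },
  { pattern: /\bfind\b.*\s-delete\b/, signal: "recursive deletion" },
  { pattern: /\bchmod\s+777\b|\bchown\s+root\b/, signal: "permission change" },
  { pattern: /\b(npm|pip)\s+install\b/, signal: "package installation" },
  { pattern: /\b(curl|wget|nc)\b/, signal: "network tool" },
  { pattern: /\bgit\s+(push\s+--force|reset\s+--hard)\b/, signal: "destructive git operation" },
  { pattern: /\b(kill|pkill|nohup)\b/, signal: "process control" },
];

function flagCommandSignals(command: string): string[] {
  return COMMAND_SIGNALS
    .filter(({ pattern }) => pattern.test(command))
    .map(({ signal }) => signal);
}

// Example: flags both the recursive deletion and the network tool.
console.log(flagCommandSignals("rm -rf /tmp/build && curl https://example.com"));

Compound commands joined with && or ; can carry several signals at once, which is why the check returns every match rather than stopping at the first.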

Network-Based Signals

The target of an HTTP or API request can be a risk signal:

| Signal | Example | Risk |
|--------|---------|------|
| Unknown external domain | https://unknown-site.com | Data exfiltration |
| Cloud metadata endpoints | http://169.254.169.254 | SSRF / credential theft |
| Internal network addresses | http://10.0.0.x, http://192.168.x.x | Lateral movement |
| Encoded payloads in URLs | ?data=base64encodedstring | Exfiltration attempt |
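
Network signals hinge on where a request is going and what it carries. The sketch below shows one way such checks could look; flagUrlSignals, its allowlist parameter, and the base64 heuristic are hypothetical and not part of SafeClaw.

// Illustrative sketch: a hypothetical standalone URL-signal check, not SafeClaw's network rules.
const PRIVATE_HOST = /^(10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.|127\.|localhost$)/;
const METADATA_HOST = "169.254.169.254";
// Long runs of base64-looking characters in a query value suggest an encoded payload.
const BASE64_LIKE = /^[A-Za-z0-9+/=]{40,}$/;

function flagUrlSignals(rawUrl: string, allowedDomains: string[]): string[] {
  const signals: string[] = [];
  const url = new URL(rawUrl);

  if (url.hostname === METADATA_HOST) signals.push("cloud metadata endpoint");
  if (PRIVATE_HOST.test(url.hostname)) signals.push("internal network address");
  if (!allowedDomains.includes(url.hostname)) signals.push("unapproved external domain");

  for (const value of url.searchParams.values()) {
    if (BASE64_LIKE.test(value)) signals.push("encoded payload in query string");
  }
  return signals;
}

// Example: an unknown host carrying a long base64-looking query value trips two signals.
console.log(
  flagUrlSignals(
    "https://unknown-site.com/upload?data=" + "QQ==".repeat(12),
    ["api.github.com"],
  ),
);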

Behavioral Signals

Patterns in the sequence of tool calls can indicate risk even when each individual call looks benign:

| Signal | Pattern | Risk |
|--------|---------|------|
| Read-then-send | File read followed by HTTP request | Data exfiltration |
| Privilege escalation | Write to sudoers, modify permissions | System compromise |
| Persistence | Write to crontab, startup scripts | Backdoor installation |
| Reconnaissance | Listing many directories, reading system files | Attack preparation |
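
Unlike the previous categories, behavioral signals only emerge from the sequence of calls, so detecting them means looking at history rather than a single request. A minimal sketch of a read-then-send detector follows; the ToolCall shape and detectReadThenSend are hypothetical and stand in for whatever call log an agent harness keeps.

// Illustrative sketch: a hypothetical sequence check over a tool-call log,
// not SafeClaw's audit trail or policy engine.
type ToolCall =
  | { kind: "file_read"; path: string }
  | { kind: "http_request"; url: string }
  | { kind: "shell_execute"; command: string };

const SENSITIVE_READ = /\.env|credentials|\.ssh|\.aws/;

// Read-then-send: a sensitive file read followed later by an outbound HTTP request
// is a classic exfiltration shape, even when each call looks harmless in isolation.
function detectReadThenSend(history: ToolCall[]): boolean {
  let sensitiveReadSeen = false;
  for (const call of history) {
    if (call.kind === "file_read" && SENSITIVE_READ.test(call.path)) {
      sensitiveReadSeen = true;
    }
    if (call.kind === "http_request" && sensitiveReadSeen) {
      return true;
    }
  }
  return false;
}

// Example: the pair of calls is riskier than either call alone.
console.log(
  detectReadThenSend([
    { kind: "file_read", path: "./src/index.ts" },
    { kind: "file_read", path: "./.env" },
    { kind: "http_request", url: "https://unknown-site.com/collect" },
  ]),
); // true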

Using Risk Signals in SafeClaw Policies

Install SafeClaw to enforce risk-signal-based policies:

npx @authensor/safeclaw

Policy rules directly encode risk signals as matching conditions:

# safeclaw.yaml
version: 1
defaultAction: deny

rules:
  # PATH-BASED RISK SIGNALS
  - action: file_read
    path: "./.env*"
    decision: deny
    reason: "Risk signal: credential file access"

  - action: file_read
    path: "~/.ssh/**"
    decision: deny
    reason: "Risk signal: SSH key access"

  # COMMAND-BASED RISK SIGNALS
  - action: shell_execute
    command: "rm -rf*"
    decision: deny
    reason: "Risk signal: recursive deletion"

  - action: shell_execute
    command: "curl*"
    decision: deny
    reason: "Risk signal: network tool execution"

  - action: shell_execute
    command: "chmod*"
    decision: deny
    reason: "Risk signal: permission modification"

  # NETWORK-BASED RISK SIGNALS
  - action: http_request
    domain: "169.254.169.254"
    decision: deny
    reason: "Risk signal: cloud metadata SSRF"

  - action: http_request
    decision: deny
    reason: "Risk signal: unapproved external request"

  # SAFE OPERATIONS (no risk signals)
  - action: file_read
    path: "./src/**"
    decision: allow

  - action: shell_execute
    command: "npm test"
    decision: allow

Risk Signal Scoring

Advanced risk assessment assigns a score to each individual signal and combines the scores into an overall risk level for the tool call.

SafeClaw's policy engine produces deterministic decisions (allow, deny, or escalate) through rule matching, which keeps every outcome explainable and auditable. Teams can approximate graduated risk scoring through rule ordering and decision types: deny the highest-risk signals outright, escalate ambiguous operations for human review, and allow only explicitly vetted actions.
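
For teams that do want numeric scoring in front of, or alongside, a rule-based gate, the combination step can be as simple as weighted addition with thresholds. The sketch below is generic and hypothetical; the weights, thresholds, and the scoreToDecision function are not a SafeClaw API.

// Illustrative sketch: a generic, hypothetical scoring scheme, not a SafeClaw API.
// SafeClaw itself resolves each call to allow/deny/escalate by rule matching;
// the weights and thresholds below are made up for the example.
type Decision = "allow" | "escalate" | "deny";

const SIGNAL_WEIGHTS: Record<string, number> = {
  "credential file access": 10,
  "network tool": 6,
  "unapproved external domain": 5,
  "package installation": 4,
};

function scoreToDecision(signals: string[]): Decision {
  const score = signals.reduce((sum, s) => sum + (SIGNAL_WEIGHTS[s] ?? 1), 0);
  if (score >= 10) return "deny";      // any high-severity signal, or several medium ones
  if (score >= 4) return "escalate";   // ambiguous: route to a human for approval
  return "allow";                      // no meaningful signals observed
}

console.log(scoreToDecision([]));                                              // "allow"
console.log(scoreToDecision(["package installation"]));                        // "escalate"
console.log(scoreToDecision(["network tool", "unapproved external domain"]));  // "deny"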

Risk Signals and Prompt Injection Detection

Risk signals are particularly valuable for detecting prompt injection attacks. When an agent that normally reads source files and runs tests suddenly attempts to read ~/.ssh/id_rsa and make an HTTP request to an unknown domain, the sequence of risk signals strongly suggests adversarial manipulation.

SafeClaw's hash-chained audit trail records all risk signals, including denied actions, creating a forensic record of what an agent attempted, what was blocked, and why.

SafeClaw's 446-test suite validates that risk signal patterns are correctly matched by policy rules across all action types, path patterns, and command formats.

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw