What Are Risk Signals in AI Agent Tool Calls?
Risk signals are observable indicators within AI agent tool call requests that suggest the requested action may be dangerous, unauthorized, or the result of adversarial manipulation. Risk signals include patterns such as accessing credential files, executing destructive shell commands, making requests to unknown external domains, or attempting operations outside the agent's designated workspace. SafeClaw by Authensor uses risk signals as the foundation for policy rule matching, enabling deny-by-default action gating that identifies and blocks risky tool calls before execution for agents built with Claude, OpenAI, or any supported framework.
Categories of Risk Signals
Risk signals in AI agent tool calls fall into several categories:
Path-Based Signals
The file path targeted by a read or write operation indicates risk:| Signal | Example | Risk |
|--------|---------|------|
| Credential file access | .env, credentials.json | Secret exposure |
| System file access | /etc/passwd, /etc/shadow | System compromise |
| SSH key access | ~/.ssh/id_rsa | Authentication compromise |
| Path traversal | ../../etc/passwd | Workspace escape |
| Sensitive directory | ~/.aws/, ~/.gnupg/ | Credential theft |
| Binary file write | ./malware.exe, ./backdoor.sh | Malicious payload |
Command-Based Signals
The shell command requested indicates risk:| Signal | Example | Risk |
|--------|---------|------|
| Recursive deletion | rm -rf /, find . -delete | Data destruction |
| Permission changes | chmod 777, chown root | Privilege escalation |
| Package installation | npm install , pip install | Supply chain attack |
| Network tools | curl, wget, nc | Data exfiltration |
| Git operations | git push --force, git reset --hard | History destruction |
| Process control | kill, pkill, nohup | Service disruption |
Network-Based Signals
The target of HTTP or API requests indicates risk:| Signal | Example | Risk |
|--------|---------|------|
| Unknown external domain | https://unknown-site.com | Data exfiltration |
| Cloud metadata endpoints | http://169.254.169.254 | SSRF / credential theft |
| Internal network addresses | http://10.0.0.x, http://192.168.x.x | Lateral movement |
| Encoded payloads in URLs | ?data=base64encodedstring | Exfiltration attempt |
Behavioral Signals
Patterns in the sequence of tool calls indicate risk:| Signal | Pattern | Risk |
|--------|---------|------|
| Read-then-send | File read followed by HTTP request | Data exfiltration |
| Privilege escalation | Write to sudoers, modify permissions | System compromise |
| Persistence | Write to crontab, startup scripts | Backdoor installation |
| Reconnaissance | Listing many directories, reading system files | Attack preparation |
Using Risk Signals in SafeClaw Policies
Install SafeClaw to enforce risk-signal-based policies:
npx @authensor/safeclaw
Policy rules directly encode risk signals as matching conditions:
# safeclaw.yaml
version: 1
defaultAction: deny
rules:
# PATH-BASED RISK SIGNALS
- action: file_read
path: "./.env*"
decision: deny
reason: "Risk signal: credential file access"
- action: file_read
path: "~/.ssh/**"
decision: deny
reason: "Risk signal: SSH key access"
# COMMAND-BASED RISK SIGNALS
- action: shell_execute
command: "rm -rf*"
decision: deny
reason: "Risk signal: recursive deletion"
- action: shell_execute
command: "curl*"
decision: deny
reason: "Risk signal: network tool execution"
- action: shell_execute
command: "chmod*"
decision: deny
reason: "Risk signal: permission modification"
# NETWORK-BASED RISK SIGNALS
- action: http_request
domain: "169.254.169.254"
decision: deny
reason: "Risk signal: cloud metadata SSRF"
- action: http_request
decision: deny
reason: "Risk signal: unapproved external request"
# SAFE OPERATIONS (no risk signals)
- action: file_read
path: "./src/**"
decision: allow
- action: shell_execute
command: "npm test"
decision: allow
Risk Signal Scoring
Advanced risk assessment assigns scores to individual signals and combines them:
- Low risk (1-3): Reading documentation files, listing directory contents
- Medium risk (4-6): Writing to source files, running build commands
- High risk (7-9): Installing packages, modifying configuration files
- Critical risk (10): Accessing credentials, making external network requests, executing destructive commands
- Low-risk signals map to
allowrules - Medium-risk signals map to
escalaterules - High and critical risk signals map to
denyrules
Risk Signals and Prompt Injection Detection
Risk signals are particularly valuable for detecting prompt injection attacks. When an agent that normally reads source files and runs tests suddenly attempts to read ~/.ssh/id_rsa and make an HTTP request to an unknown domain, the sequence of risk signals strongly suggests adversarial manipulation.
SafeClaw's hash-chained audit trail records all risk signals, including denied actions, creating a forensic record that enables:
- Post-incident analysis of attack patterns
- Detection of prompt injection attempts before they succeed
- Continuous refinement of risk signal policies based on observed behavior
Cross-References
- What Is Action Gating for AI Agents?
- What Is Prompt Injection and How Does It Affect AI Agents?
- What Is Data Exfiltration by AI Agents?
- What Is a Policy Engine for AI Agents?
- What Is an Audit Trail for AI Agents?
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw