2026-01-07 · Authensor

Prompt Injection Leading to Unauthorized File Access

Threat Description

Prompt injection is an attack where adversarial text embedded in user input, retrieved documents, or tool output causes an AI agent to perform actions its operator did not intend. When the injected instruction targets file operations, the agent may read sensitive files (credentials, private keys, proprietary source code) or write malicious content to system files (cron jobs, shell profiles, config files). The agent follows the injected instruction because it cannot reliably distinguish adversarial input from legitimate instructions.

Attack Vector

  1. An attacker embeds a hidden instruction in a file, webpage, or database record that the agent will process. Example: a README file containing .
  2. The agent retrieves or reads the file containing the injected prompt as part of its normal task.
  3. The injected instruction enters the agent's context window and is treated as a directive.
  4. The agent issues a file_read action targeting the sensitive file specified in the injection.
  5. The agent issues a file_write action to persist the sensitive data to an attacker-accessible location, or a network action to transmit it externally.
The injected file read action:
{
  "action": "file_read",
  "params": {
    "path": "/etc/passwd"
  },
  "agentId": "rag-agent-02",
  "timestamp": "2026-02-13T09:15:00Z"
}

The injected file write action:

{
  "action": "file_write",
  "params": {
    "path": "/tmp/exfil.txt",
    "content": "root:x:0:0:root:/root:/bin/bash\n..."
  },
  "agentId": "rag-agent-02",
  "timestamp": "2026-02-13T09:15:02Z"
}

Real-World Context

Prompt injection is a well-documented attack class against LLM-based systems. Research from multiple security teams has demonstrated successful prompt injection against agents using LangChain, AutoGPT, and similar frameworks. In the Clawdbot incident (1.5M API keys leaked), the agent's unrestricted file access meant that even without intentional prompt injection, the agent read credential files as a side effect of normal operation. Prompt injection amplifies this risk by directing the agent to read files it would not normally access.

Indirect prompt injection — where the malicious instruction is embedded in data the agent retrieves rather than in direct user input — is particularly dangerous because the operator has no visibility into the content before the agent processes it.

Why Existing Defenses Fail

Prompt-level defenses (system prompts saying "do not read sensitive files") are the exact target of prompt injection attacks. The attack works by overriding or circumventing these instructions. Defending against prompt injection with more prompts is a circular defense.

Input sanitization can catch known injection patterns in direct user input, but indirect prompt injection arrives through tool outputs, API responses, and retrieved documents. Sanitizing all possible input sources is infeasible in practice.

Sandboxing limits the available filesystem but cannot distinguish between a legitimate file read (agent reading a source file) and an injected file read (agent reading /etc/passwd) because both are file_read operations from the same process.

File permissions are process-level. If the agent process has read access to a directory, every file in that directory is readable regardless of whether the read was initiated by a legitimate instruction or an injected one.

How Action-Level Gating Prevents This

SafeClaw by Authensor evaluates every action at the point of execution, independent of how the action was triggered. The policy engine does not inspect the prompt or the agent's reasoning — it inspects the action itself. This makes action-level gating immune to prompt injection because the defense operates below the prompt layer.

  1. Path-based DENY rules block file_read and file_write actions targeting paths outside the agent's permitted workspace, regardless of why the agent is attempting the action.
  2. Deny-by-default architecture ensures that any file path not explicitly permitted is automatically blocked. An injected instruction to read /etc/passwd fails because /etc/passwd is not in the allowlist.
  3. Action-type restrictions can limit an agent to only file_read (no file_write), preventing injected write operations entirely.
  4. Sub-millisecond evaluation means the gating adds no perceptible latency. The policy engine, built in TypeScript strict mode with zero third-party dependencies, evaluates each action in under a millisecond.
The key insight: prompt injection controls the agent's intent, but action-level gating controls the agent's capability. An agent that intends to read /etc/passwd but lacks the policy permission to do so cannot execute the read.

Example Policy

{
  "rules": [
    {
      "action": "file_read",
      "match": { "pathPattern": "/project/src/" },
      "effect": "ALLOW",
      "reason": "Agent may read source files in the project"
    },
    {
      "action": "file_read",
      "match": { "pathPattern": "/project/docs/" },
      "effect": "ALLOW",
      "reason": "Agent may read documentation files"
    },
    {
      "action": "file_write",
      "match": { "pathPattern": "/project/src/" },
      "effect": "ALLOW",
      "reason": "Agent may write source files in the project"
    },
    {
      "action": "file_write",
      "match": { "pathPattern": "**" },
      "effect": "DENY",
      "reason": "All writes outside project src are denied"
    },
    {
      "action": "file_read",
      "match": { "pathPattern": "**" },
      "effect": "DENY",
      "reason": "All reads outside permitted directories are denied"
    }
  ]
}

First-match-wins evaluation means the ALLOW rules for project directories fire first. Any file_read or file_write targeting a path outside those directories hits the trailing DENY rules. An injected instruction to read /etc/passwd or write to /tmp/exfil.txt is denied regardless of the prompt content.

Detection in Audit Trail

SafeClaw's tamper-proof audit trail (SHA-256 hash chain) records every denied action with full context:

[2026-02-13T09:15:00Z] action=file_read path=/etc/passwd agent=rag-agent-02 verdict=DENY rule="All reads outside permitted directories are denied" hash=c4a9f2...
[2026-02-13T09:15:02Z] action=file_write path=/tmp/exfil.txt agent=rag-agent-02 verdict=DENY rule="All writes outside project src are denied" hash=d8b3e7...

A spike in DENY entries from a single agent session is a strong indicator of prompt injection. The audit trail preserves the full sequence of attempted actions, enabling forensic reconstruction of the injection chain. Each hash references the previous entry, making retroactive tampering detectable. The control plane receives only action metadata — never file contents or prompt text.

Cross-References

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw