2025-12-05 · Authensor

Using SafeClaw for Data Pipeline Agents: Read-Source Write-Output Policy

Scenario

You run an AI agent that processes data from a source database, transforms it, and writes results to output tables. The source database contains sensitive data including PII tables that the agent should never access. The agent should read from source tables, write only to designated output tables, and never modify the source data. Every action must be logged for compliance audits.

SafeClaw enforces directional data flow: read-only access to approved source tables, write-only access to output tables, explicit denial of PII tables, and a tamper-proof audit trail for every operation.

Threat Model

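A data pipeline agent without action-level gating can read PII tables it was never meant to see, update or delete source data, alter or drop tables without review, and send query results to external hosts.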

SafeClaw gates every action before it reaches the database driver or filesystem, ensuring the agent stays within its designated lane.

Recommended Policy

# Data Pipeline Agent Policy
policy:
  name: "data-pipeline-read-source-write-output"
  default: DENY

rules:
  # --- File Read ---
  - action: file_read
    path: "/app/pipeline/config/**"
    decision: ALLOW

  - action: file_read
    path: "/app/pipeline/sql/transforms/**"
    decision: ALLOW

  # --- File Write ---
  - action: file_write
    path: "/app/pipeline/output/**"
    decision: ALLOW

  - action: file_write
    path: "/app/pipeline/logs/**"
    decision: ALLOW

  - action: file_write
    path: "/app/pipeline/config/**"
    decision: DENY

  - action: file_write
    path: "/app/pipeline/sql/**"
    decision: DENY

  # --- Shell Exec ---
  - action: shell_exec
    command: "python3 /app/pipeline/run_transform.py*"
    decision: ALLOW

  - action: shell_exec
    command: "psql*SELECT*FROM source.*"
    decision: ALLOW

  - action: shell_exec
    command: "psql*SELECT*FROM pii.*"
    decision: DENY

  - action: shell_exec
    command: "psql*INSERT INTO output.*"
    decision: ALLOW

  - action: shell_exec
    command: "psql*UPDATE*"
    decision: DENY

  - action: shell_exec
    command: "psql*DELETE*"
    decision: DENY

  - action: shell_exec
    command: "psql*DROP*"
    decision: DENY

  - action: shell_exec
    command: "psql*ALTER*"
    decision: REQUIRE_APPROVAL

  # --- Network ---
  - action: network
    domain: "source-db.internal:5432"
    decision: ALLOW

  - action: network
    domain: "output-db.internal:5432"
    decision: ALLOW

  - action: network
    domain: "api.openai.com"
    decision: ALLOW

  - action: network
    domain: "*"
    decision: DENY
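
To make the matching concrete, here is a minimal Python sketch of how rules like these could be evaluated. It is not SafeClaw's engine. The `decide` function, the trimmed `RULES` list, glob matching through Python's `fnmatch`, and first-match-wins ordering are all assumptions made for illustration, with command patterns read as globs such as `psql*SELECT*FROM source.*`.

```python
from fnmatch import fnmatchcase

# Hypothetical, trimmed copy of the policy above. The real engine's matching
# and precedence rules may differ from this sketch.
RULES = [
    {"action": "shell_exec", "command": "psql*SELECT*FROM source.*", "decision": "ALLOW"},
    {"action": "shell_exec", "command": "psql*SELECT*FROM pii.*", "decision": "DENY"},
    {"action": "shell_exec", "command": "psql*INSERT INTO output.*", "decision": "ALLOW"},
    {"action": "shell_exec", "command": "psql*DELETE*", "decision": "DENY"},
    {"action": "shell_exec", "command": "psql*ALTER*", "decision": "REQUIRE_APPROVAL"},
    {"action": "network", "domain": "api.openai.com", "decision": "ALLOW"},
    {"action": "network", "domain": "*", "decision": "DENY"},
]

# Rule fields that hold a glob pattern to compare against the request.
MATCH_FIELDS = ("path", "command", "domain")


def decide(request: dict, rules: list = RULES, default: str = "DENY") -> str:
    """Return the decision of the first matching rule, else the default."""
    for rule in rules:
        if rule["action"] != request.get("action"):
            continue
        # A rule matches when every pattern field it defines globs the request.
        fields = [f for f in MATCH_FIELDS if f in rule]
        if all(fnmatchcase(str(request.get(f, "")), rule[f]) for f in fields):
            return rule["decision"]
    # Nothing matched: fall back to the policy default.
    return default
```

Under this assumed first-match ordering, a command such as `psql -h source-db.internal -c "SELECT amount FROM source.transactions WHERE customer_id IN (SELECT id FROM pii.customers)"` hits the `source` ALLOW rule before the `pii` DENY rule is ever checked. If your engine resolves overlaps this way, listing DENY rules ahead of ALLOW rules is the safer order. Confirm how SafeClaw handles overlapping rules before relying on either ordering.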

Example Action Requests

1. Agent reads from a source table (ALLOW)

{
  "action": "shell_exec",
  "command": "psql -h source-db.internal -c \"SELECT id, amount, category FROM source.transactions WHERE date = '2026-02-13'\"",
  "agent": "data-pipeline-agent",
  "timestamp": "2026-02-13T02:00:01Z"
}
// Decision: ALLOW (matches psql*SELECT*FROM source.*)

2. Agent writes to the output table (ALLOW)

{
  "action": "shell_exec",
  "command": "psql -h output-db.internal -c \"INSERT INTO output.daily_summary (date, total, count) VALUES ('2026-02-13', 48230.50, 1523)\"",
  "agent": "data-pipeline-agent",
  "timestamp": "2026-02-13T02:01:00Z"
}
// Decision: ALLOW (matches psql*INSERT INTO output.*)

3. Agent attempts to read PII table (DENY)

{
  "action": "shell_exec",
  "command": "psql -h source-db.internal -c \"SELECT name, email, ssn FROM pii.customers\"",
  "agent": "data-pipeline-agent",
  "timestamp": "2026-02-13T02:02:00Z"
}
// Decision: DENY (matches psql*SELECT*FROM pii.*, which is explicitly denied)

4. Agent attempts a DELETE on source data (DENY)

{
  "action": "shell_exec",
  "command": "psql -h source-db.internal -c \"DELETE FROM source.transactions WHERE date < '2025-01-01'\"",
  "agent": "data-pipeline-agent",
  "timestamp": "2026-02-13T02:03:00Z"
}
// Decision: DENY (psql*DELETE* is explicitly denied)

5. Agent attempts an ALTER TABLE (REQUIRE_APPROVAL)

{
  "action": "shell_exec",
  "command": "psql -h output-db.internal -c \"ALTER TABLE output.daily_summary ADD COLUMN region TEXT\"",
  "agent": "data-pipeline-agent",
  "timestamp": "2026-02-13T02:04:00Z"
}
// Decision: REQUIRE_APPROVAL — schema changes require human sign-off

6. Agent attempts to exfiltrate data (DENY)

{
  "action": "network",
  "domain": "external-storage.example.com",
  "method": "POST",
  "agent": "data-pipeline-agent",
  "timestamp": "2026-02-13T02:05:00Z"
}
// Decision: DENY — domain not in allowlist, wildcard catch-all denies

Setup Steps

  1. Install SafeClaw on the machine or container running your pipeline:
   npx @authensor/safeclaw
Use the browser-based setup wizard. Free tier with 7-day renewable keys, no credit card required.
  1. Select the "Data Pipeline" template from the wizard. This starts with directional flow rules: read from source, write to output.
  1. Map your database schemas. Define which schemas are source (read-only), output (write-only), and PII (fully denied). Use the exact schema names from your database.
  1. Deny destructive SQL operations. Explicitly deny UPDATE, DELETE, DROP, and TRUNCATE against all schemas. Use REQUIRE_APPROVAL for ALTER if your pipeline occasionally needs schema migrations.
  1. Lock network access to only the database hosts and your LLM provider. Deny all other domains to prevent data exfiltration.
  1. Integrate SafeClaw before every database call. In your pipeline code, evaluate each query before passing it to the database driver:
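   from safeclaw import evaluate

   def execute_query(query: str, host: str):
       # Ask SafeClaw for a decision before the query reaches the driver.
       decision = evaluate({
           "action": "shell_exec",
           "command": f"psql -h {host} -c \"{query}\"",
           "agent": "data-pipeline-agent",
       })
       result = decision["result"]
       if result == "REQUIRE_APPROVAL":
           # wait_for_approval is your pipeline's own helper; it should block
           # until a human decides and raise if the request is rejected.
           wait_for_approval(decision["approval_id"])
       elif result != "ALLOW":
           # Fail closed: anything other than ALLOW is treated as a denial.
           raise PermissionError(f"SafeClaw denied query: {query}")
       # db is your existing database connection or driver wrapper.
       return db.execute(query)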

  1. Enable audit logging from day one. SafeClaw's tamper-proof SHA-256 hash chain creates a compliance-ready record. Export logs in JSON or CSV for regulatory review.
  1. Run in simulation mode during development. Switch to enforcement mode before the pipeline processes production data.
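
For readers unfamiliar with hash-chained audit logs, the following Python sketch shows the general technique behind the SHA-256 hash chain mentioned in the audit logging step. It is an illustration only, not SafeClaw's log format: the `append_entry` and `verify_chain` helpers, the entry layout, and the all-zero `GENESIS` value are assumptions.

```python
import hashlib
import json

# Assumed placeholder "previous hash" for the first entry in the chain.
GENESIS = "0" * 64


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> dict:
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, record),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; an edited entry or a broken link fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash depends on the one before it, changing an earlier record invalidates every later link. A chain like this makes tampering evident, not impossible: dropping entries from the tail goes unnoticed unless the latest hash is also stored somewhere the agent cannot write.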

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw