Using SafeClaw for Data Pipeline Agents: Read-Source Write-Output Policy
Scenario
You run an AI agent that processes data from a source database, transforms it, and writes results to output tables. The source database contains sensitive data including PII tables that the agent should never access. The agent should read from source tables, write only to designated output tables, and never modify the source data. Every action must be logged for compliance audits.
SafeClaw enforces directional data flow: read-only access to approved source tables, write-only access to output tables, explicit denial of PII tables, and a tamper-proof audit trail for every operation.
Threat Model
A data pipeline agent without action-level gating can:
- Read PII tables containing names, emails, Social Security numbers, or health records, creating regulatory exposure under GDPR, CCPA, or HIPAA.
- Modify source data by running UPDATE or DELETE queries on production tables, corrupting the source of truth.
- Write to arbitrary tables outside the designated output schema, polluting other teams' data or overwriting critical reference tables.
- Exfiltrate data by sending query results to external endpoints, file shares, or cloud storage buckets.
- Drop tables or schemas through destructive SQL commands that cascade across foreign key relationships.
- Bypass row-level security if the database connection uses an overprivileged service account.
Recommended Policy
# Data Pipeline Agent Policy
policy:
  name: "data-pipeline-read-source-write-output"
  default: DENY
  rules:
    # --- File Read ---
    - action: file_read
      path: "/app/pipeline/config/**"
      decision: ALLOW
    - action: file_read
      path: "/app/pipeline/sql/transforms/**"
      decision: ALLOW
    # --- File Write ---
    - action: file_write
      path: "/app/pipeline/output/**"
      decision: ALLOW
    - action: file_write
      path: "/app/pipeline/logs/**"
      decision: ALLOW
    - action: file_write
      path: "/app/pipeline/config/**"
      decision: DENY
    - action: file_write
      path: "/app/pipeline/sql/**"
      decision: DENY
    # --- Shell Exec ---
    - action: shell_exec
      command: "python3 /app/pipeline/run_transform.py*"
      decision: ALLOW
    - action: shell_exec
      command: "psql*SELECT*FROM source.*"
      decision: ALLOW
    - action: shell_exec
      command: "psql*SELECT*FROM pii.*"
      decision: DENY
    - action: shell_exec
      command: "psql*INSERT INTO output.*"
      decision: ALLOW
    - action: shell_exec
      command: "psql*UPDATE*"
      decision: DENY
    - action: shell_exec
      command: "psql*DELETE*"
      decision: DENY
    - action: shell_exec
      command: "psql*DROP*"
      decision: DENY
    - action: shell_exec
      command: "psql*TRUNCATE*"
      decision: DENY
    - action: shell_exec
      command: "psql*ALTER*"
      decision: REQUIRE_APPROVAL
    # --- Network ---
    - action: network
      domain: "source-db.internal:5432"
      decision: ALLOW
    - action: network
      domain: "output-db.internal:5432"
      decision: ALLOW
    - action: network
      domain: "api.openai.com"
      decision: ALLOW
    - action: network
      domain: "*"
      decision: DENY
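The command patterns use glob-style wildcards. The matching model assumed throughout this page — rules checked top to bottom, first match wins, unmatched actions falling through to the deny-by-default policy — can be sketched with Python's fnmatch. This is an illustration of the matching semantics, not SafeClaw's implementation:

```python
from fnmatch import fnmatchcase

# Rule set mirroring the shell_exec section of the policy above.
RULES = [
    ("psql*SELECT*FROM source.*", "ALLOW"),
    ("psql*SELECT*FROM pii.*", "DENY"),
    ("psql*INSERT INTO output.*", "ALLOW"),
    ("psql*UPDATE*", "DENY"),
    ("psql*DELETE*", "DENY"),
    ("psql*DROP*", "DENY"),
    ("psql*TRUNCATE*", "DENY"),
    ("psql*ALTER*", "REQUIRE_APPROVAL"),
]

def decide(command: str, default: str = "DENY") -> str:
    # First matching rule wins; anything unmatched falls through to the default.
    for pattern, decision in RULES:
        if fnmatchcase(command, pattern):
            return decision
    return default

print(decide('psql -h source-db.internal -c "SELECT id FROM source.transactions"'))  # ALLOW
print(decide('psql -h source-db.internal -c "DELETE FROM source.transactions"'))     # DENY
```

Note that with first-match-wins ordering, the position of each rule matters: a DENY placed after a broader ALLOW that also matches the command would never fire.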
Example Action Requests
1. Agent reads from a source table (ALLOW)
{
"action": "shell_exec",
"command": "psql -h source-db.internal -c \"SELECT id, amount, category FROM source.transactions WHERE date = '2026-02-13'\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:00:01Z"
}
// Decision: ALLOW — matches the SELECT-from-source allow rule
2. Agent writes to the output table (ALLOW)
{
"action": "shell_exec",
"command": "psql -h output-db.internal -c \"INSERT INTO output.daily_summary (date, total, count) VALUES ('2026-02-13', 48230.50, 1523)\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:01:00Z"
}
// Decision: ALLOW — matches the INSERT-into-output allow rule
3. Agent attempts to read PII table (DENY)
{
"action": "shell_exec",
"command": "psql -h source-db.internal -c \"SELECT name, email, ssn FROM pii.customers\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:02:00Z"
}
// Decision: DENY — SELECT against the pii schema is explicitly denied
4. Agent attempts a DELETE on source data (DENY)
{
"action": "shell_exec",
"command": "psql -h source-db.internal -c \"DELETE FROM source.transactions WHERE date < '2025-01-01'\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:03:00Z"
}
// Decision: DENY — DELETE statements are explicitly denied
5. Agent attempts an ALTER TABLE (REQUIRE_APPROVAL)
{
"action": "shell_exec",
"command": "psql -h output-db.internal -c \"ALTER TABLE output.daily_summary ADD COLUMN region TEXT\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:04:00Z"
}
// Decision: REQUIRE_APPROVAL — schema changes require human sign-off
6. Agent attempts to exfiltrate data (DENY)
{
"action": "network",
"domain": "external-storage.example.com",
"method": "POST",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:05:00Z"
}
// Decision: DENY — domain not in allowlist, wildcard catch-all denies
Setup Steps
- Install SafeClaw on the machine or container running your pipeline:
npx @authensor/safeclaw
Use the browser-based setup wizard. Free tier with 7-day renewable keys, no credit card required.
- Select the "Data Pipeline" template from the wizard. This starts with directional flow rules: read from source, write to output.
- Map your database schemas. Define which schemas are source (read-only), output (write-only), and PII (fully denied). Use the exact schema names from your database.
- Deny destructive SQL operations. Explicitly deny UPDATE, DELETE, DROP, and TRUNCATE against all schemas. Use REQUIRE_APPROVAL for ALTER if your pipeline occasionally needs schema migrations.
- Lock network access to only the database hosts and your LLM provider. Deny all other domains to prevent data exfiltration.
- Integrate SafeClaw before every database call. In your pipeline code, evaluate each query before passing it to the database driver:
from safeclaw import evaluate

def execute_query(query: str, host: str):
    # Ask SafeClaw for a decision before the query reaches the driver.
    decision = evaluate({
        "action": "shell_exec",
        "command": f"psql -h {host} -c \"{query}\"",
        "agent": "data-pipeline-agent"
    })
    if decision["result"] == "DENY":
        raise PermissionError(f"SafeClaw denied query: {query}")
    if decision["result"] == "REQUIRE_APPROVAL":
        # Blocks until a human approves or rejects the request.
        wait_for_approval(decision["approval_id"])
    # `db` and `wait_for_approval` come from your own pipeline code.
    return db.execute(query)
- Enable audit logging from day one. SafeClaw's tamper-proof SHA-256 hash chain creates a compliance-ready record. Export logs in JSON or CSV for regulatory review.
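SafeClaw's export format is not reproduced here, but the verification idea behind a SHA-256 hash chain is straightforward: each log entry records the hash of its predecessor, so altering, dropping, or reordering any entry breaks every link after it. A minimal sketch, assuming each JSON entry carries a `prev_hash` field (the field name and genesis value are this example's assumptions, not SafeClaw's documented schema):

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    # Hash the canonical JSON form of the entry, including its prev_hash link.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def verify_chain(log: list[dict]) -> bool:
    prev = "0" * 64  # genesis value; an assumption for this sketch
    for entry in log:
        if entry["prev_hash"] != prev:
            return False  # chain broken: entry altered, dropped, or reordered
        prev = entry_hash(entry)
    return True
```

Because each hash covers the previous link, an auditor who trusts only the final hash can detect tampering anywhere earlier in the log.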
- Run in simulation mode during development. Switch to enforcement mode before the pipeline processes production data.
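The simulation-versus-enforcement split can also be mirrored in your own integration layer, so would-be denials are logged while the pipeline keeps running. A minimal sketch — the `mode` flag and log format here are this example's, not SafeClaw's API:

```python
def apply_decision(decision: dict, command: str, mode: str = "enforce") -> bool:
    """Return True if the command may proceed under the given mode."""
    if decision["result"] == "DENY":
        if mode == "simulation":
            # Record what enforcement would have blocked, but let it run.
            print(f"[simulation] would DENY: {command}")
            return True
        return False
    return True
```

Reviewing the simulated denials before flipping to enforcement surfaces overly broad rules without breaking a working pipeline.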
Cross-References
- SafeClaw Quickstart Guide — Full installation and first-run walkthrough
- Audit Trail and Hash Chain — Technical details on tamper-proof logging and compliance exports
- Deny-by-Default Architecture — Why SafeClaw blocks everything unless explicitly allowed
- REQUIRE_APPROVAL Workflow — How human-in-the-loop approval works
- Shell Exec Policy Rules — Command matching syntax and wildcard patterns
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw