Using SafeClaw for Data Pipeline Agents: Read-Source Write-Output Policy
Scenario
You run an AI agent that processes data from a source database, transforms it, and writes results to output tables. The source database contains sensitive data including PII tables that the agent should never access. The agent should read from source tables, write only to designated output tables, and never modify the source data. Every action must be logged for compliance audits.
SafeClaw enforces directional data flow: read-only access to approved source tables, write-only access to output tables, explicit denial of PII tables, and a tamper-proof audit trail for every operation.
Threat Model
A data pipeline agent without action-level gating can:
- Read PII tables containing names, emails, Social Security numbers, or health records, creating regulatory exposure under GDPR, CCPA, or HIPAA.
- Modify source data by running UPDATE or DELETE queries on production tables, corrupting the source of truth.
- Write to arbitrary tables outside the designated output schema, polluting other teams' data or overwriting critical reference tables.
- Exfiltrate data by sending query results to external endpoints, file shares, or cloud storage buckets.
- Drop tables or schemas through destructive SQL commands that cascade across foreign key relationships.
- Bypass row-level security if the database connection uses an overprivileged service account.
Recommended Policy
# Data Pipeline Agent Policy
policy:
  name: "data-pipeline-read-source-write-output"
  default: DENY
  rules:
    # --- File Read ---
    - action: file_read
      path: "/app/pipeline/config/**"
      decision: ALLOW
    - action: file_read
      path: "/app/pipeline/sql/transforms/**"
      decision: ALLOW
    # --- File Write ---
    - action: file_write
      path: "/app/pipeline/output/**"
      decision: ALLOW
    - action: file_write
      path: "/app/pipeline/logs/**"
      decision: ALLOW
    - action: file_write
      path: "/app/pipeline/config/**"
      decision: DENY
    - action: file_write
      path: "/app/pipeline/sql/**"
      decision: DENY
    # --- Shell Exec ---
    - action: shell_exec
      command: "python3 /app/pipeline/run_transform.py*"
      decision: ALLOW
    - action: shell_exec
      command: "psql*SELECT*FROM source.*"
      decision: ALLOW
    - action: shell_exec
      command: "psql*SELECT*FROM pii.*"
      decision: DENY
    - action: shell_exec
      command: "psql*INSERT INTO output.*"
      decision: ALLOW
    - action: shell_exec
      command: "psql*UPDATE*"
      decision: DENY
    - action: shell_exec
      command: "psql*DELETE*"
      decision: DENY
    - action: shell_exec
      command: "psql*DROP*"
      decision: DENY
    - action: shell_exec
      command: "psql*TRUNCATE*"
      decision: DENY
    - action: shell_exec
      command: "psql*ALTER*"
      decision: REQUIRE_APPROVAL
    # --- Network ---
    - action: network
      domain: "source-db.internal:5432"
      decision: ALLOW
    - action: network
      domain: "output-db.internal:5432"
      decision: ALLOW
    - action: network
      domain: "api.openai.com"
      decision: ALLOW
    - action: network
      domain: "*"
      decision: DENY
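The command patterns use glob-style wildcards. The matching model assumed throughout this page — rules checked top to bottom, first match wins, unmatched actions falling through to the deny-by-default policy — can be sketched with Python's fnmatch. This is an illustration of the matching semantics, not SafeClaw's implementation:

```python
from fnmatch import fnmatchcase

# Rule set mirroring the shell_exec section of the policy above.
RULES = [
    ("psql*SELECT*FROM source.*", "ALLOW"),
    ("psql*SELECT*FROM pii.*", "DENY"),
    ("psql*INSERT INTO output.*", "ALLOW"),
    ("psql*UPDATE*", "DENY"),
    ("psql*DELETE*", "DENY"),
    ("psql*DROP*", "DENY"),
    ("psql*TRUNCATE*", "DENY"),
    ("psql*ALTER*", "REQUIRE_APPROVAL"),
]

def decide(command: str, default: str = "DENY") -> str:
    # First matching rule wins; anything unmatched falls through to the default.
    for pattern, decision in RULES:
        if fnmatchcase(command, pattern):
            return decision
    return default

print(decide('psql -h source-db.internal -c "SELECT id FROM source.transactions"'))  # ALLOW
print(decide('psql -h source-db.internal -c "DELETE FROM source.transactions"'))     # DENY
```

Note that with first-match-wins ordering, the position of each rule matters: a DENY placed after a broader ALLOW that also matches the command would never fire.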
Example Action Requests
1. Agent reads from a source table (ALLOW)
{
"action": "shell_exec",
"command": "psql -h source-db.internal -c \"SELECT id, amount, category FROM source.transactions WHERE date = '2026-02-13'\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:00:01Z"
}
// Decision: ALLOW — matches the SELECT-from-source allow rule
2. Agent writes to the output table (ALLOW)
{
"action": "shell_exec",
"command": "psql -h output-db.internal -c \"INSERT INTO output.daily_summary (date, total, count) VALUES ('2026-02-13', 48230.50, 1523)\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:01:00Z"
}
// Decision: ALLOW — matches the INSERT-into-output allow rule
3. Agent attempts to read PII table (DENY)
{
"action": "shell_exec",
"command": "psql -h source-db.internal -c \"SELECT name, email, ssn FROM pii.customers\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:02:00Z"
}
// Decision: DENY — SELECT against the pii schema is explicitly denied
4. Agent attempts a DELETE on source data (DENY)
{
"action": "shell_exec",
"command": "psql -h source-db.internal -c \"DELETE FROM source.transactions WHERE date < '2025-01-01'\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:03:00Z"
}
// Decision: DENY — DELETE statements are explicitly denied
5. Agent attempts an ALTER TABLE (REQUIRE_APPROVAL)
{
"action": "shell_exec",
"command": "psql -h output-db.internal -c \"ALTER TABLE output.daily_summary ADD COLUMN region TEXT\"",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:04:00Z"
}
// Decision: REQUIRE_APPROVAL — schema changes require human sign-off
6. Agent attempts to exfiltrate data (DENY)
{
"action": "network",
"domain": "external-storage.example.com",
"method": "POST",
"agent": "data-pipeline-agent",
"timestamp": "2026-02-13T02:05:00Z"
}
// Decision: DENY — domain not in allowlist, wildcard catch-all denies
Setup Steps
- Install SafeClaw on the machine or container running your pipeline:
npx @authensor/safeclaw
Use the browser-based setup wizard. Free tier with 7-day renewable keys, no credit card required.
- Select the "Data Pipeline" template from the wizard. This starts with directional flow rules: read from source, write to output.
- Map your database schemas. Define which schemas are source (read-only), output (write-only), and PII (fully denied). Use the exact schema names from your database.
- Deny destructive SQL operations. Explicitly deny UPDATE, DELETE, DROP, and TRUNCATE against all schemas. Use REQUIRE_APPROVAL for ALTER if your pipeline occasionally needs schema migrations.
- Lock network access to only the database hosts and your LLM provider. Deny all other domains to prevent data exfiltration.
- Integrate SafeClaw before every database call. In your pipeline code, evaluate each query before passing it to the database driver:
from safeclaw import evaluate

def execute_query(query: str, host: str):
    # Ask SafeClaw for a decision before the query reaches the driver.
    decision = evaluate({
        "action": "shell_exec",
        "command": f"psql -h {host} -c \"{query}\"",
        "agent": "data-pipeline-agent"
    })
    if decision["result"] == "DENY":
        raise PermissionError(f"SafeClaw denied query: {query}")
    if decision["result"] == "REQUIRE_APPROVAL":
        # Blocks until a human approves or rejects the request.
        wait_for_approval(decision["approval_id"])
    # `db` and `wait_for_approval` come from your own pipeline code.
    return db.execute(query)
- Enable audit logging from day one. SafeClaw's tamper-proof SHA-256 hash chain creates a compliance-ready record. Export logs in JSON or CSV for regulatory review.
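SafeClaw's export format is not reproduced here, but the verification idea behind a SHA-256 hash chain is straightforward: each log entry records the hash of its predecessor, so altering, dropping, or reordering any entry breaks every link after it. A minimal sketch, assuming each JSON entry carries a `prev_hash` field (the field name and genesis value are this example's assumptions, not SafeClaw's documented schema):

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    # Hash the canonical JSON form of the entry, including its prev_hash link.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def verify_chain(log: list[dict]) -> bool:
    prev = "0" * 64  # genesis value; an assumption for this sketch
    for entry in log:
        if entry["prev_hash"] != prev:
            return False  # chain broken: entry altered, dropped, or reordered
        prev = entry_hash(entry)
    return True
```

Because each hash covers the previous link, an auditor who trusts only the final hash can detect tampering anywhere earlier in the log.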
- Run in simulation mode during development. Switch to enforcement mode before the pipeline processes production data.
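The simulation-versus-enforcement split can also be mirrored in your own integration layer, so would-be denials are logged while the pipeline keeps running. A minimal sketch — the `mode` flag and log format here are this example's, not SafeClaw's API:

```python
def apply_decision(decision: dict, command: str, mode: str = "enforce") -> bool:
    """Return True if the command may proceed under the given mode."""
    if decision["result"] == "DENY":
        if mode == "simulation":
            # Record what enforcement would have blocked, but let it run.
            print(f"[simulation] would DENY: {command}")
            return True
        return False
    return True
```

Reviewing the simulated denials before flipping to enforcement surfaces overly broad rules without breaking a working pipeline.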
Cross-References
- SafeClaw Quickstart Guide — Full installation and first-run walkthrough
- Audit Trail and Hash Chain — Technical details on tamper-proof logging and compliance exports
- Deny-by-Default Architecture — Why SafeClaw blocks everything unless explicitly allowed
- REQUIRE_APPROVAL Workflow — How human-in-the-loop approval works
- Shell Exec Policy Rules — Command matching syntax and wildcard patterns
Try SafeClaw
Action-level gating for AI agents. Set it up in your browser in 60 seconds.
$ npx @authensor/safeclaw