2026-02-05 · Authensor

How to Make Your Chatbot Safer

Chatbots, AI assistants, AI agents, and copilots all face the same security risks the moment they can take actions — writing files, running shell commands, making API calls, or reading sensitive data. Whether you call your product a chatbot, a virtual assistant, a coding copilot, or an autonomous agent, the attack surface is identical. SafeClaw provides action-level gating for all of them: every action the AI attempts is evaluated against your policy before it executes. Install with npx @authensor/safeclaw.

The Vocabulary Does Not Change the Risk

The AI industry uses many names for systems that take actions on behalf of users. Customer support chatbots read databases and send emails. Coding assistants write files and run shell commands. Research agents make network requests and process documents. The label is cosmetic; the risk is structural.

A chatbot that can call an API endpoint to issue a refund, read a customer database, or write to a log file has the same exposure as an AI agent doing the same things. Prompt injection does not care whether you call your system a chatbot or an agent. The Clawdbot incident — where an AI system leaked 1.5 million API keys — happened because the system had unrestricted file read and network access, regardless of what anyone called it.

If your chatbot can take actions, it needs action-level safety controls.

What SafeClaw Does for Your Chatbot

SafeClaw sits between your chatbot and the system it operates on. When your chatbot attempts an action — file_write, file_read, shell_exec, or network — SafeClaw evaluates it against your policy in sub-millisecond time. The action only executes if the policy explicitly allows it. This is deny-by-default architecture: everything is blocked unless you say otherwise.
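
As a mental model (all names below are illustrative, not SafeClaw's actual client API), every attempted action passes through a check that answers allow or deny, and absent an explicit allow the action never runs:

// Conceptual sketch of a deny-by-default gate. Everything here is
// hypothetical; SafeClaw's real client interface may differ.
type Action =
  | { kind: "file_read"; path: string }
  | { kind: "file_write"; path: string; data: string }
  | { kind: "shell_exec"; command: string }
  | { kind: "network"; domain: string };

// Stand-in for policy evaluation: with no matching allow rule, deny.
async function checkPolicy(action: Action): Promise<"allow" | "deny"> {
  return "deny";
}

// The action only executes if the policy explicitly allows it.
async function guarded<T>(action: Action, run: () => Promise<T>): Promise<T> {
  if ((await checkPolicy(action)) !== "allow") {
    throw new Error(`Denied by policy: ${action.kind}`);
  }
  return run();
}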

SafeClaw works regardless of what framework powers your chatbot. It is provider-agnostic and integrates with any system that performs file, shell, or network operations. The client is 100% open source under the MIT license with zero third-party dependencies, so you can audit exactly what runs in your stack.

Step-by-Step: Making Your Chatbot Safer

Step 1: Install SafeClaw

npx @authensor/safeclaw

This launches the setup wizard. Nothing is added to your node_modules — SafeClaw has zero third-party dependencies and runs standalone.

Step 2: Get Your Free API Key

Visit safeclaw.onrender.com to create your account. The free tier provides a 7-day renewable API key with no credit card required. The browser dashboard includes a setup wizard that generates your initial policy.

Step 3: Define a Deny-by-Default Policy

Create a policy that explicitly allows only the actions your chatbot needs:

version: "1.0"
default: deny

rules:
  # Block credential files before any broader read rule can match
  - action: file_read
    path: "**/.env"
    decision: deny
    reason: "Never read environment secrets"

  # Allow chatbot to read its knowledge base
  - action: file_read
    path: "./knowledge/**"
    decision: allow
    reason: "Chatbot can read knowledge base files"

  # Allow chatbot to write conversation logs
  - action: file_write
    path: "./logs/**"
    decision: allow
    reason: "Chatbot can write to logs directory"

  # Allow chatbot API calls to your backend
  - action: network
    domain: "api.yourcompany.com"
    decision: allow
    reason: "Chatbot can call internal API"

  # Block all other network requests
  - action: network
    domain: "*"
    decision: deny
    reason: "Block unauthorized outbound traffic"

  # Block all shell commands
  - action: shell_exec
    command: "*"
    decision: deny
    reason: "Chatbot has no shell access"

Evaluation is first-match-wins: the first rule whose pattern matches decides, so the most specific rules must come first. That is why the **/.env deny above precedes the broader ./knowledge/** allow; otherwise a stray ./knowledge/.env would match the allow rule and become readable.
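
For concreteness, here is a minimal sketch of first-match-wins evaluation. The types and glob matcher are illustrative only, not SafeClaw's internals:

// First-match-wins: walk the rules top to bottom; the first rule whose
// pattern matches decides, and the default applies if nothing matches.
interface Rule {
  action: string;
  pattern: string;
  decision: "allow" | "deny";
}

// Crude illustrative glob: "**" crosses directory boundaries, "*" does not.
function matches(pattern: string, target: string): boolean {
  if (pattern === "*") return true;
  const re = new RegExp(
    "^" +
      pattern
        .split("**")
        .map((part) =>
          part.replace(/[.+^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[^/]*")
        )
        .join(".*") +
      "$"
  );
  return re.test(target);
}

function evaluate(rules: Rule[], action: string, target: string): "allow" | "deny" {
  for (const rule of rules) {
    if (rule.action === action && matches(rule.pattern, target)) {
      return rule.decision; // first match wins
    }
  }
  return "deny"; // default: deny, even without a catch-all rule
}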

Step 4: Test with Simulation Mode

SAFECLAW_MODE=simulation npx @authensor/safeclaw

Simulation mode logs every action your chatbot attempts and what the policy decision would be, without actually blocking anything. Run your chatbot through its normal conversations and workflows. Review the logs to confirm the policy allows legitimate actions and blocks everything else.
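
If you export the simulation results for review, a short script can summarize them. The file name and JSONL entry shape below are assumptions made for illustration only; consult the dashboard for the actual export format:

import { readFileSync } from "node:fs";

// Assumed entry shape -- purely illustrative, not SafeClaw's documented format.
interface SimEntry {
  action: string;
  target: string;
  decision: "allow" | "deny";
}

const entries: SimEntry[] = readFileSync("simulation-export.jsonl", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));

const denied = entries.filter((e) => e.decision === "deny");
console.log(`${denied.length}/${entries.length} actions would be blocked`);
for (const e of denied) {
  console.log(`would block: ${e.action} -> ${e.target}`);
}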

Step 5: Switch to Enforce Mode

SAFECLAW_MODE=enforce npx @authensor/safeclaw

Now SafeClaw actively enforces your policy. Policy evaluation is sub-millisecond, so your chatbot's response time is unaffected.
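
In enforce mode a denial surfaces to your application as a failed action, so wrap gated operations accordingly. A hedged sketch, assuming denials arrive as thrown errors (the helper below is hypothetical):

import { writeFile } from "node:fs/promises";

// Hypothetical gated operation: in an enforced deployment this write
// either succeeds or is denied by policy before it reaches the disk.
async function writeConversationLog(path: string, text: string): Promise<void> {
  await writeFile(path, text);
}

try {
  await writeConversationLog("./logs/session.txt", "user: hi\nbot: hello");
} catch (err) {
  // Treat a denial as an expected outcome: skip the write and keep the
  // conversation alive rather than crashing the chatbot.
  console.warn("Log write skipped (denied by policy?):", err);
}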

Step 6: Monitor the Audit Trail

Every action (allowed, denied, or escalated for approval) is recorded in a tamper-evident audit trail using a SHA-256 hash chain: each entry links cryptographically to the previous one, so any modification is detectable. Review the trail via the browser dashboard at safeclaw.onrender.com to verify your chatbot is operating within bounds.
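
The hash-chain construction is easy to reason about: each entry's hash covers its content plus the previous entry's hash, so editing any record invalidates every record after it. A minimal sketch of the idea (not SafeClaw's actual record format):

import { createHash } from "node:crypto";

interface AuditEntry {
  payload: string;
  prevHash: string;
  hash: string;
}

// Append an entry whose hash commits to the payload and the chain so far.
function append(chain: AuditEntry[], payload: string): void {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : "genesis";
  const hash = createHash("sha256").update(prevHash + payload).digest("hex");
  chain.push({ payload, prevHash, hash });
}

// Recompute every hash from the start; any edit breaks the chain from
// that point on, which is what makes tampering evident.
function verify(chain: AuditEntry[]): boolean {
  let prev = "genesis";
  for (const entry of chain) {
    const expected = createHash("sha256").update(prev + entry.payload).digest("hex");
    if (entry.prevHash !== prev || entry.hash !== expected) return false;
    prev = entry.hash;
  }
  return true;
}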

Chatbot-Specific Risks SafeClaw Addresses

Customer data exposure. A chatbot with unrestricted file_read can access customer records, credentials, and internal documents far beyond what any conversation requires. SafeClaw restricts reads to explicitly allowed paths.

Prompt injection leading to action abuse. An attacker crafts a message that causes your chatbot to execute unintended actions — reading .env files, making outbound API calls, or writing malicious content. SafeClaw blocks the action regardless of what prompted it because gating happens at the action layer, not the prompt layer.

Unauthorized API calls. A chatbot that can make arbitrary network requests can exfiltrate data to external servers. SafeClaw's network rules restrict outbound traffic to explicitly allowed domains.

Accidental destructive operations. A chatbot that can execute shell commands might run rm -rf on your project directory. With SafeClaw, shell access is denied entirely unless you explicitly permit specific commands.

Why Prompt-Level Safety Is Not Enough

Telling your chatbot "never access .env files" in its system prompt provides zero enforcement. System prompt instructions can be overridden by prompt injection, bypassed through multi-step reasoning, or ignored during hallucination. Safety must be enforced at the action layer — where the chatbot interacts with your system — not at the prompt layer where it plans what to do.

SafeClaw does not modify prompts, filter inputs, or alter model behavior. It gates actions. The chatbot can think about anything; it just cannot do anything your policy does not allow.
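
The difference shows up directly in code: a system-prompt rule is just more text in the model's context, while an action gate is the only code path to the resource. A self-contained sketch (the policy check here is a stand-in, not SafeClaw's API):

import { readFileSync } from "node:fs";

type ToolCall = { kind: "file_read"; path: string };

// Stand-in policy check: deny credential files no matter who asked.
function allowedByPolicy(call: ToolCall): boolean {
  return !call.path.endsWith(".env");
}

// The dispatcher is the enforcement point. Even if prompt injection
// convinces the model to emit a "read .env" tool call, it fails here.
function dispatch(call: ToolCall): string {
  if (!allowedByPolicy(call)) {
    return "blocked by policy"; // reported back to the model, never executed
  }
  return readFileSync(call.path, "utf8");
}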

Technical Details

SafeClaw's client is built in TypeScript strict mode with 446 tests and zero third-party dependencies. The control plane sees only action metadata — never your keys, file contents, or customer data. Policy evaluation completes in sub-millisecond time. The entire client is MIT-licensed and available for audit on GitHub.

Try SafeClaw

Action-level gating for AI agents. Set it up in your browser in 60 seconds.

$ npx @authensor/safeclaw