SafeClaw Test Coverage and Quality Reference
Overview
SafeClaw maintains a test suite of 446 tests across 24 test files. The suite follows a security-first methodology: every security-critical behavior (deny-by-default, fail-closed handling, hash chain integrity, and policy evaluation correctness) is verified by dedicated tests.
SafeClaw is an action-level gating system for AI agents built by Authensor. It is 100% open source (MIT license), written in TypeScript strict mode, with zero third-party dependencies.
Test Suite Summary
| Metric | Value |
|--------|-------|
| Total tests | 446 |
| Test files | 24 |
| Language | TypeScript (strict mode) |
| Third-party test dependencies | Zero (uses Node.js built-in test runner) |
| Minimum pass rate for release | 100% (zero failures allowed) |
Test Distribution by Component
| Component | Test Count | Description |
|-----------|-----------|-------------|
| Policy engine | ~120 | Rule evaluation, first-match-wins, condition operators |
| Audit trail | ~80 | Hash chain construction, verification, tamper detection |
| Action interception | ~70 | Action request parsing, validation, type mapping |
| Simulation mode | ~50 | Would-allow/would-deny recording, mode switching |
| Hash chain integrity | ~40 | SHA-256 computation, chain linkage, genesis entry |
| Policy rule syntax | ~35 | Condition operators (equals, starts_with, contains, regex) |
| Action types | ~25 | file_write, file_read, shell_exec, network validation |
| Fail-closed behavior | ~15 | Error handling, malformed input, missing fields |
| Integration points | ~11 | Provider-agnostic action request handling |
What Is Tested
Policy Engine Tests
The policy engine is the most extensively tested component. Tests verify:
First-Match-Wins Algorithm
Test: "first matching rule determines the effect"
Given: Rules [DENY rule A, ALLOW rule B] where both match
When: Action request evaluated
Then: Result is DENY (rule A matched first)
Test: "rule order determines priority"
Given: Rules [ALLOW for /project/*, DENY for /project/secret.txt]
When: Action request for /project/secret.txt
Then: Result is ALLOW (the broader ALLOW rule is listed first, so the later DENY rule is never reached; order matters)
Tests cover:
- Single rule matching
- Multiple rules with different match positions
- Rule ordering and priority
- No-match fallback to deny-by-default
- Empty policy file (all actions denied)
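The first-match-wins behavior described above can be illustrated with a minimal sketch. The `Rule` shape, the `evaluate` function, and the prefix-based matcher are hypothetical stand-ins chosen for illustration, not SafeClaw's actual API.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical shapes for illustration; SafeClaw's real types may differ.
type Effect = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

interface Rule {
  effect: Effect;
  // Returns true when the rule applies to the requested path.
  matches: (path: string) => boolean;
}

// Walk the rules in order and return the effect of the first match.
// When nothing matches (including an empty policy), fall back to DENY.
function evaluate(rules: readonly Rule[], path: string): Effect {
  for (const rule of rules) {
    if (rule.matches(path)) {
      return rule.effect;
    }
  }
  return "DENY";
}

const policy: Rule[] = [
  { effect: "ALLOW", matches: (p) => p.startsWith("/project/") },
  { effect: "DENY", matches: (p) => p === "/project/secret.txt" },
];

// The broad ALLOW rule comes first, so it shadows the later DENY rule.
assert.equal(evaluate(policy, "/project/secret.txt"), "ALLOW");
// No rule matches, so the deny-by-default fallback applies.
assert.equal(evaluate(policy, "/etc/passwd"), "DENY");
// An empty policy denies everything.
assert.equal(evaluate([], "/project/readme.md"), "DENY");
```

Placing the specific DENY rule before the broad ALLOW rule reverses the outcome for `/project/secret.txt`, which is why rule order is tested explicitly.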
Condition Operator Tests
Each operator is tested with positive matches, negative matches, and edge cases:
| Operator | Positive Test | Negative Test | Edge Case |
|----------|--------------|---------------|-----------|
| equals | Exact match succeeds | Different string fails | Empty string, case sensitivity |
| starts_with | Prefix match succeeds | Non-prefix fails | Empty prefix, exact-length match |
| contains | Substring found succeeds | Substring absent fails | Empty substring, full-string match |
| regex | Pattern match succeeds | No match fails | Invalid regex (error handling), anchored vs unanchored |
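As a rough sketch of how the four operators and their edge cases might behave, the following uses plain JavaScript string semantics. The `matchCondition` function and its three-state result are assumptions made for illustration; in particular, reporting an invalid regex as a separate `"error"` state (which a caller would be expected to turn into DENY) is one possible design, not a confirmed one.

```typescript
import { strict as assert } from "node:assert";

// Operator names come from the table above; everything else is hypothetical.
type Operator = "equals" | "starts_with" | "contains" | "regex";
type MatchResult = "match" | "no_match" | "error";

function matchCondition(op: Operator, value: string, operand: string): MatchResult {
  switch (op) {
    case "equals":
      // Case-sensitive exact comparison.
      return value === operand ? "match" : "no_match";
    case "starts_with":
      return value.startsWith(operand) ? "match" : "no_match";
    case "contains":
      return value.includes(operand) ? "match" : "no_match";
    case "regex":
      try {
        // Unanchored unless the pattern anchors itself with ^ or $.
        return new RegExp(operand).test(value) ? "match" : "no_match";
      } catch {
        // Invalid pattern: report an error so the caller can fail closed.
        return "error";
      }
  }
}

// Edge cases from the table, under JavaScript string semantics.
assert.equal(matchCondition("equals", "README", "readme"), "no_match");
assert.equal(matchCondition("starts_with", "/project/a", ""), "match");
assert.equal(matchCondition("contains", "/project/a", "/project/a"), "match");
assert.equal(matchCondition("regex", "/project/secret.txt", "secret"), "match");
assert.equal(matchCondition("regex", "/project/secret.txt", "^secret"), "no_match");
assert.equal(matchCondition("regex", "anything", "("), "error");
```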
Effect Resolution Tests
Tests verify correct handling of each effect type:
Test: "ALLOW effect permits action"
Test: "DENY effect blocks action"
Test: "REQUIRE_APPROVAL effect holds action"
Test: "unknown effect defaults to DENY"
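One way to picture the "unknown effect defaults to DENY" case is a small normalisation step applied to untrusted policy values. The `resolveEffect` helper below is a hypothetical sketch; treating effect names as case-sensitive is an assumption.

```typescript
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

const KNOWN_EFFECTS: readonly string[] = ["ALLOW", "DENY", "REQUIRE_APPROVAL"];

// Normalise an untrusted effect value, for example one read from a policy file.
// Anything that is not a recognised effect string becomes DENY.
function resolveEffect(raw: unknown): Effect {
  if (typeof raw === "string" && KNOWN_EFFECTS.includes(raw)) {
    return raw as Effect;
  }
  return "DENY";
}

assert.equal(resolveEffect("ALLOW"), "ALLOW");
assert.equal(resolveEffect("REQUIRE_APPROVAL"), "REQUIRE_APPROVAL");
assert.equal(resolveEffect("PERMIT"), "DENY"); // unknown effect name
assert.equal(resolveEffect(undefined), "DENY"); // missing effect
```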
Audit Trail Tests
The audit trail test suite verifies the SHA-256 hash chain:
Hash Chain Construction
Test: "genesis entry hashes against 'GENESIS' string"
Given: First audit entry
When: Hash computed
Then: hash = SHA-256(canonicalJSON(entry) + "GENESIS")
Test: "subsequent entries chain to previous hash"
Given: Entry at sequence N
When: Hash computed
Then: hash = SHA-256(canonicalJSON(entry) + entries[N-1].hash)
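The two hashing rules above can be sketched with Node's built-in `node:crypto` module. The `canonicalJSON` implementation here (recursively sorted keys, no whitespace), the hex encoding, and the entry fields are assumptions for illustration; the source specifies only the formula, not the canonical form.

```typescript
import { createHash } from "node:crypto";
import { strict as assert } from "node:assert";

// Hypothetical canonical form: object keys sorted recursively, no whitespace.
// Assumes JSON-compatible input (no undefined values, functions, or cycles).
function canonicalJSON(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJSON(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJSON(obj[key])}`);
    return `{${members.join(",")}}`;
  }
  return JSON.stringify(value);
}

// hash = SHA-256(canonicalJSON(entry) + previousHash), hex-encoded here.
function entryHash(entry: unknown, previousHash: string): string {
  return createHash("sha256")
    .update(canonicalJSON(entry) + previousHash, "utf8")
    .digest("hex");
}

const first = { sequence: 0, action: "file_read", effect: "ALLOW" };
const second = { sequence: 1, action: "shell_exec", effect: "DENY" };

// The first entry chains to the literal string "GENESIS".
const firstHash = entryHash(first, "GENESIS");
// Every later entry chains to the hash of the entry before it.
const secondHash = entryHash(second, firstHash);

assert.notEqual(firstHash, secondHash);
// Key order does not change the hash because the JSON is canonicalised.
assert.equal(
  entryHash({ effect: "ALLOW", action: "file_read", sequence: 0 }, "GENESIS"),
  firstHash,
);
```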
Chain Verification
Test: "valid chain passes verification"
Given: Chain of 100 entries with correct hashes
When: Verification algorithm runs
Then: Result is { valid: true }
Test: "modified entry fails verification"
Given: Chain with entry[50] content altered
When: Verification algorithm runs
Then: Result is { valid: false, error: "Hash mismatch", index: 50 }
Test: "deleted entry fails verification"
Given: Chain with entry[25] removed
When: Verification algorithm runs
Then: Result is { valid: false, error: "Sequence gap", index: 25 }
Test: "inserted entry fails verification"
Given: Chain with extra entry inserted at position 10
When: Verification algorithm runs
Then: Result is { valid: false, error: "Chain break" }
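A simplified verifier that reproduces the modification and deletion results above might look like the following. The entry shape, the `buildChain` helper, and the use of `JSON.stringify` with a fixed key order in place of canonical JSON are all assumptions. This sketch does not attempt to distinguish the insertion case ("Chain break") from the other two.

```typescript
import { createHash } from "node:crypto";
import { strict as assert } from "node:assert";

// Hypothetical entry shape for illustration.
interface AuditEntry {
  sequence: number;
  action: string;
  hash: string;
}

type VerifyResult =
  | { valid: true }
  | { valid: false; error: string; index: number };

function hashEntry(sequence: number, action: string, previousHash: string): string {
  // Fixed key order stands in for canonical JSON in this sketch.
  const payload = JSON.stringify({ sequence, action });
  return createHash("sha256").update(payload + previousHash, "utf8").digest("hex");
}

function buildChain(actions: readonly string[]): AuditEntry[] {
  const chain: AuditEntry[] = [];
  let previousHash = "GENESIS";
  actions.forEach((action, sequence) => {
    const hash = hashEntry(sequence, action, previousHash);
    chain.push({ sequence, action, hash });
    previousHash = hash;
  });
  return chain;
}

// Recompute every hash from the genesis value and stop at the first problem.
function verifyChain(chain: readonly AuditEntry[]): VerifyResult {
  let previousHash = "GENESIS";
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    if (entry === undefined) continue; // unreachable; satisfies strict indexing
    if (entry.sequence !== i) {
      return { valid: false, error: "Sequence gap", index: i };
    }
    if (entry.hash !== hashEntry(entry.sequence, entry.action, previousHash)) {
      return { valid: false, error: "Hash mismatch", index: i };
    }
    previousHash = entry.hash;
  }
  return { valid: true };
}

const chain = buildChain(Array.from({ length: 100 }, (_, i) => `action-${i}`));
assert.deepEqual(verifyChain(chain), { valid: true });

// Altering the content of entry 50 invalidates its stored hash.
const modified = chain.map((e, i) => (i === 50 ? { ...e, action: "tampered" } : e));
assert.deepEqual(verifyChain(modified), { valid: false, error: "Hash mismatch", index: 50 });

// Removing entry 25 leaves sequence 26 at position 25.
const deleted = chain.filter((_, i) => i !== 25);
assert.deepEqual(verifyChain(deleted), { valid: false, error: "Sequence gap", index: 25 });
```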
Tamper Detection
Tests verify detection of every tamper scenario:
| Tamper Type | Test Name | Expected Detection |
|-------------|-----------|-------------------|
| Content modification | detects-modified-entry | Hash mismatch at modified entry |
| Entry deletion | detects-deleted-entry | Sequence gap |
| Entry insertion | detects-inserted-entry | Chain linkage failure |
| Entry reordering | detects-reordered-entries | Timestamp and linkage failure |
| Hash forgery (single) | detects-forged-hash | Subsequent entry linkage failure |
| Complete chain rebuild | detects-recomputed-chain | Chain tip mismatch (known anchor) |
Action Interception Tests
Tests verify correct mapping from agent tool calls to SafeClaw action requests:
Test: "file write tool call produces file_write action request"
Test: "file read tool call produces file_read action request"
Test: "shell execution produces shell_exec action request"
Test: "network request produces network action request"
Test: "missing required fields produce validation error"
Test: "unknown action type produces validation error"
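The mapping and validation behavior listed above could be sketched as a single parsing function. The four action type names come from this document; the `ActionRequest` shape, the `parseToolCall` function, the error strings, and the assumption that `file_read` requires a `path` field are hypothetical.

```typescript
import { strict as assert } from "node:assert";

type ActionType = "file_write" | "file_read" | "shell_exec" | "network";

// Hypothetical request shape; SafeClaw's real format may differ.
interface ActionRequest {
  type: ActionType;
  agent: string;
  target: string;
}

type ParseResult =
  | { ok: true; request: ActionRequest }
  | { ok: false; error: string };

// Which field of the tool call carries the target for each action type.
const REQUIRED_FIELD: Record<ActionType, string> = {
  file_write: "path",
  file_read: "path",
  shell_exec: "command",
  network: "url",
};

function isActionType(value: unknown): value is ActionType {
  return (
    typeof value === "string" &&
    Object.prototype.hasOwnProperty.call(REQUIRED_FIELD, value)
  );
}

// Validate an untrusted tool call and map it to an action request.
function parseToolCall(call: Record<string, unknown>): ParseResult {
  const type = call["type"];
  if (!isActionType(type)) {
    return { ok: false, error: "unknown action type" };
  }
  const agent = call["agent"];
  if (typeof agent !== "string" || agent === "") {
    return { ok: false, error: "missing agent" };
  }
  const field = REQUIRED_FIELD[type];
  const target = call[field];
  if (typeof target !== "string" || target === "") {
    return { ok: false, error: `missing ${field}` };
  }
  return { ok: true, request: { type, agent, target } };
}

assert.deepEqual(
  parseToolCall({ type: "file_write", agent: "agent-1", path: "/tmp/out.txt" }),
  { ok: true, request: { type: "file_write", agent: "agent-1", target: "/tmp/out.txt" } },
);
assert.deepEqual(
  parseToolCall({ type: "shell_exec", agent: "agent-1" }),
  { ok: false, error: "missing command" },
);
assert.deepEqual(
  parseToolCall({ type: "teleport", agent: "agent-1" }),
  { ok: false, error: "unknown action type" },
);
```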
Simulation Mode Tests
Simulation mode tests verify non-enforcing behavior:
Test: "simulation mode allows all actions regardless of policy"
Test: "simulation mode records simulated_effect correctly"
Test: "simulation mode records 'would deny' for unmatched actions"
Test: "simulation mode records 'would allow' for matched ALLOW rules"
Test: "simulation mode records 'would require approval' for REQUIRE_APPROVAL rules"
Test: "switching from simulation to enforcement activates blocking"
Test: "switching from enforcement to simulation stops blocking"
Test: "simulation entries include simulation: true flag"
Test: "simulation entries participate in hash chain"
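The difference between the two modes can be summarised in a short sketch. The `simulated_effect` and `simulation` field names come from the test names above; the `decide` function, the `proceed` field, and the `Mode` type are hypothetical.

```typescript
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";
type Mode = "simulation" | "enforcement";

// Hypothetical decision record for illustration.
interface Decision {
  proceed: boolean; // whether the action runs immediately
  simulation: boolean; // true when recorded in simulation mode
  simulated_effect?: Effect; // what the policy would have decided
}

// In simulation mode nothing is blocked, but the policy result is recorded.
// In enforcement mode only ALLOW lets the action proceed immediately.
function decide(mode: Mode, policyEffect: Effect): Decision {
  if (mode === "simulation") {
    return { proceed: true, simulation: true, simulated_effect: policyEffect };
  }
  return { proceed: policyEffect === "ALLOW", simulation: false };
}

// The same DENY result blocks in enforcement but only records in simulation.
assert.deepEqual(decide("simulation", "DENY"), {
  proceed: true,
  simulation: true,
  simulated_effect: "DENY",
});
assert.deepEqual(decide("enforcement", "DENY"), { proceed: false, simulation: false });
```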
Fail-Closed Behavior Tests
Dedicated tests verify that every failure mode results in DENY:
Test: "malformed action request returns DENY"
Test: "null action type returns DENY"
Test: "empty agent string returns DENY"
Test: "missing path for file_write returns DENY"
Test: "missing command for shell_exec returns DENY"
Test: "missing url for network returns DENY"
Test: "corrupted policy file returns DENY for all actions"
Test: "empty policy file returns DENY for all actions (deny-by-default)"
Test: "rule evaluation exception returns DENY"
Test: "invalid regex in condition returns DENY"
These tests are the verification backbone of SafeClaw's security model. Each test confirms that a specific failure scenario results in the safe default (DENY), not an unsafe permissive state.
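A common way to obtain this property is to wrap the evaluator so that exceptions and unexpected return values both collapse to DENY. The `failClosed` wrapper below is a generic sketch of that pattern under assumed names, not SafeClaw's implementation.

```typescript
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

// Wrap any evaluator so that a thrown exception or an unexpected return
// value is converted into DENY instead of propagating or permitting.
function failClosed<T>(evaluator: (request: T) => Effect): (request: T) => Effect {
  return (request) => {
    try {
      const effect = evaluator(request);
      switch (effect) {
        case "ALLOW":
        case "REQUIRE_APPROVAL":
          return effect;
        default:
          // Covers DENY and any value that slipped past the type system.
          return "DENY";
      }
    } catch {
      return "DENY";
    }
  };
}

// Hypothetical evaluator that throws on malformed input.
const guarded = failClosed((request: { path?: string }) => {
  if (request.path === undefined) {
    throw new Error("missing path");
  }
  return request.path.startsWith("/project/") ? "ALLOW" : "DENY";
});

assert.equal(guarded({ path: "/project/a.txt" }), "ALLOW");
// The evaluator throws, and the wrapper turns the failure into DENY.
assert.equal(guarded({}), "DENY");
```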
TypeScript Strict Mode
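SafeClaw is compiled with TypeScript strict mode enabled. In the tsconfig.json below, the strict flag already implies the seven options listed directly after it (noImplicitAny through alwaysStrict); they are spelled out to make the intent explicit. The remaining five options are additional checks that strict does not enable on its own: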
{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "strictNullChecks": true,
    "strictFunctionTypes": true,
    "strictBindCallApply": true,
    "strictPropertyInitialization": true,
    "noImplicitThis": true,
    "alwaysStrict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitReturns": true,
    "noFallthroughCasesInSwitch": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true
  }
}
What Strict Mode Catches
| Check | What It Prevents |
|-------|-----------------|
| strictNullChecks | Null/undefined reference errors |
| noImplicitAny | Untyped variables that bypass type checking |
| strictFunctionTypes | Unsound function assignments (parameter types are checked contravariantly) |
| noUncheckedIndexedAccess | Indexed array/object access that ignores a possibly undefined result |
| noImplicitReturns | Functions that might not return a value |
| noFallthroughCasesInSwitch | Switch statement fall-through bugs |
These checks catch entire classes of errors at compile time instead of at runtime. Combined with zero third-party dependencies, this leaves the codebase with a minimal surface for type-related bugs.
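As a small illustration of one of these checks, the sketch below shows how noUncheckedIndexedAccess forces the empty-array case to be handled. The `firstEffect` function is hypothetical and exists only to demonstrate the compiler behavior.

```typescript
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY";

// With noUncheckedIndexedAccess enabled, effects[0] has the type
// `Effect | undefined`, so returning it directly would not compile.
// The nullish fallback handles the empty case and keeps the result safe.
function firstEffect(effects: readonly Effect[]): Effect {
  const first = effects[0];
  return first ?? "DENY";
}

assert.equal(firstEffect(["ALLOW", "DENY"]), "ALLOW");
// An empty list resolves to DENY instead of undefined.
assert.equal(firstEffect([]), "DENY");
```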
Security-Focused Test Methodology
The test suite follows a security-focused methodology:
1. Boundary Testing
Every input boundary is tested with valid, invalid, and edge-case values:
- Empty strings
- Null and undefined
- Maximum-length strings
- Special characters in paths, commands, and URLs
- Unicode in all string fields
2. Negative Testing
For every positive test (action allowed), there is a corresponding negative test (action denied). This ensures the engine does not over-permit.
3. Regression Testing
Security-related bugs are permanently captured as regression tests. Once a vulnerability is identified and fixed, a test prevents its reintroduction.
4. Deterministic Tests
All tests are deterministic: they use no random inputs, no time-dependent assertions, and no network calls, so they produce the same result on every execution.
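One common technique for keeping time out of assertions is to inject the clock as a parameter. The sketch below assumes a hypothetical `makeRecorder` factory and record shape; whether SafeClaw structures its code this way is not stated in this document.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical record shape for illustration.
interface AuditRecord {
  action: string;
  timestamp: string;
}

// The clock is passed in, so production code can use the real time
// while tests supply a fixed value.
function makeRecorder(now: () => Date): (action: string) => AuditRecord {
  return (action) => ({ action, timestamp: now().toISOString() });
}

// A fixed clock makes the output identical on every run.
const fixedClock = (): Date => new Date("2024-01-01T00:00:00.000Z");
const record = makeRecorder(fixedClock);

assert.deepEqual(record("file_read"), {
  action: "file_read",
  timestamp: "2024-01-01T00:00:00.000Z",
});
```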
Continuous Verification
Running the Test Suite
npx @authensor/safeclaw test
Expected output:
Running 446 tests across 24 files...
✓ policy-engine.test.ts (120 tests)
✓ audit-trail.test.ts (80 tests)
✓ action-interception.test.ts (70 tests)
✓ simulation-mode.test.ts (50 tests)
✓ hash-chain.test.ts (40 tests)
...
446 passing (0 failing)
CI/CD Integration
The test suite runs on every commit and pull request. See the Deployment Reference for CI/CD configuration examples. A release cannot be published with any failing test.
Zero-Dependency Testing
The test suite uses no third-party testing frameworks. Tests are built on the Node.js built-in test runner (node:test), which extends the zero-dependency commitment to the test infrastructure itself.
| Property | Value |
|----------|-------|
| Test framework | Node.js built-in (node:test) |
| Assertion library | Node.js built-in (node:assert) |
| Mocking | Node.js built-in (node:test mock API) |
| Third-party test dependencies | Zero |
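A test written against only these built-in modules might look like the following sketch. The `evaluate` function under test and the test name are hypothetical; `test` and `mock.fn` from `node:test` and the strict assertions from `node:assert` are the built-ins listed in the table above.

```typescript
import { test, mock } from "node:test";
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY";

// Hypothetical function under test: allows paths under /project/,
// denies everything else, and reports each decision to a logger.
function evaluate(path: string, log: (line: string) => void): Effect {
  const effect: Effect = path.startsWith("/project/") ? "ALLOW" : "DENY";
  log(`${effect} ${path}`);
  return effect;
}

test("unmatched path is denied and logged once", () => {
  // mock.fn wraps a function and records every call made to it.
  const log = mock.fn((_line: string) => {});

  assert.equal(evaluate("/etc/passwd", log), "DENY");
  assert.equal(log.mock.calls.length, 1);
});
```

A file like this can be run with the built-in runner (for example via `node --test` on compiled output) without installing any test framework.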
Related References
- Policy Engine Architecture — The primary tested component
- Audit Trail Specification — Hash chain tests
- Action Request Format — Action validation tests
- Simulation Mode Reference — Simulation behavior tests
- Security Model Reference — Fail-closed behavior and deny-by-default tests
- Deployment Reference — Running tests in CI/CD