2025-11-03 · Authensor

SafeClaw Test Coverage and Quality Reference

Overview

SafeClaw maintains a test suite of 446 tests across 24 test files. The suite follows a security-first methodology: every security-critical behavior (deny-by-default, fail-closed, hash chain integrity, and policy evaluation correctness) is verified by dedicated tests.

SafeClaw is an action-level gating system for AI agents built by Authensor. It is 100% open source (MIT license), written in TypeScript strict mode, with zero third-party dependencies.

Test Suite Summary

| Metric | Value |
|--------|-------|
| Total tests | 446 |
| Test files | 24 |
| Language | TypeScript (strict mode) |
| Third-party test dependencies | Zero (uses Node.js built-in test runner) |
| Minimum pass rate for release | 100% (zero failures allowed) |

Test Distribution by Component

| Component | Test Count | Description |
|-----------|-----------|-------------|
| Policy engine | ~120 | Rule evaluation, first-match-wins, condition operators |
| Audit trail | ~80 | Hash chain construction, verification, tamper detection |
| Action interception | ~70 | Action request parsing, validation, type mapping |
| Simulation mode | ~50 | Would-allow/would-deny recording, mode switching |
| Hash chain integrity | ~40 | SHA-256 computation, chain linkage, genesis entry |
| Policy rule syntax | ~35 | Condition operators (equals, starts_with, contains, regex) |
| Action types | ~25 | file_write, file_read, shell_exec, network validation |
| Fail-closed behavior | ~15 | Error handling, malformed input, missing fields |
| Integration points | ~11 | Provider-agnostic action request handling |

What Is Tested

Policy Engine Tests

The policy engine is the most extensively tested component. Tests verify:

First-Match-Wins Algorithm

Test: "first matching rule determines the effect"
  Given: Rules [DENY rule A, ALLOW rule B] where both match
  When:  Action request evaluated
  Then:  Result is DENY (rule A matched first)

Test: "rule order determines priority"
  Given: Rules [ALLOW for /project/*, DENY for /project/secret.txt]
  When:  Action request for /project/secret.txt
  Then:  Result is ALLOW (the broader ALLOW rule matches first, so the later DENY rule is never reached)
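
The two tests above can be illustrated with a minimal sketch of first-match-wins evaluation. The Rule shape (a path prefix plus an effect) and the evaluate function are hypothetical names chosen for this example, not SafeClaw's actual API.

```typescript
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

// Hypothetical rule shape: a path prefix plus the effect to apply on match.
interface Rule {
  pathPrefix: string;
  effect: Effect;
}

// Return the effect of the first rule that matches; DENY when nothing matches.
function evaluate(rules: readonly Rule[], path: string): Effect {
  for (const rule of rules) {
    if (path.startsWith(rule.pathPrefix)) {
      return rule.effect;
    }
  }
  return "DENY"; // deny-by-default
}

// A broad ALLOW listed first shadows the narrower DENY that follows it.
const rules: Rule[] = [
  { pathPrefix: "/project/", effect: "ALLOW" },
  { pathPrefix: "/project/secret.txt", effect: "DENY" },
];
assert.equal(evaluate(rules, "/project/secret.txt"), "ALLOW");

// Reversing the order puts the narrow DENY first, so it wins.
assert.equal(evaluate([...rules].reverse(), "/project/secret.txt"), "DENY");

// No rule matches, so the default applies.
assert.equal(evaluate(rules, "/etc/passwd"), "DENY");
```

Under this model, narrow rules need to be listed before the broad rules they are meant to override.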


Condition Operator Tests

Each operator is tested with positive matches, negative matches, and edge cases:

| Operator | Positive Test | Negative Test | Edge Case |
|----------|--------------|---------------|-----------|
| equals | Exact match succeeds | Different string fails | Empty string, case sensitivity |
| starts_with | Prefix match succeeds | Non-prefix fails | Empty prefix, exact-length match |
| contains | Substring found succeeds | Substring absent fails | Empty substring, full-string match |
| regex | Pattern match succeeds | No match fails | Invalid regex (error handling), anchored vs unanchored |
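
As a rough illustration of the edge cases in the table, the four operators could be evaluated as below. The Condition shape and the matches function are assumptions for this sketch; only the operator names come from this document.

```typescript
import { strict as assert } from "node:assert";

type Operator = "equals" | "starts_with" | "contains" | "regex";

// Hypothetical condition shape; the real rule syntax may differ.
interface Condition {
  operator: Operator;
  value: string;
}

// Returns whether `input` satisfies the condition. An invalid pattern makes
// `new RegExp` throw a SyntaxError, which a caller would have to map to DENY.
function matches(condition: Condition, input: string): boolean {
  switch (condition.operator) {
    case "equals":
      return input === condition.value; // case-sensitive comparison
    case "starts_with":
      return input.startsWith(condition.value);
    case "contains":
      return input.includes(condition.value);
    case "regex":
      // Unanchored unless the pattern itself uses ^ or $.
      return new RegExp(condition.value).test(input);
  }
}

// Case sensitivity of equals.
assert.equal(matches({ operator: "equals", value: "rm" }, "RM"), false);
// An empty prefix or substring matches every input in this sketch.
assert.equal(matches({ operator: "starts_with", value: "" }, "/tmp/a"), true);
assert.equal(matches({ operator: "contains", value: "" }, "/tmp/a"), true);
// Anchored versus unanchored patterns.
assert.equal(matches({ operator: "regex", value: "tmp" }, "/var/tmp"), true);
assert.equal(matches({ operator: "regex", value: "^/tmp" }, "/var/tmp"), false);
// Invalid regex surfaces as an exception rather than a silent non-match.
assert.throws(() => matches({ operator: "regex", value: "(" }, "x"), SyntaxError);
```

Throwing on an invalid pattern, instead of returning false, avoids a case where a broken DENY rule is silently skipped and a later ALLOW rule matches.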

Effect Resolution Tests

Tests verify correct handling of each effect type:

Test: "ALLOW effect permits action"
Test: "DENY effect blocks action"
Test: "REQUIRE_APPROVAL effect holds action"
Test: "unknown effect defaults to DENY"
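
One way to get the "unknown effect defaults to DENY" behavior is to normalize every effect value before acting on it. The resolveEffect helper below is a hypothetical sketch, not the engine's actual code.

```typescript
import { strict as assert } from "node:assert";

const EFFECTS = ["ALLOW", "DENY", "REQUIRE_APPROVAL"] as const;
type Effect = (typeof EFFECTS)[number];

// Normalize an untrusted effect value; anything unrecognized becomes DENY.
function resolveEffect(raw: unknown): Effect {
  return EFFECTS.find((effect) => effect === raw) ?? "DENY";
}

assert.equal(resolveEffect("ALLOW"), "ALLOW");
assert.equal(resolveEffect("REQUIRE_APPROVAL"), "REQUIRE_APPROVAL");
assert.equal(resolveEffect("allow"), "DENY"); // case-sensitive in this sketch
assert.equal(resolveEffect(undefined), "DENY");
```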

Audit Trail Tests

The audit trail test suite verifies the SHA-256 hash chain:

Hash Chain Construction

Test: "genesis entry hashes against 'GENESIS' string"
  Given: First audit entry
  When:  Hash computed
  Then:  hash = SHA-256(canonicalJSON(entry) + "GENESIS")

Test: "subsequent entries chain to previous hash"
  Given: Entry at sequence N
  When:  Hash computed
  Then:  hash = SHA-256(canonicalJSON(entry) + entries[N-1].hash)
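
The hash formulas above can be sketched with Node's built-in crypto module. This document does not define canonicalJSON, so the version below (object keys sorted recursively) is an assumption, as are the hex encoding and the entry fields used in the example.

```typescript
import { createHash } from "node:crypto";
import { strict as assert } from "node:assert";

// Assumed canonical form: JSON with object keys sorted recursively, so key
// order never changes the hash. Expects JSON-safe values (no undefined).
function canonicalJSON(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJSON(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const members = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJSON(record[key])}`);
    return `{${members.join(",")}}`;
  }
  return JSON.stringify(value);
}

// hash = SHA-256(canonicalJSON(entry) + previousHash); the genesis entry
// uses the literal string "GENESIS" in place of a previous hash.
function entryHash(entry: unknown, previousHash: string = "GENESIS"): string {
  return createHash("sha256")
    .update(canonicalJSON(entry) + previousHash)
    .digest("hex");
}

const first = entryHash({ sequence: 0, action: "file_read" });
const second = entryHash({ sequence: 1, action: "shell_exec" }, first);

// Key order in the entry does not affect the hash.
assert.equal(first, entryHash({ action: "file_read", sequence: 0 }));
// The same entry content hashes differently when the previous hash differs.
assert.notEqual(second, entryHash({ sequence: 1, action: "shell_exec" }));
```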

Chain Verification

Test: "valid chain passes verification"
  Given: Chain of 100 entries with correct hashes
  When:  Verification algorithm runs
  Then:  Result is { valid: true }

Test: "modified entry fails verification"
  Given: Chain with entry[50] content altered
  When:  Verification algorithm runs
  Then:  Result is { valid: false, error: "Hash mismatch", index: 50 }

Test: "deleted entry fails verification"
  Given: Chain with entry[25] removed
  When:  Verification algorithm runs
  Then:  Result is { valid: false, error: "Sequence gap", index: 25 }

Test: "inserted entry fails verification"
  Given: Chain with extra entry inserted at position 10
  When:  Verification algorithm runs
  Then:  Result is { valid: false, error: "Chain break" }
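
A verification pass consistent with the first three tests above might look like the following sketch. The AuditEntry shape, the append helper, and the use of JSON.stringify as a stand-in for canonical JSON are all assumptions. The sketch reports only "Hash mismatch" and "Sequence gap", so an inserted entry would surface here as one of those two rather than as a distinct "Chain break".

```typescript
import { createHash } from "node:crypto";
import { strict as assert } from "node:assert";

// Hypothetical entry shape for this sketch.
interface AuditEntry {
  sequence: number;
  payload: string;
  hash: string;
}

type VerifyResult =
  | { valid: true }
  | { valid: false; error: string; index: number };

function hashEntry(sequence: number, payload: string, previousHash: string): string {
  // Fixed key order stands in for canonical JSON here.
  const content = JSON.stringify({ sequence, payload });
  return createHash("sha256").update(content + previousHash).digest("hex");
}

function append(chain: readonly AuditEntry[], payload: string): AuditEntry[] {
  const last = chain[chain.length - 1];
  const previousHash = last === undefined ? "GENESIS" : last.hash;
  const sequence = chain.length;
  return [...chain, { sequence, payload, hash: hashEntry(sequence, payload, previousHash) }];
}

// Walk the chain once, recomputing each hash from the previous one.
function verifyChain(chain: readonly AuditEntry[]): VerifyResult {
  let previousHash = "GENESIS";
  let index = 0;
  for (const entry of chain) {
    if (entry.sequence !== index) {
      return { valid: false, error: "Sequence gap", index };
    }
    if (entry.hash !== hashEntry(entry.sequence, entry.payload, previousHash)) {
      return { valid: false, error: "Hash mismatch", index };
    }
    previousHash = entry.hash;
    index += 1;
  }
  return { valid: true };
}

let chain: AuditEntry[] = [];
for (let i = 0; i < 100; i++) {
  chain = append(chain, `action-${i}`);
}
assert.deepEqual(verifyChain(chain), { valid: true });

const modified = chain.map((entry, i) => (i === 50 ? { ...entry, payload: "tampered" } : entry));
assert.deepEqual(verifyChain(modified), { valid: false, error: "Hash mismatch", index: 50 });

const deleted = chain.filter((_, i) => i !== 25);
assert.deepEqual(verifyChain(deleted), { valid: false, error: "Sequence gap", index: 25 });
```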

Tamper Detection

Tests verify detection of every tamper scenario:

| Tamper Type | Test Name | Expected Detection |
|-------------|-----------|-------------------|
| Content modification | detects-modified-entry | Hash mismatch at modified entry |
| Entry deletion | detects-deleted-entry | Sequence gap |
| Entry insertion | detects-inserted-entry | Chain linkage failure |
| Entry reordering | detects-reordered-entries | Timestamp and linkage failure |
| Hash forgery (single) | detects-forged-hash | Subsequent entry linkage failure |
| Complete chain rebuild | detects-recomputed-chain | Chain tip mismatch (known anchor) |

Action Interception Tests

Tests verify correct mapping from agent tool calls to SafeClaw action requests:

Test: "file write tool call produces file_write action request"
Test: "file read tool call produces file_read action request"
Test: "shell execution produces shell_exec action request"
Test: "network request produces network action request"
Test: "missing required fields produce validation error"
Test: "unknown action type produces validation error"
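
The mapping and validation these tests describe could be sketched as follows. The tool names (write_file, read_file, run_shell, http_request), the ActionRequest fields, and the toActionRequest function are hypothetical; only the four action types come from this document.

```typescript
import { strict as assert } from "node:assert";

type ActionType = "file_write" | "file_read" | "shell_exec" | "network";

interface ActionRequest {
  type: ActionType;
  agent: string;
  target: string; // path, command, or URL depending on the type
}

type ParseResult =
  | { ok: true; request: ActionRequest }
  | { ok: false; error: string };

// Hypothetical tool names and the argument each one must carry.
const TOOL_MAP = new Map<string, { type: ActionType; field: string }>([
  ["write_file", { type: "file_write", field: "path" }],
  ["read_file", { type: "file_read", field: "path" }],
  ["run_shell", { type: "shell_exec", field: "command" }],
  ["http_request", { type: "network", field: "url" }],
]);

function toActionRequest(
  agent: string,
  tool: string,
  args: Record<string, unknown>,
): ParseResult {
  const mapping = TOOL_MAP.get(tool);
  if (mapping === undefined) {
    return { ok: false, error: `unknown action type: ${tool}` };
  }
  if (agent === "") {
    return { ok: false, error: "missing required field: agent" };
  }
  const target = args[mapping.field];
  if (typeof target !== "string" || target === "") {
    return { ok: false, error: `missing required field: ${mapping.field}` };
  }
  return { ok: true, request: { type: mapping.type, agent, target } };
}

assert.deepEqual(toActionRequest("agent-1", "write_file", { path: "/tmp/out.txt" }), {
  ok: true,
  request: { type: "file_write", agent: "agent-1", target: "/tmp/out.txt" },
});
assert.deepEqual(toActionRequest("agent-1", "run_shell", {}), {
  ok: false,
  error: "missing required field: command",
});
assert.deepEqual(toActionRequest("agent-1", "unknown_tool", {}), {
  ok: false,
  error: "unknown action type: unknown_tool",
});
```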

Simulation Mode Tests

Simulation mode tests verify non-enforcing behavior:

Test: "simulation mode allows all actions regardless of policy"
Test: "simulation mode records simulated_effect correctly"
Test: "simulation mode records 'would deny' for unmatched actions"
Test: "simulation mode records 'would allow' for matched ALLOW rules"
Test: "simulation mode records 'would require approval' for REQUIRE_APPROVAL rules"
Test: "switching from simulation to enforcement activates blocking"
Test: "switching from enforcement to simulation stops blocking"
Test: "simulation entries include simulation: true flag"
Test: "simulation entries participate in hash chain"
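
The difference between the two modes can be sketched as a small decision function. The Decision shape and the gate function are assumptions; the simulation flag and the simulated_effect field are taken from the test names above. In this sketch, REQUIRE_APPROVAL in enforcement mode is treated as not permitted until approval arrives.

```typescript
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";
type Mode = "simulation" | "enforcement";

interface Decision {
  permitted: boolean; // whether the action may proceed right now
  simulation: boolean; // flag recorded on the audit entry
  simulated_effect?: Effect; // what enforcement would have decided
}

function gate(mode: Mode, policyEffect: Effect): Decision {
  if (mode === "simulation") {
    // Never block; only record what the policy would have done.
    return { permitted: true, simulation: true, simulated_effect: policyEffect };
  }
  return { permitted: policyEffect === "ALLOW", simulation: false };
}

assert.deepEqual(gate("simulation", "DENY"), {
  permitted: true,
  simulation: true,
  simulated_effect: "DENY",
});
assert.deepEqual(gate("enforcement", "DENY"), { permitted: false, simulation: false });
assert.deepEqual(gate("enforcement", "ALLOW"), { permitted: true, simulation: false });
```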

Fail-Closed Behavior Tests

Dedicated tests verify that every failure mode results in DENY:

Test: "malformed action request returns DENY"
Test: "null action type returns DENY"
Test: "empty agent string returns DENY"
Test: "missing path for file_write returns DENY"
Test: "missing command for shell_exec returns DENY"
Test: "missing url for network returns DENY"
Test: "corrupted policy file returns DENY for all actions"
Test: "empty policy file returns DENY for all actions (deny-by-default)"
Test: "rule evaluation exception returns DENY"
Test: "invalid regex in condition returns DENY"

These tests are the verification backbone of SafeClaw's security model. Each test confirms that a specific failure scenario results in the safe default (DENY), not an unsafe permissive state.
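
A common way to get this property is to wrap evaluation so that any exception is converted to DENY. The failClosed wrapper and the allowByPattern evaluator below are hypothetical and only illustrate the pattern; they are not SafeClaw's implementation.

```typescript
import { strict as assert } from "node:assert";

type Effect = "ALLOW" | "DENY" | "REQUIRE_APPROVAL";

// Wrap an evaluator so that a thrown error (corrupted policy, invalid regex,
// malformed request) is reported as DENY instead of escaping to the caller.
function failClosed<Input>(
  evaluate: (request: Input) => Effect,
): (request: Input) => Effect {
  return (request) => {
    try {
      return evaluate(request);
    } catch {
      return "DENY";
    }
  };
}

// Hypothetical evaluator that allows paths matching a configured pattern.
const allowByPattern = failClosed(
  (request: { path: string; pattern: string }): Effect =>
    new RegExp(request.pattern).test(request.path) ? "ALLOW" : "DENY",
);

assert.equal(allowByPattern({ path: "/tmp/a", pattern: "^/tmp/" }), "ALLOW");
// "(" is not a valid pattern, so RegExp throws and the wrapper returns DENY.
assert.equal(allowByPattern({ path: "/tmp/a", pattern: "(" }), "DENY");
```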

TypeScript Strict Mode

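SafeClaw is compiled with TypeScript strict mode enabled. In the tsconfig.json below, "strict": true already implies the seven flags listed directly after it; the last five options are additional checks outside the strict family: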

{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "strictNullChecks": true,
    "strictFunctionTypes": true,
    "strictBindCallApply": true,
    "strictPropertyInitialization": true,
    "noImplicitThis": true,
    "alwaysStrict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitReturns": true,
    "noFallthroughCasesInSwitch": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true
  }
}

What Strict Mode Catches

| Check | What It Prevents |
|-------|-----------------|
| strictNullChecks | Null/undefined reference errors |
| noImplicitAny | Untyped variables that bypass type checking |
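| strictFunctionTypes | Unsound function assignments, such as passing a callback whose parameter types are narrower than the expected signature |
| noUncheckedIndexedAccess | Treating array or index-signature lookups as always defined; results are typed as possibly undefined |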
| noImplicitReturns | Functions that might not return a value |
| noFallthroughCasesInSwitch | Switch statement fall-through bugs |

TypeScript strict mode catches entire classes of errors at compile time that would otherwise surface at runtime. Combined with zero third-party dependencies, the codebase has a minimal surface for type-related bugs.
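
As a small illustration of one of these checks, the sketch below shows the kind of code that noUncheckedIndexedAccess pushes toward. The Rule interface and the firstEffect function are hypothetical names for this example.

```typescript
import { strict as assert } from "node:assert";

interface Rule {
  effect: "ALLOW" | "DENY";
}

// With noUncheckedIndexedAccess, `rules[0]` has type `Rule | undefined`, so
// the compiler rejects `rules[0].effect` until the undefined case is handled.
function firstEffect(rules: readonly Rule[]): "ALLOW" | "DENY" {
  const first = rules[0];
  if (first === undefined) {
    return "DENY"; // an empty rule list falls back to deny-by-default
  }
  return first.effect;
}

assert.equal(firstEffect([]), "DENY");
assert.equal(firstEffect([{ effect: "ALLOW" }]), "ALLOW");
```

Without the flag, accessing rules[0].effect on an empty array would compile and then throw a TypeError at runtime.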

Security-Focused Test Methodology

The test suite follows a security-focused methodology:

1. Boundary Testing

Every input boundary is tested with valid, invalid, and edge-case values.

2. Negative Testing

For every positive test (action allowed), there is a corresponding negative test (action denied). This ensures the engine does not over-permit.

3. Regression Testing

Security-related bugs are permanently captured as regression tests. Once a vulnerability is identified and fixed, a test prevents its reintroduction.

4. Deterministic Tests

All tests are deterministic — no random inputs, no time-dependent assertions, no network calls. Tests produce the same result on every execution.
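
One common technique for keeping time out of assertions is to inject the clock. Whether SafeClaw does this is not stated here; the Clock type and the createRecord function below are assumptions made for illustration.

```typescript
import { strict as assert } from "node:assert";

type Clock = () => Date;

interface AuditRecord {
  action: string;
  timestamp: string;
}

// Production code would pass `() => new Date()`; tests pass a fixed clock so
// the resulting timestamp is identical on every run.
function createRecord(action: string, now: Clock): AuditRecord {
  return { action, timestamp: now().toISOString() };
}

const fixedClock: Clock = () => new Date("2025-01-01T00:00:00.000Z");

assert.deepEqual(createRecord("file_read", fixedClock), {
  action: "file_read",
  timestamp: "2025-01-01T00:00:00.000Z",
});
```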

Continuous Verification

Running the Test Suite

npx @authensor/safeclaw test

Expected output:

Running 446 tests across 24 files...

✓ policy-engine.test.ts (120 tests)
✓ audit-trail.test.ts (80 tests)
✓ action-interception.test.ts (70 tests)
✓ simulation-mode.test.ts (50 tests)
✓ hash-chain.test.ts (40 tests)
...

446 passing (0 failing)

CI/CD Integration

The test suite runs on every commit and pull request. See the Deployment Reference for CI/CD configuration examples. A release cannot be published with any failing test.

Zero-Dependency Testing

The test suite itself uses zero third-party testing frameworks. Tests are built on the Node.js built-in test runner (node:test), maintaining the zero-dependency commitment throughout the codebase — including test infrastructure.

| Property | Value |
|----------|-------|
| Test framework | Node.js built-in (node:test) |
| Assertion library | Node.js built-in (node:assert) |
| Mocking | Node.js built-in (node:test mock API) |
| Third-party test dependencies | Zero |
