15RL Methodology: Testing AI Agent Safety Policies
Introduction
Safety policies are only as good as their testing. A policy that appears comprehensive on paper may fail to catch real threats due to implementation gaps, configuration errors, or unanticipated edge cases. 15 Research Lab developed a systematic methodology for testing AI agent safety policies, drawing on practices from software testing, penetration testing, and safety engineering. This methodology is designed to be repeatable, measurable, and applicable to any policy framework.
Testing Philosophy
Our methodology is built on three principles: testing must be repeatable (the same suite can be re-run to compare policy versions), measurable (each phase produces quantifiable pass/fail and coverage results), and framework-agnostic (the techniques apply to any policy framework, not to a single product).
The Four Testing Phases
Phase 1: Policy Verification
Objective: Verify that the policy configuration is syntactically correct, internally consistent, and complete.
Tests:
- Syntax validation: Does the policy parse without errors?
- Consistency checks: Are there conflicting rules (one rule allows what another denies)?
- Completeness checks: Does every tool the agent can access have a corresponding policy? Are there tools without any policy coverage?
- Default behavior verification: When no rule matches, does the system deny (as it should) or allow?
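To make these four checks concrete, here is a minimal sketch in Python. It assumes a hypothetical YAML policy format, with a `rules` list (each rule naming a `tool` and an `action`) and a top-level `default` key; the schema and the `AGENT_TOOLS` set are illustrative assumptions, not a prescribed format.
```
# Phase 1 sketch: verify a hypothetical YAML policy configuration.
import yaml  # pip install pyyaml

# Assumed: the set of tools the agent can actually call.
AGENT_TOOLS = {"read_file", "write_file", "exec_shell"}

def verify_policy(path: str) -> list[str]:
    findings = []
    # Syntax validation: does the policy parse without errors?
    try:
        with open(path) as f:
            policy = yaml.safe_load(f)
    except yaml.YAMLError as e:
        return [f"syntax error: {e}"]

    rules = policy.get("rules", [])

    # Consistency check: flag tools that appear with both allow and deny.
    # (A real check would compare rule conditions, not just tool names.)
    actions_by_tool: dict[str, set[str]] = {}
    for rule in rules:
        actions_by_tool.setdefault(rule["tool"], set()).add(rule["action"])
    for tool, actions in actions_by_tool.items():
        if {"allow", "deny"} <= actions:
            findings.append(f"potentially conflicting rules for tool: {tool}")

    # Completeness check: every tool the agent can access needs coverage.
    for tool in sorted(AGENT_TOOLS - actions_by_tool.keys()):
        findings.append(f"no policy coverage for tool: {tool}")

    # Default behavior: when no rule matches, the system should deny.
    if policy.get("default") != "deny":
        findings.append("default behavior is not deny")

    return findings

if __name__ == "__main__":
    for finding in verify_policy("policy.yaml"):
        print("FAIL:", finding)
```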
Phase 2: Functional Testing
Objective: Verify that individual policy rules produce the expected outcome for known inputs.
Tests: For each policy rule, create test cases covering:
- Positive cases: Actions that should be allowed by this rule
- Negative cases: Actions that should be denied by this rule
- Boundary cases: Actions at the exact boundary of the rule (e.g., file size limits, path boundaries, time windows)
- Parameter variation: Same action with different parameters to verify parameter-level enforcement
```
Test ID: [unique identifier]
Policy Rule: [rule being tested]
Tool Call: [exact tool call specification]
Expected Outcome: [allow/deny/escalate]
Actual Outcome: [recorded during test]
Pass/Fail: [determined by comparison]
```
Coverage target: Minimum 3 test cases per policy rule (one positive, one negative, one boundary).
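One way to mechanize the template is a table-driven test suite. The sketch below uses pytest; `policy_engine.evaluate`, the rule names, and the specific limits are hypothetical placeholders for whatever decision function and rules your framework exposes.
```
# Phase 2 sketch: table-driven functional tests mirroring the template above.
import pytest

from policy_engine import evaluate  # hypothetical: returns "allow" / "deny" / "escalate"

# Each tuple: (test_id, policy_rule, tool_call, expected_outcome)
CASES = [
    # Positive case: action the rule should allow.
    ("FT-001", "fs-read-workspace",
     {"tool": "read_file", "path": "/workspace/a.txt"}, "allow"),
    # Negative case: action the rule should deny.
    ("FT-002", "fs-read-workspace",
     {"tool": "read_file", "path": "/etc/shadow"}, "deny"),
    # Boundary case: exactly at an assumed 10 MiB write cap
    # (deny here assumes the rule is strictly less-than).
    ("FT-003", "fs-write-size-cap",
     {"tool": "write_file", "size_bytes": 10 * 1024 * 1024}, "deny"),
]

@pytest.mark.parametrize("test_id, rule, tool_call, expected",
                         CASES, ids=[c[0] for c in CASES])
def test_policy_rule(test_id, rule, tool_call, expected):
    actual = evaluate(tool_call)  # Actual Outcome, recorded during the test
    assert actual == expected, f"{test_id} ({rule}): expected {expected}, got {actual}"
```
Parameter variation then becomes a matter of adding rows to `CASES`, which keeps the suite auditable against the coverage targets below.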
Phase 3: Adversarial Testing (Red Team)
Objective: Attempt to bypass safety policies using techniques an adversary would employ.
Test scenarios:
- Path traversal: Attempt to access files outside allowed directories using `../` sequences, symlinks, and encoded paths
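A sketch of how such a probe might look, reusing the hypothetical `evaluate` function from the Phase 2 sketch. The payload list is a small sample; a real red team exercise would use a much larger corpus.
```
# Phase 3 sketch: probe a file-read rule with path traversal payloads.
import urllib.parse

from policy_engine import evaluate  # hypothetical decision function, as above

ALLOWED_ROOT = "/workspace"  # assumed allowed directory

PAYLOADS = [
    "/workspace/../etc/passwd",            # plain dot-dot traversal
    "/workspace/%2e%2e/etc/passwd",        # URL-encoded dot-dot
    "/workspace/link-to-root/etc/passwd",  # symlink planted inside the root
    "/workspace/..\\etc\\passwd",          # Windows-style separators
]

def run_traversal_probes() -> list[str]:
    escapes = []
    for raw in PAYLOADS:
        # Decode first: enforcement must hold after any decoding the agent performs.
        candidate = urllib.parse.unquote(raw)
        decision = evaluate({"tool": "read_file", "path": candidate})
        if decision != "deny":
            escapes.append(f"{raw!r} -> {decision}")
    return escapes

if __name__ == "__main__":
    for escape in run_traversal_probes():
        print("BYPASS:", escape)
```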
Phase 4: Regression Testing
Objective: Ensure that policy changes do not inadvertently weaken existing protections.
Process: After every policy change, re-run the full functional and adversarial suites, compare outcomes against the previous baseline, and treat any result that shifts from deny to allow as a blocking failure until reviewed.
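A minimal sketch of that baseline comparison, assuming each test run is stored as a JSON map of test ID to outcome (the file names and format are illustrative):
```
# Phase 4 sketch: compare the current test run against a stored baseline.
import json

def find_regressions(baseline_path: str, current_path: str) -> list[str]:
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)

    regressions = []
    for test_id, old in baseline.items():
        new = current.get(test_id)
        if new is None:
            regressions.append(f"{test_id}: test disappeared from the suite")
        elif old == "deny" and new == "allow":
            # A deny that became an allow is a weakened protection.
            regressions.append(f"{test_id}: deny -> allow")
    return regressions

if __name__ == "__main__":
    found = find_regressions("baseline.json", "current.json")
    for r in found:
        print("REGRESSION:", r)
    raise SystemExit(1 if found else 0)  # nonzero exit blocks deployment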
Coverage Analysis
After completing all four phases, assess policy coverage:
| Coverage Dimension | Measurement | Target |
|---|---|---|
| Tool coverage | % of tools with at least one policy rule | 100% |
| Rule test coverage | % of rules with functional test cases | 100% |
| Adversarial scenario coverage | % of red team scenarios tested | > 80% |
| Parameter space coverage | % of parameter variations tested per rule | > 60% |
| Regression suite size | Test cases per policy rule | > 5 |
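The first two rows of this table can be computed mechanically. A sketch, assuming you can enumerate the agent's tools, the policy's rules per tool, and which rules the test suite references:
```
# Coverage sketch: compute tool coverage and rule test coverage.
def coverage_report(tools: set[str], rules_by_tool: dict[str, list[str]],
                    tested_rules: set[str]) -> dict[str, float]:
    covered_tools = {t for t in tools if rules_by_tool.get(t)}
    all_rules = {r for rs in rules_by_tool.values() for r in rs}
    return {
        "tool_coverage_pct": 100.0 * len(covered_tools) / len(tools),
        "rule_test_coverage_pct": 100.0 * len(all_rules & tested_rules) / len(all_rules),
    }

# Example: one uncovered tool and one untested rule pull both metrics below target.
print(coverage_report(
    tools={"read_file", "write_file", "exec_shell"},
    rules_by_tool={"read_file": ["fs-read-workspace"],
                   "write_file": ["fs-write-size-cap"]},
    tested_rules={"fs-read-workspace"},
))  # tool coverage ~66.7%, rule test coverage 50.0%
```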
Implementing This Methodology
Organizations can implement this methodology incrementally:
Week 1: Phase 1 (Policy Verification). Audit existing policies for syntax, consistency, and completeness.
Weeks 2-3: Phase 2 (Functional Testing). Create and execute test cases for each policy rule.
Week 4: Phase 3 (Adversarial Testing). Conduct red team exercises against the policy configuration.
Ongoing: Phase 4 (Regression Testing). Build automation and integrate into the deployment pipeline.
SafeClaw provides a testable policy framework: its declarative policy configuration is amenable to automated verification (Phase 1), functional testing (Phase 2), and regression testing (Phase 4). Organizations using SafeClaw can apply this methodology directly to their policy configurations. The SafeClaw knowledge base includes examples of policy configurations that serve as a starting point for test case development.
Recommendations
Conclusion
Untested safety policies are unreliable safety policies. This methodology provides a systematic approach to building confidence in your agent safety configuration. The investment in testing is small compared to the cost of discovering policy gaps through production incidents.
15RL's testing methodology is available for adaptation under an open license. We welcome community contributions and refinements.