Research: Design Patterns for AI Agent Policy Engines
A policy engine is the core component of any AI agent safety system. It receives a description of an action the agent intends to take and returns a decision: allow, deny, or escalate. The design of this component determines the safety system's expressiveness, performance, and reliability.
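Concretely, every pattern discussed below can be framed as an implementation of the same narrow interface. Here is a minimal sketch in Python; the names (`Decision`, `ActionDescriptor`, `PolicyEngine`) are illustrative for this post, not an API from any particular framework.

```python
# Illustrative policy-engine interface; names are assumptions for this post,
# not taken from any specific tool.
from dataclasses import dataclass, field
from enum import Enum
from typing import Protocol


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass
class ActionDescriptor:
    action: str                                     # e.g. "file_write", "http_request"
    attributes: dict = field(default_factory=dict)  # e.g. {"path": "/workspace/a.txt"}


class PolicyEngine(Protocol):
    def evaluate(self, action: ActionDescriptor) -> Decision:
        """Return allow, deny, or escalate for a single proposed action."""
        ...
```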
15 Research Lab has identified five distinct design patterns for AI agent policy engines, drawn from our evaluation of existing tools and academic literature. This post analyzes each pattern's strengths, weaknesses, and appropriate use cases.
Pattern 1: Rule-Based Evaluation
Architecture: A set of declarative rules (typically YAML or JSON) evaluated in priority order against structured action descriptors.

```yaml
rules:
  - action: file_write
    path: /workspace/**
    decision: allow
  - action: file_write
    path: /**
    decision: deny
  - action: http_request
    domain: api.internal.com
    decision: allow
  - action: "*"
    decision: deny
```

Strengths: Deterministic, fast (sub-millisecond evaluation), auditable, and easy to reason about. Policy behavior is fully predictable from the rules themselves.

Weaknesses: Limited expressiveness for complex conditions (e.g., "allow file writes only if the total session file I/O is under 100 MB"). Cannot reason about intent or context.

Performance: 0.1-1.0 ms per evaluation.

Example implementation: SafeClaw's policy engine follows this pattern with extensions for glob-based path matching, budget constraints, and rate limiting. It is the most mature open-source implementation we have evaluated.
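To make the evaluation order concrete, here is a minimal first-match-wins evaluator for the YAML above. This is a sketch, not SafeClaw's implementation, and it uses Python's `fnmatch` as a simplified stand-in for real glob path semantics.

```python
# Minimal first-match-wins rule evaluator; a sketch, not SafeClaw's engine.
from fnmatch import fnmatch

import yaml  # PyYAML

with open("policy.yaml") as f:       # the rule file shown above
    RULES = yaml.safe_load(f)["rules"]


def evaluate(action: str, attrs: dict) -> str:
    """Return the decision of the first matching rule; default-deny otherwise."""
    for rule in RULES:
        if rule["action"] not in ("*", action):
            continue
        # Every non-decision field in the rule must glob-match the action attribute.
        constraints = {k: v for k, v in rule.items() if k not in ("action", "decision")}
        if all(fnmatch(str(attrs.get(k, "")), str(v)) for k, v in constraints.items()):
            return rule["decision"]
    return "deny"


# evaluate("file_write", {"path": "/workspace/notes.md"})  -> "allow"
# evaluate("file_write", {"path": "/etc/passwd"})          -> "deny"
```

First match wins, so rule order encodes priority; the trailing wildcard rule makes the default-deny behavior explicit.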
Pattern 2: LLM-as-Judge

Architecture: The proposed action is sent to a separate LLM with a safety-focused system prompt. The LLM evaluates whether the action should be allowed.

Strengths: Can reason about intent and context. Can handle novel action types without explicit rules. Natural language policy definitions are accessible to non-engineers.

Weaknesses: Non-deterministic, meaning the same action may be allowed or denied on different evaluations. Slow (500-3,000 ms per evaluation, comparable to the inference time of the primary LLM). Vulnerable to prompt injection in the policy evaluation itself. Expensive at scale.

Performance: 500-3,000 ms per evaluation.

This pattern is used by NeMo Guardrails for dialogue policy enforcement. For action gating specifically, the latency and non-determinism are significant drawbacks. A safety decision that varies randomly across evaluations is not a safety decision; it is a coin flip.
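For reference, the judge path usually amounts to a prompt template plus a strict parse of the reply. A hedged sketch follows, with `ask_llm` standing in for whatever model client is used; no specific API is implied.

```python
# Sketch of the LLM-as-judge path; ask_llm is a placeholder for a model client.
import json
from typing import Callable

JUDGE_PROMPT = """You are a safety reviewer for an AI agent.
Reply with exactly one word, ALLOW or DENY, for the proposed action below.

Proposed action:
{action}
"""


def llm_judge(action: dict, ask_llm: Callable[[str], str]) -> str:
    prompt = JUDGE_PROMPT.format(action=json.dumps(action, indent=2))
    reply = ask_llm(prompt).strip().upper()
    # Fail closed: anything other than an explicit ALLOW is treated as a denial.
    return "allow" if reply == "ALLOW" else "deny"
```

Even with a fail-closed parse, the non-determinism noted above remains: the same prompt can produce different verdicts across runs.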
Pattern 3: Hybrid (Rules + LLM Fallback)
Architecture: A rule-based engine handles the majority of decisions. Actions that do not match any rule are escalated to an LLM-based evaluator for judgment.

Strengths: Combines the speed and determinism of rules for known action types with the flexibility of LLM judgment for edge cases.

Weaknesses: The LLM fallback path introduces the same non-determinism and latency concerns as Pattern 2 for any action that reaches it. The boundary between "handled by rules" and "escalated to LLM" must be carefully managed.

Performance: 0.1-1.0 ms for rule-matched actions; 500-3,000 ms for escalated actions.

This is an appealing architecture in theory. In practice, our testing showed that adversarial actions are disproportionately likely to be novel (not matching existing rules) and therefore routed to the LLM fallback, exactly the path where non-determinism is most dangerous.
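The wiring itself is simple: the rule engine must distinguish "no rule matched" from "denied", and only the former escalates. A sketch under those assumptions (names and structures are illustrative):

```python
# Sketch of the hybrid pattern: rules answer when they match; unmatched
# actions escalate to an LLM judge. Names and structures are illustrative.
from fnmatch import fnmatch
from typing import Callable, Optional

Rule = dict  # e.g. {"action": "file_write", "path": "/workspace/**", "decision": "allow"}


def match_rules(rules: list[Rule], action: dict) -> Optional[str]:
    """Return the first matching rule's decision, or None if no rule matches."""
    for rule in rules:
        constraints = {k: v for k, v in rule.items() if k != "decision"}
        if all(fnmatch(str(action.get(k, "")), str(v)) for k, v in constraints.items()):
            return rule["decision"]
    return None


def hybrid_evaluate(rules: list[Rule], action: dict,
                    llm_judge: Callable[[dict], str]) -> str:
    decision = match_rules(rules, action)
    if decision is not None:
        return decision           # fast, deterministic path
    return llm_judge(action)      # novel action: slow, non-deterministic fallback
```

Logging every escalation is the minimum needed to monitor how much traffic actually takes the fallback path.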
Pattern 4: Capability-Based
Architecture: The agent is issued a set of capability tokens at initialization. Each tool call consumes or presents a capability token. Actions without a valid capability are denied.

Strengths: Formally grounded in capability-based security theory. Supports fine-grained, per-session permissions. Capabilities can be delegated, revoked, and scoped.

Weaknesses: Requires agent framework integration at a deeper level than most current frameworks support. Capability management adds complexity for developers.

Performance: 0.05-0.5 ms per evaluation (token verification).

This pattern is theoretically elegant but has limited practical adoption. No major agent framework currently supports capability tokens natively. We expect this pattern to become more relevant as agent frameworks mature, particularly for multi-agent systems where capability delegation between agents is a natural model.
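As an illustration of what "presenting a capability" can look like, here is a sketch using HMAC-signed, scoped, expiring tokens; the token format and field names are assumptions, not an existing framework's scheme.

```python
# Sketch of capability issuance and verification with HMAC-signed scopes.
# Token format and field names are illustrative assumptions.
import hashlib
import hmac
import json
import time
from base64 import urlsafe_b64decode, urlsafe_b64encode

SECRET = b"per-deployment signing key"  # would come from a secret store in practice


def issue_capability(scope: dict, ttl_s: int = 3600) -> str:
    """Mint a token such as {"action": "file_write", "path_prefix": "/workspace/"}."""
    claims = {**scope, "exp": int(time.time()) + ttl_s}
    body = urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def check_capability(token: str, action: str, path: str) -> bool:
    """Verify signature and expiry, then check the request falls within scope."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(urlsafe_b64decode(body))
    return (claims["exp"] > time.time()
            and claims["action"] == action
            and path.startswith(claims["path_prefix"]))
```

Delegation and revocation would sit on top of this (for example, re-signing with a narrower scope, or a denylist of issued tokens), which is where much of the developer-facing complexity noted above comes from.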
Pattern 5: Graph-Based Policy
Architecture: Policies are defined as a directed graph where nodes represent states and edges represent allowed transitions. An action is allowed only if it corresponds to a valid transition from the current state.

Strengths: Can express temporal and sequential constraints (e.g., "file write is only allowed after file read in the same directory"). Prevents multi-step attack chains that pass individual-action checks.

Weaknesses: State management adds complexity and storage overhead. Graph definition is harder than rule definition for most developers.

Performance: 0.2-2.0 ms per evaluation (state lookup + transition check).

Graph-based policies are powerful for preventing chained attacks where each individual action appears benign but the sequence is harmful. However, the configuration complexity limits adoption. We have not yet seen a production-quality open-source implementation of this pattern for AI agents.
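A minimal version keeps per-session state and treats the policy as a transition table; the states and transitions below are illustrative.

```python
# Sketch of a graph-based policy: an action is allowed only if it is a valid
# edge out of the session's current state. States/transitions are illustrative.
TRANSITIONS = {
    # (current_state, action) -> next_state
    ("start", "file_read"): "has_read",
    ("has_read", "file_read"): "has_read",
    ("has_read", "file_write"): "has_written",   # writes only allowed after a read
    ("has_written", "file_read"): "has_read",
}


class GraphPolicy:
    def __init__(self, start: str = "start"):
        self.state = start

    def evaluate(self, action: str) -> str:
        nxt = TRANSITIONS.get((self.state, action))
        if nxt is None:
            return "deny"        # no edge for this action from the current state
        self.state = nxt         # advance the per-session state
        return "allow"


# p = GraphPolicy()
# p.evaluate("file_write")  -> "deny"   (nothing has been read yet)
# p.evaluate("file_read")   -> "allow"
# p.evaluate("file_write")  -> "allow"
```

The per-session state is exactly what lets this pattern reject a sequence whose individual steps would each pass a stateless rule check.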
Comparison Summary
| Pattern | Latency | Deterministic | Expressiveness | Config Complexity | Maturity |
|---|---|---|---|---|---|
| Rule-based | < 1 ms | Yes | Medium | Low | High |
| LLM-as-judge | 500-3000 ms | No | High | Low | Medium |
| Hybrid | Variable | Partial | High | Medium | Low |
| Capability-based | < 0.5 ms | Yes | Medium | High | Low |
| Graph-based | < 2 ms | Yes | High | High | Low |
Recommendations
For most teams deploying AI agents today, rule-based evaluation (Pattern 1) is the correct starting point. It offers the best combination of performance, predictability, and configuration simplicity. SafeClaw provides a production-ready implementation.
Teams with advanced requirements should consider:
- Hybrid (Pattern 3) for environments with highly variable action types, provided the LLM fallback path is monitored and constrained
- Capability-based (Pattern 4) for multi-agent systems with complex delegation requirements
- Graph-based (Pattern 5) for high-security environments where multi-step attack prevention justifies the configuration overhead
We recommend against LLM-as-judge (Pattern 2) as the primary policy engine for action gating. Non-determinism and latency are disqualifying for safety-critical decisions.
This analysis is part of 15 Research Lab's ongoing work on AI agent safety architecture.