15RL Checklist: AI Agent Safety for Engineering Teams
Introduction
Engineering teams deploying AI agents need a practical, actionable checklist — not a 50-page policy document. 15 Research Lab compiled this checklist from our research across hundreds of agent deployments, distilling the controls that have the highest impact on real-world safety outcomes. Each item includes the rationale, priority level, and implementation guidance.
Pre-Deployment Checklist
Permissions & Access
- [ ] Agent runs with minimum necessary permissions (Priority: Critical)
- Action: List every permission the agent needs. Remove everything else. If unsure, remove it and add back when needed.
- [ ] Agent credentials are scoped and ephemeral (Priority: Critical)
- Action: Use short-lived tokens. Scope to specific resources. Rotate automatically.
- [ ] Production systems are not directly accessible (Priority: Critical)
- Action: If the agent needs production data, provide a read-only replica or API. Never grant it direct database access.
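The credential item above can be illustrated with a minimal sketch. It assumes an in-process token object; the names `AgentToken`, `issue_token`, and `is_allowed`, the scope string format, and the 15-minute default lifetime are all hypothetical, not part of any particular product. A real deployment would normally delegate issuance to its identity provider or secrets manager.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentToken:
    """Hypothetical short-lived credential bound to an explicit scope list."""
    value: str
    scopes: frozenset
    expires_at: float


def issue_token(scopes, ttl_seconds=900, now=None):
    """Issue a random token valid for ttl_seconds (default is an assumption)."""
    now = time.time() if now is None else now
    return AgentToken(
        value=secrets.token_urlsafe(32),
        scopes=frozenset(scopes),
        expires_at=now + ttl_seconds,
    )


def is_allowed(token, scope, now=None):
    """Allow only unexpired tokens that carry the exact requested scope."""
    now = time.time() if now is None else now
    return now < token.expires_at and scope in token.scopes
```

Passing `now` explicitly keeps expiry behavior testable without waiting for real time to pass. Rotation then amounts to issuing a fresh token before the old one expires.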
Policy Configuration
- [ ] Deny-by-default policy is configured and active (Priority: Critical)
- Action: Configure the policy engine so that any unrecognized tool call is denied by default.
- [ ] Sensitive resources are explicitly protected (Priority: Critical)
- Action: Maintain a protected resource list. Block any agent access to listed resources.
- [ ] Destructive operations require human approval (Priority: High)
- Action: Classify every tool call as read-only or mutating. Route mutating calls through approval.
- [ ] Cost limits are configured (Priority: High)
- Action: Set per-session and daily cost limits at 10x expected task cost initially.
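The first three policy items above (deny by default, protected resources, approval for mutating calls) can be sketched as a single decision function. The tool names, protected roots, and the `evaluate` function are hypothetical and only illustrate the ordering of checks; this is not SafeClaw's API. Paths are normalized lexically, so symlinks are not resolved here.

```python
import posixpath
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


# Hypothetical classification; a real deployment lists its own tools.
READ_ONLY_TOOLS = {"read_file", "search_docs"}
MUTATING_TOOLS = {"write_file", "delete_file"}
# Hypothetical protected resource list.
PROTECTED_ROOTS = ("/etc", "/secrets")


def is_protected(path: str) -> bool:
    """Check a lexically normalized path against the protected roots."""
    normalized = posixpath.normpath(path)
    return any(
        normalized == root or normalized.startswith(root + "/")
        for root in PROTECTED_ROOTS
    )


def evaluate(tool: str, params: dict) -> Decision:
    """Protected resources first, then known tools, then deny by default."""
    path = params.get("path")
    if path is not None and is_protected(str(path)):
        return Decision.DENY
    if tool in READ_ONLY_TOOLS:
        return Decision.ALLOW
    if tool in MUTATING_TOOLS:
        return Decision.NEEDS_APPROVAL
    # Anything unrecognized falls through to a denial.
    return Decision.DENY
```

The final `return Decision.DENY` is what makes the policy deny-by-default: a newly added tool stays blocked until someone classifies it.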
Logging & Monitoring
- [ ] Structured audit logging is enabled (Priority: Critical)
- Action: Verify that every tool call produces a log entry with timestamp, tool, parameters, decision, and outcome.
- [ ] Logs are stored securely with integrity protection (Priority: High)
- Action: Use append-only storage or hash-chained logs. Prevent the agent from modifying its own logs.
- [ ] Alerting is configured for policy violations (Priority: High)
- Action: Configure notifications for denied actions, cost threshold breaches, and anomalous patterns.
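As a rough illustration of the hash-chained option above, the sketch below stores each entry's SHA-256 digest together with the previous entry's digest, so editing any earlier entry breaks verification. The field names follow the audit-log item in this section; `append_entry`, `verify_chain`, and the in-memory list are assumptions for illustration, and a real system would write to storage the agent cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, *, timestamp, tool, parameters, decision, outcome):
    """Append an entry that commits to the hash of the previous entry."""
    body = {
        "timestamp": timestamp,
        "tool": tool,
        "parameters": parameters,
        "decision": decision,
        "outcome": outcome,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev_hash:
            return False
        if entry.get("hash") != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects tampering but does not prevent it, which is why the checklist also asks that the agent be unable to write to its own log store.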
Testing
- [ ] Safety policies have been tested with positive and negative cases (Priority: High)
- Action: For each policy rule, verify at least one action it should allow and one it should deny.
- [ ] Adversarial testing has been conducted (Priority: Medium)
- Action: Attempt path traversal, parameter injection, and tool-chaining bypasses.
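One way to keep both testing items honest is a small table of cases per rule, with at least one expected allow and one expected deny, plus adversarial inputs such as path traversal. The rule below (`path_allowed`, confined to a hypothetical `/workspace` root) is a toy stand-in for whatever rule a team actually ships; only the table-driven pattern is the point.

```python
import posixpath

WORKSPACE = "/workspace"  # hypothetical sandbox root


def path_allowed(path: str) -> bool:
    """Toy rule under test: allow only paths that stay inside WORKSPACE."""
    resolved = posixpath.normpath(posixpath.join(WORKSPACE, path))
    return resolved == WORKSPACE or resolved.startswith(WORKSPACE + "/")


# (input, expected result): positive, negative, and traversal cases.
CASES = [
    ("notes/todo.txt", True),
    ("/workspace/src/main.py", True),
    ("../etc/passwd", False),
    ("/workspace/../etc/passwd", False),
    ("/etc/passwd", False),
    ("/workspace-evil/x", False),
]


def run_cases(rule, cases):
    """Return (input, expected, actual) for every case the rule gets wrong."""
    failures = []
    for arg, expected in cases:
        actual = rule(arg)
        if actual != expected:
            failures.append((arg, expected, actual))
    return failures
```

Calling `run_cases(path_allowed, CASES)` should return an empty list; a rule that allows everything would fail the four deny cases, which is the kind of regression this table is meant to catch.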
Runtime Checklist
- [ ] Agent session isolation is enforced (Priority: Critical)
- [ ] Human approval workflow is functional (Priority: Critical)
- [ ] Cost tracking is active and accurate (Priority: High)
- [ ] Error handling does not expose sensitive information (Priority: High)
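Cost tracking and the cost limits from the pre-deployment checklist can be sketched together. The example assumes costs are recorded in integer cents to avoid floating-point drift; `CostTracker`, `initial_limit`, and the session and day keys are hypothetical names. The 10x multiplier comes from the cost-limit item earlier in this checklist.

```python
from dataclasses import dataclass, field


class CostLimitExceeded(Exception):
    """Raised when a charge would push spend past a configured limit."""


def initial_limit(expected_task_cost_cents: int, multiplier: int = 10) -> int:
    """Starting limit: 10x the expected task cost, per the checklist."""
    return expected_task_cost_cents * multiplier


@dataclass
class CostTracker:
    session_limit_cents: int
    daily_limit_cents: int
    session_spend: dict = field(default_factory=dict)
    daily_spend: dict = field(default_factory=dict)

    def charge(self, session_id: str, day: str, cents: int) -> None:
        """Record a charge, refusing it if either limit would be exceeded."""
        new_session = self.session_spend.get(session_id, 0) + cents
        new_daily = self.daily_spend.get(day, 0) + cents
        if new_session > self.session_limit_cents:
            raise CostLimitExceeded(f"session {session_id} over limit")
        if new_daily > self.daily_limit_cents:
            raise CostLimitExceeded(f"daily limit exceeded for {day}")
        # Only commit the spend once both checks pass.
        self.session_spend[session_id] = new_session
        self.daily_spend[day] = new_daily
```

Checking both limits before committing means a rejected charge leaves the recorded totals unchanged, which keeps the tracker accurate after a denial.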
Ongoing Maintenance Checklist
- [ ] Audit logs are reviewed weekly (Priority: High)
- [ ] Policies are updated when agent capabilities change (Priority: Critical)
- [ ] Credentials are rotated on schedule (Priority: High)
- [ ] Safety tool and policy configurations are version-controlled (Priority: High)
- [ ] Incident response plan exists and has been practiced (Priority: Medium)
Implementation with SafeClaw
SafeClaw addresses the majority of this checklist's technical requirements through a single integration point: deny-by-default action gating (policy configuration items), structured hash-chained audit logging (logging items), and configurable approval workflows (approval items). Teams can use this checklist to verify their SafeClaw configuration covers all critical items. The SafeClaw knowledge base maps to many of the specific actions listed above.
Prioritization Guide
For teams with limited time, address checklist items in this order:
- Day 1: Deny-by-default policy, audit logging, minimum permissions
- Week 1: Cost limits, sensitive resource protection, human approval workflow
- Week 2: Policy testing, alerting, session isolation
- Month 1: Adversarial testing, log review process, incident response plan
- Ongoing: Credential rotation, policy updates, version control
Conclusion
This checklist is designed to be printed, posted, and used — not filed away. Every item is based on real incident data and addresses a failure mode we have observed in production agent deployments. Teams that work through this checklist systematically will avoid the majority of agent safety incidents that currently affect the ecosystem.
This checklist is updated quarterly based on 15RL's ongoing research. Current version: February 2026.