15RL Guide: Minimum Viable Safety for AI Agents
Introduction
Not every organization is ready for a comprehensive AI agent safety program, but every organization deploying AI agents needs some safety controls, and knowing the minimum effective set is essential for teams with limited resources. At 15 Research Lab, we have distilled research across hundreds of agent deployments into a minimum viable safety (MVS) specification: the smallest set of controls that prevents the most common and most damaging agent failures.
What Minimum Viable Safety Is Not
MVS is not a recommendation for a permanent safety posture. It is a starting point — the floor, not the ceiling. Organizations should advance beyond MVS as their agent deployments mature and their safety capabilities develop. But MVS is dramatically better than no safety controls, which is the current state of the majority of agent deployments we observe.
The Five Essential Controls
Control 1: Deny-by-Default Action Policy
What it means: The agent cannot perform any action unless that action is explicitly permitted by a policy. If no policy matches an action, the default is deny.
Why it is essential: Deny-by-default is the single most impactful safety control. In our research, it prevents 78% of agent safety incidents when properly configured. The alternative, allow-by-default with blocklists, fails to anticipate novel dangerous actions and provides systematically weaker protection.
How to implement: Define a list of approved tools and approved parameter ranges. Everything else is blocked. Start restrictive and expand based on observed needs.
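A minimal sketch of such a policy check, assuming a hypothetical framework that presents each proposed action as a tool name plus a parameter dictionary; the tool names, parameter ranges, and the evaluate_action helper are illustrative, not part of any specific product:

```python
# Deny-by-default: anything not explicitly listed in ALLOWED_TOOLS is blocked.
ALLOWED_TOOLS = {
    # tool name -> approved parameter ranges (illustrative assumptions)
    "search_docs": {"max_results": range(1, 51)},
    "read_file": {},  # allowed with any parameters
}

def evaluate_action(tool: str, params: dict) -> bool:
    """Return True only if the tool and its parameters are explicitly allowed."""
    if tool not in ALLOWED_TOOLS:
        return False  # no matching policy: the default is deny
    for name, allowed_values in ALLOWED_TOOLS[tool].items():
        if name in params and params[name] not in allowed_values:
            return False  # parameter outside the approved range: deny
    return True

assert evaluate_action("search_docs", {"max_results": 10}) is True
assert evaluate_action("drop_table", {"name": "orders"}) is False  # unknown tool
```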
Control 2: Session-Level Cost Limits
What it means: Each agent session has a maximum cost limit. When the limit is reached, the agent stops.
Why it is essential: Runaway cost events are the most common agent incident (affecting 41% of deployments in our survey data). A $50 session limit prevents the $23,000 data warehouse incident. The specific limit should be calibrated to expected task cost; we recommend starting at 10x the median expected cost per task.
How to implement: Track cumulative cost (LLM tokens + external API calls) per session. Hard-stop when the limit is reached.
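A minimal sketch of a session budget with a hard stop, assuming per-call costs are available from your billing or token accounting; the SessionBudget class and CostLimitExceeded exception are illustrative names:

```python
class CostLimitExceeded(Exception):
    """Raised when a session's cumulative cost reaches its limit."""

class SessionBudget:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record the cost of one LLM or external API call; hard-stop at the limit."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.limit_usd:
            raise CostLimitExceeded(
                f"session spent ${self.spent_usd:.2f}, limit is ${self.limit_usd:.2f}"
            )

# Example: a $50 limit, roughly 10x a $5 median expected task cost.
budget = SessionBudget(limit_usd=50.0)
budget.charge(0.42)  # one LLM call
budget.charge(1.10)  # one external API call
```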
Control 3: Structured Audit Logging
What it means: Every agent action is logged in a structured format that includes the tool name, parameters, timestamp, policy evaluation result, and execution outcome.
Why it is essential: Without audit logs, you cannot investigate incidents, verify compliance, detect anomalies, or improve your safety policies. Audit logging is the foundation for every advanced safety capability.
How to implement: Log every tool call in JSON format. Include at minimum: timestamp, session ID, tool name, parameters (with sensitive values redacted), policy decision (allow/deny), and outcome (success/failure/error).
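A minimal sketch of one such log entry, using only the Python standard library; the field names follow the minimum set above, and the redact helper and SENSITIVE_KEYS list are illustrative assumptions:

```python
import json
from datetime import datetime, timezone

SENSITIVE_KEYS = {"password", "token", "api_key"}  # values to redact before logging

def redact(params: dict) -> dict:
    return {k: ("[REDACTED]" if k in SENSITIVE_KEYS else v) for k, v in params.items()}

def log_action(session_id: str, tool: str, params: dict,
               decision: str, outcome: str) -> str:
    """Emit one structured audit record per tool call."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "tool": tool,
        "parameters": redact(params),
        "policy_decision": decision,  # "allow" or "deny"
        "outcome": outcome,           # "success", "failure", or "error"
    }
    line = json.dumps(entry)
    print(line)  # in practice, append to a log file or ship to a log pipeline
    return line

log_action("sess-42", "read_file", {"path": "/tmp/report.csv", "token": "abc123"},
           decision="allow", outcome="success")
```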
Control 4: Sensitive Resource Protection
What it means: Specific resources (credential files, production databases, system configurations, administrative interfaces) are explicitly protected from agent access.
Why it is essential: Agents that inadvertently access sensitive resources are responsible for the highest-severity incidents in our dataset. Explicit protection ensures that even if the deny-by-default policy is too permissive, the most damaging actions are still blocked.
How to implement: Maintain a list of protected resources (file paths, database names, API endpoints, system commands). Block any agent action that references a protected resource, regardless of other policy evaluations.
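A minimal sketch of a protection check layered on top of the policy from Control 1; the protected entries and the simple substring matching are illustrative assumptions, and a real implementation would match file paths, database names, API endpoints, and system commands more precisely:

```python
PROTECTED_RESOURCES = [
    "/etc/",             # system configuration
    ".aws/credentials",  # credential files
    "prod_db",           # production database
    "/admin/",           # administrative interfaces
]

def touches_protected_resource(params: dict) -> bool:
    """Return True if any parameter value references a protected resource."""
    joined = " ".join(str(v) for v in params.values())
    return any(resource in joined for resource in PROTECTED_RESOURCES)

def evaluate_with_protection(tool: str, params: dict, base_decision: bool) -> bool:
    # Protection overrides every other policy evaluation.
    if touches_protected_resource(params):
        return False
    return base_decision

assert evaluate_with_protection("run_sql", {"db": "prod_db"}, base_decision=True) is False
assert evaluate_with_protection("read_file", {"path": "/tmp/report.csv"}, base_decision=True) is True
```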
Control 5: Human Approval for Destructive Operations
What it means: Any agent action that modifies or deletes data, changes system configuration, or affects production systems requires human approval before execution.
Why it is essential: Destructive operations are irreversible: once files are deleted, databases are modified, or deployments are pushed, the damage is done. Human approval is the last line of defense against agent errors with permanent consequences.
How to implement: Classify agent actions as read-only or mutating. Route all mutating actions through a human approval workflow (even a simple CLI prompt is better than no approval).
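A minimal sketch of the simple CLI-prompt version of this workflow; the read-only tool list and the is_mutating classification are illustrative assumptions:

```python
READ_ONLY_TOOLS = {"search_docs", "read_file", "list_buckets"}

def is_mutating(tool: str) -> bool:
    """Treat every tool not known to be read-only as mutating."""
    return tool not in READ_ONLY_TOOLS

def approve(tool: str, params: dict) -> bool:
    """Ask a human before a mutating action executes; read-only actions pass through."""
    if not is_mutating(tool):
        return True
    answer = input(f"Agent wants to run {tool} with {params}. Approve? [y/N] ")
    return answer.strip().lower() == "y"

if approve("delete_file", {"path": "build/old_report.csv"}):
    print("executing action")
else:
    print("action blocked: approval not granted")
```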
Implementation Priority
If you cannot implement all five controls at once, start with the deny-by-default action policy, the single most impactful control, and add the others as capacity allows.
Getting Started Quickly
SafeClaw implements all five MVS controls in a single tool: deny-by-default action gating, configurable cost and session limits, structured audit logging, resource protection policies, and human approval workflows. For teams seeking the fastest path to minimum viable safety, it provides these controls without requiring multiple tools or complex integration. The SafeClaw knowledge base includes quick-start guides that can be followed in under an hour.
Measuring MVS Effectiveness
Once implemented, track these metrics to verify your MVS is working; a measurement sketch follows the table:
| Metric | Target | What It Tells You |
|---|---|---|
| Actions denied by policy (daily) | > 0 | Your policies are catching something |
| Cost limit events (weekly) | Decreasing | Agents are staying within bounds |
| Audit log completeness | 100% | Every action is captured |
| Sensitive resource access attempts | Decreasing | Protection is working and agents are adapting |
| Human approval response time | < 5 minutes | Approval workflow is functional |
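A sketch of how two of these metrics (daily denied actions and audit log completeness) could be computed from the JSON audit log format sketched under Control 3; the audit.log filename and field names are illustrative assumptions:

```python
import json
from collections import Counter

REQUIRED_FIELDS = {"timestamp", "session_id", "tool", "parameters",
                   "policy_decision", "outcome"}

denied_per_day = Counter()
total_entries = complete_entries = 0

with open("audit.log") as log_file:          # one JSON object per line
    for line in log_file:
        entry = json.loads(line)
        total_entries += 1
        if REQUIRED_FIELDS <= entry.keys():  # all required fields present
            complete_entries += 1
        if entry.get("policy_decision") == "deny":
            denied_per_day[entry["timestamp"][:10]] += 1  # group by YYYY-MM-DD

print("Actions denied by policy (daily):", dict(denied_per_day))
if total_entries:
    print(f"Audit log completeness: {complete_entries / total_entries:.0%}")
```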
Advancing Beyond MVS
Once MVS is operational, the next safety investments should build on the MVS foundation; advancing beyond it does not require a complete overhaul.
Conclusion
Minimum viable safety is achievable for any team deploying AI agents. Five controls — deny-by-default, cost limits, audit logging, sensitive resource protection, and human approval — prevent the vast majority of agent safety incidents. The investment required is small relative to the risk reduction achieved. Every organization deploying AI agents should implement at least these five controls before going to production.
15RL provides this guide as open guidance for the AI agent community. It is based on our research data and represents our best current recommendations.