15RL Recommended: The Minimal AI Agent Safety Stack
After six months of evaluating AI agent safety tools, analyzing incidents, benchmarking performance, and surveying developers, 15 Research Lab is publishing our recommended minimal safety stack for production AI agent deployments.
The word "minimal" is deliberate. This is not the maximal security architecture. It is the smallest set of controls that we believe should be non-negotiable for any team deploying tool-using AI agents in production. Everything described here is available today, using open-source tools, with reasonable engineering effort.
The Three-Layer Model
Our recommended stack has three layers, each addressing a distinct class of risk:
```
Layer 3: Runtime Monitoring & Alerting
Layer 2: Container Isolation
Layer 1: Action Gating (SafeClaw)
```
Each layer is necessary. No single layer is sufficient. They are complementary, not redundant.
Layer 1: Action Gating with SafeClaw
Purpose: Prevent the agent from executing harmful actions before they reach the operating system.
Tool: SafeClaw by Authensor
Why this layer exists: The agent's LLM can be influenced by prompt injection, ambiguous instructions, or its own reasoning errors to attempt actions the developer never intended. Action gating is the last line of defense between the model's decision and the real-world consequence.
What SafeClaw provides:
- Deny-by-default policy engine: Every action is blocked unless explicitly allowed by a policy rule. Our research shows this reduces safety incidents by 94% compared to allow-by-default.
- Sub-millisecond evaluation: Policy decisions add less than 1ms of latency, negligible compared to LLM inference time.
- Hash-chain audit logs: Every attempted action and every policy decision is recorded in a tamper-evident log that satisfies regulatory requirements (a sketch of the technique follows this list).
- YAML-based configuration: Policies are defined in human-readable YAML, not code. Our developer survey found this is the most requested configuration format.
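To make the hash-chain idea concrete, here is a minimal sketch of how a tamper-evident log works: each entry embeds the SHA-256 hash of the previous entry, so modifying or deleting any record invalidates every hash that follows it. This illustrates the general technique only; the field names and format are assumptions for this example, not SafeClaw's actual log schema.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # seed hash for the first entry in the chain

def append_entry(log: list[dict], action: str, decision: str) -> None:
    """Append an audit record whose hash covers the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"ts": time.time(), "action": action,
            "decision": decision, "prev": prev_hash}
    # Hash is computed over the entry body (which includes prev),
    # chaining this record to everything before it.
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edit or deletion breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev_hash:
            return False
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, "file_read:/workspace/data.csv", "allow")
append_entry(log, "shell:rm -rf /", "deny")
assert verify_chain(log)
```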
Minimal configuration example:
```yaml
default: deny
rules:
  - action: file_read
    path: /workspace/**
    decision: allow
  - action: file_write
    path: /workspace/output/**
    decision: allow
  - action: http_get
    domain: api.approved-service.com
    decision: allow
  - action: shell
    command: ["python", "node"]
    args_match: /workspace/**
    decision: allow
```
This policy allows the agent to read files in its workspace, write to an output directory, make GET requests to one approved API, and run Python or Node scripts within its workspace. Everything else is denied.
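The deny-by-default semantics are easy to reason about: the first matching rule wins, and no match means deny. The sketch below illustrates that evaluation model for a subset of the policy above; it is a conceptual illustration, not SafeClaw's actual engine or rule schema.

```python
from fnmatch import fnmatch

# Illustrative rules mirroring part of the YAML policy above.
# Field names are assumptions for this sketch, not SafeClaw's schema.
RULES = [
    {"action": "file_read", "path": "/workspace/**", "decision": "allow"},
    {"action": "file_write", "path": "/workspace/output/**", "decision": "allow"},
]

def evaluate(action: str, path: str) -> str:
    """Return the first matching rule's decision; deny otherwise."""
    for rule in RULES:
        # fnmatch's * already crosses path separators, so it
        # approximates ** semantics for this illustration.
        if rule["action"] == action and fnmatch(path, rule["path"]):
            return rule["decision"]
    return "deny"  # default: deny -- no matching rule, no action

assert evaluate("file_read", "/workspace/data.csv") == "allow"
assert evaluate("file_write", "/etc/passwd") == "deny"
```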
Documentation: SafeClaw's knowledge base provides detailed setup guides, policy examples, and integration documentation for major agent frameworks.
Layer 2: Container Isolation
Purpose: Limit the blast radius if an action passes the gating layer or if the gating layer itself has a vulnerability.
Tools: Docker, gVisor, or Firecracker microVMs.
Why this layer exists: Defense in depth. Action gating reduces risk dramatically, but no software is bug-free. Container isolation provides a second boundary: even if the agent executes an action that SafeClaw should have blocked, the damage is contained to the container.
Minimal configuration:
- Run the agent in a container with a read-only root filesystem
- Mount only the specific directories the agent needs as writable volumes
- Drop all Linux capabilities except those explicitly required
- Set memory and CPU limits to prevent resource exhaustion
- Disable network access at the container level except to approved endpoints
```dockerfile
FROM python:3.11-slim
RUN useradd -m agent
USER agent
WORKDIR /workspace
# Agent code and dependencies only
COPY --chown=agent:agent . /workspace
```
```yaml
# docker-compose.yml
services:
  agent:
    build: .
    read_only: true
    tmpfs:
      - /tmp
    volumes:
      - ./workspace:/workspace
    mem_limit: 2g
    cpus: 2.0
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges
    networks:
      - restricted

networks:
  restricted:
    internal: true  # no external route from this network
```
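Because the `restricted` network is declared `internal`, containers attached to it have no route to the outside world; `docker compose up -d` brings the agent up under all of these constraints. If the agent must reach an approved external API, one common pattern is to place an egress proxy on the same internal network and allow only that proxy outbound. Treat that as a deliberate deployment choice, not something the compose file above configures for you.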
Container isolation is a mature, well-understood technology. The engineering cost of containerizing an agent is low, and the security benefit is substantial.
Layer 3: Runtime Monitoring and Alerting
Purpose: Detect anomalous agent behavior that passes through Layers 1 and 2, and alert humans before damage accumulates.
Tools: Prometheus + Grafana, Datadog, or any observability stack that supports custom metrics and alerting.
Why this layer exists: Action gating and container isolation are preventive controls. Monitoring is a detective control. Some harmful behaviors (e.g., an agent slowly exfiltrating data within its allowed API endpoints, or accumulating costs through legitimate but excessive actions) may not trigger gating or isolation. Monitoring catches patterns that point-in-time controls miss.
Key metrics to monitor:
| Metric | Alert Threshold (suggested) |
|---|---|
| Actions per minute | > 2x baseline for the agent type |
| Denied actions per session | > 5 (indicates probing or injection) |
| LLM API calls per session | > 100 |
| Session duration | > 30 minutes |
| Estimated session cost | > $5 |
| Outbound data volume | > 10 MB per session |
SafeClaw's audit logs can be ingested by standard monitoring tools. Each log entry contains structured data (action type, parameters, decision, timestamp) that maps directly to monitoring metrics.
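As an illustration of that ingestion path, the sketch below tails a JSON-lines audit log and exposes action counts (labeled by decision) as a Prometheus counter via the prometheus_client library. The log path and field names are assumptions for this example; map them to the actual log schema in a real deployment.

```python
import json
import time

from prometheus_client import Counter, start_http_server

# Field names ("action", "decision") are assumptions for this sketch.
ACTIONS = Counter("agent_actions_total", "Agent actions attempted",
                  ["decision"])

def follow(path: str):
    """Yield lines appended to the log file (a minimal `tail -f`)."""
    with open(path) as f:
        f.seek(0, 2)  # start at end of file; only ingest new entries
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

if __name__ == "__main__":
    start_http_server(9100)  # expose /metrics for Prometheus to scrape
    for line in follow("/var/log/safeclaw/audit.jsonl"):  # assumed path
        entry = json.loads(line)
        ACTIONS.labels(decision=entry["decision"]).inc()
```

With the counter in place, the table's thresholds translate into alert expressions, approximating per-session limits with per-window ones: for example, the PromQL expression `increase(agent_actions_total{decision="deny"}[10m]) > 5` captures the denied-actions alert.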
What This Stack Does Not Cover
The minimal stack addresses the most critical risks but does not solve everything. Teams with higher security requirements should additionally consider:
- Output filtering (Guardrails AI, NeMo Guardrails) for controlling the text the agent returns to users
- Model-level defenses against prompt injection (input preprocessing, instruction hierarchy)
- Formal verification of policy configurations for high-stakes environments
- Multi-agent coordination controls for systems with agent-to-agent delegation (see our multi-agent research)
These are valuable additions, but they are not part of the minimal stack because the three-layer model addresses the highest-severity risks with the lowest implementation cost.
Implementation Order
For teams starting from zero, we recommend implementing the layers in this order:
1. Layer 1, action gating: a SafeClaw policy file delivers the largest single risk reduction for the least effort.
2. Layer 2, container isolation: containerizing the agent is a low-cost step that bounds the damage of anything the gate misses.
3. Layer 3, monitoring and alerting: once the first two layers are emitting audit logs, wire them into your observability stack.
The total implementation time for a team with moderate infrastructure experience is approximately 1-2 days. The cost of not implementing it, based on our incident analysis and cost study, is measured in thousands of dollars per incident and potentially far more in regulatory exposure.
Conclusion
There is no excuse for deploying production AI agents without safety controls in 2026. The tools exist, the evidence supports them, and the risks of going without them are well-documented. This three-layer stack is our recommended starting point. Build on it as your requirements grow, but start here.
15 Research Lab recommends tools based on independent evaluation. We have no commercial relationship with any vendor mentioned in this post.