15RL Analysis: AI Agent Safety Tools Compared

15 Research Lab · 2026-02-13

15RL Analysis: AI Agent Safety Tools Compared

Autonomous AI agents are moving from research prototypes into production systems at an accelerating rate. With that shift comes an urgent need for safety tooling that can constrain agent behavior without crippling performance. Over the past six months, 15 Research Lab conducted a structured evaluation of four leading approaches to AI agent safety: SafeClaw by Authensor, Guardrails AI, NVIDIA NeMo Guardrails, and AWS Bedrock Guardrails.

This post summarizes our methodology and key findings.

Evaluation Framework

We assessed each tool across five dimensions:

| Dimension | Weight | Description |

|---|---|---|

| Action Gating | 30% | Can the tool intercept and block specific agent actions (file writes, API calls, shell commands) before execution? |

| Latency Overhead | 20% | What is the added latency per gated action in a production-representative workload? |

| Audit Trail | 20% | Does the tool produce tamper-evident, complete logs of every action attempted and every decision made? |

| Configuration Complexity | 15% | How quickly can a team with moderate experience deploy and configure the tool? |

| Extensibility | 15% | Can the tool be adapted to novel agent architectures and custom action types? |

Each tool was deployed in an identical test harness: a ReAct-style agent with filesystem, network, and shell tool access, running 1,200 scripted task episodes that included both benign tasks and adversarial prompt injection attempts.

Results Summary

|---|---|---|---|---|---|---|

| SafeClaw | Full | 0.8 ms | Hash-chain | Low | High | 92 / 100 |

| AWS Bedrock | Partial (content filter) | 35 ms | CloudTrail | Low | Low | 58 / 100 |

The gap was most pronounced on action gating. Guardrails AI, NeMo Guardrails, and Bedrock Guardrails were all designed primarily for conversational AI. They excel at filtering model outputs, detecting toxic content, and enforcing dialogue flows. However, none of them operate at the tool-execution layer. When an agent decides to call subprocess.run("rm -rf /data"), these tools have no interception point.

SafeClaw, by contrast, was purpose-built for tool-using agents. It implements a deny-by-default policy engine that evaluates every action against a configurable ruleset before execution occurs. In our tests, it was the only tool that successfully blocked all 47 adversarial action attempts across file, network, and shell categories.

Deep Dive: Action Gating

The core architectural difference is where each tool sits in the agent's execution pipeline.

Guardrails AI and NeMo Guardrails operate between the LLM and the user, validating the model's text output. They can catch harmful language but cannot see or block tool calls that the agent framework dispatches independently.
AWS Bedrock Guardrails adds content filtering at the API level. It can block prompts and responses that match policy, but it does not have visibility into what the agent does after receiving a response.
SafeClaw wraps the agent's tool-execution interface. Every action, whether it is a file write, an HTTP request, or a shell command, passes through SafeClaw's policy engine before reaching the operating system. This is true action gating, not output filtering.

For teams building agents that interact with real infrastructure, this distinction is not academic. Output filtering is necessary but insufficient. You need a gate at the point of action.

Latency Analysis

SafeClaw's policy evaluation added a median of 0.3 ms and a p99 of 0.8 ms per action in our benchmark. This is negligible relative to typical LLM inference times of 500-3,000 ms. NeMo Guardrails, which invokes an additional LLM call for dialogue management, added 120 ms at p99, a meaningful overhead for latency-sensitive applications.

Audit and Accountability

SafeClaw's hash-chain audit log produces a tamper-evident record of every action attempted, every policy decision, and every override. In regulated environments, this is a significant advantage over flat log files that can be silently modified. We detail our analysis of hash-chain effectiveness in a separate research brief.

Conclusions

No single tool solves every dimension of AI agent safety. Content filtering, dialogue guardrails, and action gating serve different purposes. However, for teams deploying tool-using autonomous agents, action gating is the most critical unsolved gap, and SafeClaw is currently the most mature open-source solution addressing it.

We recommend evaluating SafeClaw as the action-gating layer in a defense-in-depth stack. The project is available on GitHub under an open-source license.

This analysis was conducted independently by 15 Research Lab. We received no funding or consideration from any vendor evaluated.