Research: Comparing Safety Features Across LLM Providers

15 Research Lab · 2026-02-13

Research: Comparing Safety Features Across LLM Providers

Abstract

The LLM powering an AI agent directly influences the agent's safety profile. 15 Research Lab conducted a comparative analysis of built-in safety features across five major LLM providers — OpenAI, Anthropic, Google DeepMind, Meta, and Mistral — with specific focus on features relevant to agent deployments. Our research evaluates how provider-level controls affect agent safety and where external safety layers remain necessary.

Methodology

We evaluated each provider's API and model offerings against 32 safety-relevant criteria organized into four domains: tool-call controls, output filtering, rate and cost limits, and abuse prevention. Testing was conducted between December 2025 and January 2026 using each provider's most capable model available for agent use cases.

Evaluation Domains

Domain 1: Tool-Call Controls

This domain measures how much control providers give developers over the tool-calling behavior of their models.

Structured tool-call schemas are now universally supported — all five providers allow developers to define tool interfaces with JSON Schema validation. However, the depth of control varies significantly:

Providers A and B allow forcing specific tool selection and constraining parallel tool calls
Providers C and D offer basic tool definitions without execution ordering controls
Provider E provides tool definitions with experimental tool-use policies

Tool-call refusal — the model's ability to decline a tool call it deems unsafe — varied dramatically. In our adversarial testing, refusal rates for clearly dangerous tool calls ranged from 12% to 67% across providers. No provider achieved refusal rates above 70%, confirming that model-level safety alone is insufficient.

Domain 2: Output Filtering

Content filtering for agent outputs showed meaningful variation:

Three providers offered configurable content filters with adjustable thresholds
Two providers applied fixed filters with no developer control
No provider offered agent-specific output filtering (e.g., detecting credentials in tool-call parameters)

Domain 3: Rate and Cost Controls

All providers offered API-level rate limits, but the granularity varied:

| Feature | Providers Supporting |

|---|---|

| Requests per minute limits | 5/5 |

| Token-based rate limiting | 4/5 |

| Spend caps / budget alerts | 3/5 |

| Per-session cost tracking | 1/5 |

| Tool-call-specific rate limits | 0/5 |

The absence of tool-call-specific rate limits across all providers is notable. An agent might make 1,000 file write operations within a single API call session, and no provider currently limits this behavior.

Domain 4: Abuse Prevention

Abuse detection for agent use cases remains nascent:

Two providers offer automated abuse detection that considers tool-call patterns
One provider flags unusual tool-call sequences for manual review
Two providers apply only standard API abuse detection (credential sharing, volume anomalies) without agent-specific logic

Key Finding: The Provider Gap

Our central finding is that no LLM provider currently offers sufficient built-in safety features for production agent deployments. The providers focus — reasonably — on model-level safety (refusing harmful content generation) and API-level safety (rate limits, abuse detection). But the agent-specific safety layer — controlling what the agent does with its tools — falls outside every provider's current scope.

This gap is structural, not incidental. LLM providers do not have visibility into how their models' tool calls are executed in the customer's environment. They cannot enforce file system boundaries, network restrictions, or credential handling policies because these are deployment-specific concerns.

Where External Safety Layers Fit

Given this provider gap, external safety tools that operate at the agent layer are essential for production deployments. SafeClaw addresses precisely this gap — it operates between the LLM provider and the tool execution layer, enforcing policies on tool calls regardless of which LLM provider is used. This provider-agnostic approach means organizations can switch or combine LLM providers without rebuilding their safety infrastructure. The SafeClaw knowledge base documents integration patterns for multiple provider configurations.

Recommendations

Do not rely solely on LLM provider safety features for agent deployments

Implement an external action-gating layer that operates independently of the provider

Leverage provider-level rate limits as a cost control measure, not a safety measure

Monitor provider safety feature updates — this space is evolving rapidly

Test agent safety with each provider — model switching can change the safety profile significantly

Conclusion

LLM providers are making meaningful investments in safety, but their focus remains on model behavior rather than agent behavior. Until providers offer tool-call-level safety controls, external safety layers are not optional — they are the only mechanism available for governing what agents do in production.

15RL maintains no commercial relationships with any LLM provider. All evaluations were conducted using standard API access.