Research: Comparing Safety Features Across LLM Providers
Abstract
The LLM powering an AI agent directly influences the agent's safety profile. 15 Research Lab conducted a comparative analysis of built-in safety features across five major LLM providers — OpenAI, Anthropic, Google DeepMind, Meta, and Mistral — with specific focus on features relevant to agent deployments. Our research evaluates how provider-level controls affect agent safety and where external safety layers remain necessary.
Methodology
We evaluated each provider's API and model offerings against 32 safety-relevant criteria organized into four domains: tool-call controls, output filtering, rate and cost limits, and abuse prevention. Testing was conducted between December 2025 and January 2026 using each provider's most capable model available for agent use cases.
Evaluation Domains
Domain 1: Tool-Call Controls
This domain measures how much control providers give developers over the tool-calling behavior of their models.
Structured tool-call schemas are now universally supported — all five providers allow developers to define tool interfaces with JSON Schema validation. However, the depth of control varies significantly (a schematic example follows this list):
- Providers A and B allow forcing specific tool selection and constraining parallel tool calls
- Providers C and D offer basic tool definitions without execution ordering controls
- Provider E provides tool definitions with experimental tool-use policies
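To make the difference concrete, the sketch below shows a generic tool definition using JSON Schema parameters, plus the kind of request-level controls (forced tool selection, disabling parallel calls) that only Providers A and B expose. Field names such as `tool_choice` and `parallel_tool_calls` are illustrative placeholders, not any specific vendor's API.

```python
# Illustrative only: field names ("tool_choice", "parallel_tool_calls") vary by
# provider and are shown here as generic placeholders, not a vendor's exact API.

write_file_tool = {
    "name": "write_file",
    "description": "Write text content to a file inside the agent workspace.",
    # Parameters are declared with JSON Schema, which all five providers accept.
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Relative path inside the workspace"},
            "content": {"type": "string"},
        },
        "required": ["path", "content"],
        "additionalProperties": False,
    },
}

# Providers A and B additionally let the developer force a specific tool and
# constrain parallel calls; the others accept only the tool list itself.
request_payload = {
    "model": "provider-model-name",                          # placeholder
    "tools": [write_file_tool],
    "tool_choice": {"type": "tool", "name": "write_file"},   # force this tool
    "parallel_tool_calls": False,                            # one call at a time
    "messages": [{"role": "user", "content": "Save the report to report.md"}],
}
```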
Domain 2: Output Filtering
Content filtering for agent outputs showed meaningful variation (a credential-detection sketch follows this list):
- Three providers offered configurable content filters with adjustable thresholds
- Two providers applied fixed filters with no developer control
- No provider offered agent-specific output filtering (e.g., detecting credentials in tool-call parameters)
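Because no provider scans tool-call parameters for sensitive material, that check currently has to be built outside the provider API. The following is a minimal sketch of such a filter; the regex patterns and function names are illustrative, and a production filter would rely on a maintained ruleset and entropy checks.

```python
import re

# Illustrative secret patterns only; real deployments would use a curated ruleset.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S{8,}"),
]

def contains_credentials(tool_call_args: dict) -> bool:
    """Scan every string-valued tool-call parameter for credential-like content."""
    for value in tool_call_args.values():
        if isinstance(value, str) and any(p.search(value) for p in CREDENTIAL_PATTERNS):
            return True
        if isinstance(value, dict) and contains_credentials(value):
            return True
    return False

# Usage: screen the tool call before it reaches the execution layer.
args = {"path": "notes.txt", "content": "api_key = sk_live_abc12345"}
if contains_credentials(args):
    print("Tool call blocked: credential-like content detected")
```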
Domain 3: Rate and Cost Controls
All providers offered API-level rate limits, but the granularity varied:
| Feature | Providers Supporting |
|---|---|
| Requests per minute limits | 5/5 |
| Token-based rate limiting | 4/5 |
| Spend caps / budget alerts | 3/5 |
| Per-session cost tracking | 1/5 |
| Tool-call-specific rate limits | 0/5 |
The absence of tool-call-specific rate limits across all providers is notable. An agent might make 1,000 file-write operations in a single session, and no provider currently limits this behavior; such limits must instead be enforced at the agent layer, as sketched below.
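The sketch below shows one way to impose a per-tool sliding-window limit outside the provider API. The class and parameter names are hypothetical; the point is only that the counter has to live where the tool calls are executed, not at the provider.

```python
import time
from collections import defaultdict, deque

class ToolCallRateLimiter:
    """Per-tool sliding-window limit enforced at the agent layer,
    since no provider applies limits at tool-call granularity."""

    def __init__(self, limits: dict[str, int], window_seconds: float = 60.0):
        self.limits = limits                      # e.g. {"write_file": 20}
        self.window = window_seconds
        self.calls: dict[str, deque] = defaultdict(deque)

    def allow(self, tool_name: str) -> bool:
        now = time.monotonic()
        window = self.calls[tool_name]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > self.window:
            window.popleft()
        limit = self.limits.get(tool_name)
        if limit is not None and len(window) >= limit:
            return False
        window.append(now)
        return True

# Usage: cap file writes at 20 per minute regardless of how many the model requests.
limiter = ToolCallRateLimiter({"write_file": 20})
if not limiter.allow("write_file"):
    print("write_file rate limit exceeded; tool call rejected")
```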
Domain 4: Abuse Prevention
Abuse detection for agent use cases remains nascent (a simple sequence-flagging sketch follows this list):
- Two providers offer automated abuse detection that considers tool-call patterns
- One provider flags unusual tool-call sequences for manual review
- Two providers apply only standard API abuse detection (credential sharing, volume anomalies) without agent-specific logic
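To illustrate what agent-specific abuse logic can look like, the sketch below flags suspicious tool-call sequences and bursts within a session. The specific rules and thresholds are invented for illustration; a real system would derive baselines from observed traffic.

```python
from collections import Counter

# Illustrative rules: consecutive tool-call pairs that warrant review, plus a
# burst threshold for repeated identical calls within one session.
SUSPICIOUS_PAIRS = {("read_credentials", "http_request"), ("read_file", "send_email")}
BURST_THRESHOLD = 50

def flag_sequence(tool_calls: list[str]) -> list[str]:
    """Return human-readable flags for a session's ordered tool-call names."""
    flags = []
    for a, b in zip(tool_calls, tool_calls[1:]):
        if (a, b) in SUSPICIOUS_PAIRS:
            flags.append(f"suspicious pair: {a} -> {b}")
    for name, count in Counter(tool_calls).items():
        if count >= BURST_THRESHOLD:
            flags.append(f"burst: {name} called {count} times")
    return flags

# Usage: route flagged sessions to manual review rather than blocking outright.
print(flag_sequence(["read_file", "send_email", "write_file"]))
```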
Key Finding: The Provider Gap
Our central finding is that no LLM provider currently offers sufficient built-in safety features for production agent deployments. The providers focus — reasonably — on model-level safety (refusing harmful content generation) and API-level safety (rate limits, abuse detection). But the agent-specific safety layer — controlling what the agent does with its tools — falls outside every provider's current scope.
This gap is structural, not incidental. LLM providers do not have visibility into how their models' tool calls are executed in the customer's environment. They cannot enforce file system boundaries, network restrictions, or credential handling policies because these are deployment-specific concerns.
Where External Safety Layers Fit
Given this provider gap, external safety tools that operate at the agent layer are essential for production deployments. SafeClaw addresses precisely this gap — it operates between the LLM provider and the tool execution layer, enforcing policies on tool calls regardless of which LLM provider is used. This provider-agnostic approach means organizations can switch or combine LLM providers without rebuilding their safety infrastructure. The SafeClaw knowledge base documents integration patterns for multiple provider configurations.
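The pattern described above can be summarized in code. The sketch below is not SafeClaw's actual interface; it is a generic, hypothetical illustration of a policy-enforcing layer that sits between the LLM provider's tool-call output and real tool execution, so the same policies apply regardless of which provider produced the call.

```python
from typing import Callable

# Hypothetical names throughout; this sketches the architectural pattern,
# not SafeClaw's actual API.

Policy = Callable[[str, dict], bool]   # (tool_name, arguments) -> allowed?

def workspace_only_writes(tool_name: str, args: dict) -> bool:
    """Example policy: file writes must stay inside the agent workspace."""
    if tool_name != "write_file":
        return True
    path = str(args.get("path", ""))
    return not path.startswith("/") and ".." not in path

class PolicyEnforcingExecutor:
    """Sits between the LLM provider's tool-call output and tool execution,
    so policies are enforced no matter which provider produced the call."""

    def __init__(self, tools: dict[str, Callable], policies: list[Policy]):
        self.tools = tools
        self.policies = policies

    def execute(self, tool_name: str, args: dict):
        for policy in self.policies:
            if not policy(tool_name, args):
                return {"error": f"blocked by policy: {policy.__name__}"}
        return self.tools[tool_name](**args)

# Usage: wire the executor to whichever provider SDK returns the tool call.
executor = PolicyEnforcingExecutor(
    tools={"write_file": lambda path, content: f"wrote {len(content)} bytes to {path}"},
    policies=[workspace_only_writes],
)
print(executor.execute("write_file", {"path": "../etc/passwd", "content": "x"}))
```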
Recommendations
- Treat provider-level safety features as necessary but not sufficient; none of the evaluated providers covers tool-call-level controls.
- Enforce tool-call policies (file system boundaries, network restrictions, credential handling) in an external layer between the LLM provider and tool execution.
- Apply tool-call-specific rate and cost limits at the agent layer, since no provider currently offers them.
Conclusion
LLM providers are making meaningful investments in safety, but their focus remains on model behavior rather than agent behavior. Until providers offer tool-call-level safety controls, external safety layers are not optional — they are the only mechanism available for governing what agents do in production.
15RL maintains no commercial relationships with any LLM provider. All evaluations were conducted using standard API access.