Research: Reliability of Webhook Notifications in Safety Systems
Abstract
Webhook notifications are the primary mechanism for alerting humans about AI agent safety events — approval requests, policy violations, anomalous behavior, and incident alerts. 15 Research Lab measured webhook delivery reliability across 18 agent safety deployments and evaluated the consequences of notification failures. Our findings reveal that webhook reliability issues create systematic blind spots in human oversight workflows, potentially allowing dangerous agent actions to proceed without review.
Why Webhook Reliability Matters for Safety
In a human-in-the-loop safety system, the webhook notification is a critical link in the safety chain:
If step 3 fails silently, two bad outcomes are possible:
- Timeout-to-approve: The system defaults to approval after a timeout period, allowing the action without human review
- Timeout-to-deny: The system defaults to denial after a timeout, blocking the agent and stalling the workflow until someone notices
Most deployments in our study use timeout-to-deny (72%), but 28% use timeout-to-approve, meaning that webhook failures directly bypass human oversight in more than a quarter of deployments.
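The difference between the two timeout policies can be illustrated with a minimal sketch. The names `Decision`, `TimeoutPolicy`, and `resolve_request` are hypothetical and are not taken from any of the studied deployments; the sketch only assumes that a missing human response is represented as `None`.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVED = "approved"
    DENIED = "denied"


class TimeoutPolicy(Enum):
    APPROVE = "timeout-to-approve"  # fail-open: a lost notification bypasses review
    DENY = "timeout-to-deny"        # fail-closed: a lost notification stalls the workflow


def resolve_request(human_decision: Optional[Decision], policy: TimeoutPolicy) -> Decision:
    """Return the final decision once the approval window has closed.

    human_decision is None when no reviewer responded, for example because
    the webhook notification never reached them (hypothetical convention).
    """
    if human_decision is not None:
        return human_decision
    if policy is TimeoutPolicy.APPROVE:
        return Decision.APPROVED
    return Decision.DENIED
```

Under `TimeoutPolicy.APPROVE`, a notification that never arrives and a reviewer who never answers are indistinguishable, and both end in approval. This is why a delivery failure in such a deployment is equivalent to skipping review.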
Measured Reliability
We instrumented webhook delivery across 18 deployments over a 30-day observation period:
| Metric | Value |
|---|---|
| Total notifications sent | 47,312 |
| Successful delivery (2xx response) | 94.7% |
| Delivery failure (5xx, timeout, DNS failure) | 3.8% |
| Silent failure (2xx but not processed) | 1.5% |
| Total effective failure rate | 5.3% |
A 5.3% failure rate means that approximately 1 in 19 safety notifications fails to reach the human approver. For high-frequency agent deployments generating hundreds of approval requests per day, this translates to multiple missed notifications daily.
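The arithmetic behind this estimate can be sketched as follows. The function names are illustrative, and the second function assumes that failures are independent of one another, which the measurements do not establish (outages tend to cluster failures in time).

```python
def expected_missed_per_day(requests_per_day: int, failure_rate: float = 0.053) -> float:
    """Expected number of notifications per day that never reach a human."""
    return requests_per_day * failure_rate


def prob_at_least_one_miss(requests_per_day: int, failure_rate: float = 0.053) -> float:
    """Probability that at least one notification is missed in a day.

    Assumes each notification fails independently (a simplification).
    """
    return 1.0 - (1.0 - failure_rate) ** requests_per_day
```

For a deployment generating 200 approval requests per day, `expected_missed_per_day(200)` gives about 10.6 missed notifications per day at the measured 5.3% failure rate.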
Failure Mode Analysis
Transient Network Failures (48% of failures)
The most common cause: temporary network issues between the safety system and the notification endpoint. These include DNS resolution failures, connection timeouts, and TLS negotiation errors. Most are resolved by retrying, but the retry behavior varies significantly across implementations.
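Because most transient failures clear on a later attempt, a retry loop with exponential backoff is a common mitigation. The sketch below is one possible shape, not the implementation used by any deployment in the study. The `send` callable is hypothetical: it is assumed to return `True` on a 2xx response and `False` on a transient failure. The `sleep` parameter exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], bool],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds or the retries are exhausted.

    send is a hypothetical callable returning True on a 2xx response and
    False on a transient failure (DNS error, timeout, TLS error, 5xx).
    """
    for attempt in range(max_retries + 1):
        if send():
            return True
        if attempt < max_retries:
            # The delay doubles on each attempt; random jitter spreads out
            # retries from senders that failed at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return False
```

A `False` return value means every attempt failed. A caller should then escalate through another path, such as a dead letter queue or a secondary channel, rather than drop the notification.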
Endpoint Unavailability (31% of failures)
The notification endpoint (Slack, Teams, PagerDuty, custom webhook receiver) is temporarily unavailable. This is particularly common during maintenance windows and service incidents — exactly when heightened monitoring is most needed.
Payload Processing Failures (12% of failures)
The webhook is delivered (HTTP 2xx response) but the receiving system fails to process it correctly. This includes malformed payload parsing, queue overflow at the receiver, and rate limiting by the notification platform. These "silent failures" are the most dangerous because neither the sender nor the receiver reports an error.
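One way to surface silent failures is to require an application-level acknowledgement in addition to the HTTP 2xx response, and to flag any notification that is not acknowledged within a deadline. The sketch below assumes the receiver can report back once it has actually processed a notification. The `AckTracker` class and its methods are hypothetical.

```python
import time
from typing import Dict, List, Optional


class AckTracker:
    """Tracks notifications until the receiver confirms processing them."""

    def __init__(self, ack_timeout_s: float = 60.0) -> None:
        self.ack_timeout_s = ack_timeout_s
        self._pending: Dict[str, float] = {}  # notification id -> send time

    def record_sent(self, notification_id: str, now: Optional[float] = None) -> None:
        """Register a notification that received an HTTP 2xx response."""
        self._pending[notification_id] = time.monotonic() if now is None else now

    def record_ack(self, notification_id: str) -> None:
        """Mark a notification as processed by the receiver."""
        self._pending.pop(notification_id, None)

    def overdue(self, now: Optional[float] = None) -> List[str]:
        """Return ids that were delivered but not acknowledged in time."""
        current = time.monotonic() if now is None else now
        return sorted(
            nid for nid, sent in self._pending.items()
            if current - sent > self.ack_timeout_s
        )
```

Any id returned by `overdue()` was accepted at the HTTP layer but never confirmed as processed, which is the signature of a silent failure.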
Authentication Expiry (9% of failures)
Webhook authentication tokens or API keys expire, causing all notifications to fail until credentials are rotated. In our observations, credential-related outages lasted an average of 4.2 hours before detection.
Retry Behavior Analysis
Retry implementations varied significantly across the 18 deployments:
| Retry Policy | Deployments | Recovery Rate |
|---|---|---|
| No retry | 4 (22%) | 0% |
| Immediate retry (1x) | 6 (33%) | 47% |
| Exponential backoff (3x) | 5 (28%) | 78% |
| Exponential backoff with dead letter queue | 3 (17%) | 96% |
Deployments without retry had an effective notification failure rate of 5.3%. Those with exponential backoff and dead letter queues achieved an effective failure rate of 0.2%.
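The relationship between recovery rate and effective failure rate can be checked with a short calculation. This sketch assumes that every policy group starts from the same 5.3% baseline, which is a simplification. Only the 5.3% and 0.2% endpoints are reported above, so values computed for the other two policies are derived estimates rather than measurements.

```python
BASELINE_FAILURE_RATE = 0.053  # measured rate for deployments without retry

# Recovery rates taken from the retry policy table.
RECOVERY_RATES = {
    "no_retry": 0.00,
    "immediate_retry_1x": 0.47,
    "exponential_backoff_3x": 0.78,
    "exponential_backoff_with_dlq": 0.96,
}


def effective_failure_rate(baseline: float, recovery_rate: float) -> float:
    """Share of notifications still undelivered after the retry policy runs."""
    return baseline * (1.0 - recovery_rate)


# Effective failure rate per policy, as a percentage rounded to one decimal.
effective_pct = {
    policy: round(effective_failure_rate(BASELINE_FAILURE_RATE, rate) * 100, 1)
    for policy, rate in RECOVERY_RATES.items()
}
```

With a 96% recovery rate, 4% of the 5.3% baseline remains undelivered, which is about 0.2% and matches the reported figure.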
Impact on Safety Outcomes
We analyzed the correlation between notification failures and safety events:
- In timeout-to-approve deployments, webhook failures resulted in unreviewed action execution in 100% of failure cases — by design
- In timeout-to-deny deployments, webhook failures caused workflow stalls averaging 23 minutes before manual intervention
- Three deployments in our study experienced cascading failures where a webhook outage coincided with a genuine safety event, resulting in delayed incident response
Recommendations for Reliable Safety Notifications
Conclusion
Webhook notifications are a single point of failure in human-in-the-loop safety systems. A 5.3% baseline failure rate — observed across real deployments — is unacceptable for safety-critical workflows. Organizations must treat notification reliability as a safety requirement, implementing robust retry mechanisms, multi-channel delivery, and continuous monitoring of the notification pipeline itself.
All measurements were collected from production deployments with organizational consent. Specific endpoints and services are not identified.