Research: Human-in-the-Loop Approval Latency in Agent Systems

15 Research Lab · 2026-02-13

Abstract

Human-in-the-loop (HITL) approval is the gold standard for AI agent safety — but it comes at a cost. Every approval request pauses agent execution, introduces latency, and demands human attention. 15 Research Lab measured the actual latency impact of HITL approval workflows across 12 agent deployments and evaluated strategies for minimizing latency while maintaining safety guarantees. Our findings quantify the trade-off between safety and throughput and identify optimal approval strategies for different risk categories.

The Approval Latency Problem

When an AI agent encounters an action that requires human approval, the following sequence occurs:

  1. Agent generates a tool call
  2. Safety system intercepts the tool call
  3. Notification sent to human approver
  4. Human reads notification, evaluates request, and decides
  5. Approval/denial transmitted back to the agent
  6. Agent resumes execution

Steps 3-5 introduce latency that varies from seconds to hours depending on the approval channel, the approver's availability, and the complexity of the decision.
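
A minimal sketch of this gate, assuming an in-app synchronous channel and a hypothetical HIGH_RISK_TOOLS set (names are illustrative, not taken from any particular framework):

```python
import time

# Illustrative set of tools that trigger the approval gate.
HIGH_RISK_TOOLS = {"delete_file", "send_email", "wire_transfer"}

def run_tool(tool_call: dict) -> dict:
    """Stand-in executor; a real agent would dispatch to the actual tool here."""
    return {"status": "ok", "tool": tool_call["name"]}

def request_sync_approval(tool_call: dict) -> bool:
    """Steps 3-5: notify the approver and block until they decide (sync channel)."""
    answer = input(f"Approve {tool_call['name']} with args {tool_call['args']}? [y/N] ")
    return answer.strip().lower() == "y"

def execute_with_approval(tool_call: dict) -> dict:
    """Steps 1-6: intercept the call, gate it on human approval, then resume."""
    if tool_call["name"] in HIGH_RISK_TOOLS:            # step 2: interception
        started = time.monotonic()
        approved = request_sync_approval(tool_call)     # steps 3-5: latency accrues here
        latency_s = time.monotonic() - started
        if not approved:
            return {"status": "denied", "latency_s": latency_s}
    return run_tool(tool_call)                          # step 6: agent resumes

if __name__ == "__main__":
    print(execute_with_approval({"name": "delete_file", "args": {"path": "/tmp/report.csv"}}))
```

In a real deployment the interception typically lives in the tool-execution layer rather than the agent loop, and asynchronous channels replace the blocking prompt with a notification plus a polling or callback step.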

Measured Latency by Approval Channel

We instrumented approval workflows across 12 deployments using four common notification channels:

| Approval Channel | Median Latency | P95 Latency | P99 Latency |
|---|---|---|---|
| In-app popup (sync) | 8 seconds | 34 seconds | 2.1 minutes |
| Slack/Teams notification | 2.4 minutes | 18 minutes | 1.3 hours |
| Email notification | 23 minutes | 2.7 hours | 8.4 hours |
| Dashboard queue | 47 minutes | 4.1 hours | 12+ hours |

The variance is striking. In-app synchronous approval (where the human is actively watching the agent) has acceptable latency for most use cases. Asynchronous channels like email and dashboard queues introduce latency that can make agent-driven workflows slower than manual alternatives.
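
For teams reproducing these measurements, the percentiles come directly from request and decision timestamps in approval logs. A small sketch using the standard library (the field names and the synthetic sample are illustrative):

```python
from statistics import quantiles

def approval_latency_stats(requested_at: list[float], decided_at: list[float]) -> dict:
    """Compute median/P95/P99 approval latency (seconds) from paired timestamps."""
    latencies = [d - r for r, d in zip(requested_at, decided_at)]
    cuts = quantiles(latencies, n=100, method="inclusive")  # returns 99 cut points
    return {"median_s": cuts[49], "p95_s": cuts[94], "p99_s": cuts[98]}

# Synthetic sample (seconds); in practice these come from the approval system's logs.
requested = [0.0, 0.0, 0.0, 0.0, 0.0]
decided = [90.0, 130.0, 150.0, 600.0, 1100.0]
print(approval_latency_stats(requested, decided))
```

With n=100, statistics.quantiles returns 99 cut points, so indices 49, 94, and 98 correspond to the median, P95, and P99.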

Impact on Agent Throughput

We measured throughput impact (tasks completed per hour) for agents with varying approval frequencies:

| Approval Frequency | Throughput Reduction (Sync) | Throughput Reduction (Async) |
|---|---|---|
| Every action | 67% | 94% |
| Every 5th action | 31% | 78% |
| Every 10th action | 18% | 62% |
| Sensitive actions only (~5%) | 4% | 23% |
| No approval (baseline) | 0% | 0% |

Approving every action — the safest configuration — reduces throughput by two-thirds even with synchronous approval. With asynchronous approval, agents are effectively idle 94% of the time.
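
These reductions follow the shape of a simple back-of-the-envelope model: if a fraction f of actions blocks for a median approval latency L on top of a base action time t, throughput falls by roughly 1 - t / (t + f·L). The sketch below uses illustrative parameters to show why asynchronous channels dominate the cost; it is a toy model, not a fit to the measured deployments:

```python
def throughput_reduction(base_action_s: float, approval_fraction: float, latency_s: float) -> float:
    """Fractional throughput loss when a share of actions blocks on approval."""
    effective_action_s = base_action_s + approval_fraction * latency_s
    return 1.0 - base_action_s / effective_action_s

# Illustrative parameters: 4 s per agent action, ~8 s sync and ~144 s async median latency.
for channel, latency_s in [("sync", 8.0), ("async", 144.0)]:
    for fraction in (1.0, 0.2, 0.1, 0.05):
        loss = throughput_reduction(4.0, fraction, latency_s)
        print(f"{channel:5s} | approve {fraction:4.0%} of actions -> ~{loss:.0%} loss")
```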

The Risk-Stratified Approach

Our data strongly supports risk-stratified approval: requiring approval only for actions above a defined risk threshold. This approach preserves the safety benefits of HITL approval for high-risk actions while allowing low-risk actions to proceed automatically.

The challenge is defining the risk threshold correctly. Too conservative, and approval fatigue causes humans to rubber-stamp requests. Too permissive, and dangerous actions slip through without review.

We evaluated three risk stratification methods:

1. Category-Based: Classify actions by type (read/write/delete, internal/external, reversible/irreversible) and require approval for high-risk categories. Simple to implement, but coarse-grained.
2. Parameter-Based: Evaluate action parameters against risk criteria (file path sensitivity, network destination, data volume) and require approval when parameters exceed thresholds. More granular, but requires careful threshold tuning.
3. Adaptive: Use historical approval data to adjust risk thresholds dynamically. Actions that humans consistently approve can be auto-approved; actions that humans frequently deny require continued approval. Most sophisticated, but requires sufficient historical data.
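
One way the three methods can compose into a single decision function is sketched below. The categories, sensitive-path prefixes, and the 95%-over-20-decisions auto-approval heuristic are all illustrative assumptions, not measured thresholds:

```python
from collections import defaultdict

SENSITIVE_PATH_PREFIXES = ("/etc", "/prod", "/home/admin/.ssh")     # illustrative criteria
HIGH_RISK_CATEGORIES = {"delete", "external_write", "irreversible"}  # illustrative categories

# Adaptive state: per-tool approval history as [approved_count, total_count].
history: dict[str, list[int]] = defaultdict(lambda: [0, 0])

def requires_approval(tool_call: dict) -> bool:
    """Risk-stratified gate: category check, parameter check, then adaptive override."""
    # 1. Category-based: coarse classification by action type.
    category_risky = tool_call.get("category") in HIGH_RISK_CATEGORIES

    # 2. Parameter-based: inspect arguments against risk criteria.
    path = str(tool_call.get("args", {}).get("path", ""))
    parameter_risky = any(path.startswith(p) for p in SENSITIVE_PATH_PREFIXES)

    if not (category_risky or parameter_risky):
        return False  # low-risk action proceeds automatically

    # 3. Adaptive: auto-approve actions humans have consistently approved
    #    (illustrative heuristic: >= 95% approval over >= 20 recorded decisions).
    approved, total = history[tool_call["name"]]
    if total >= 20 and approved / total >= 0.95:
        return False
    return True

def record_decision(tool_name: str, approved: bool) -> None:
    """Feed each human decision back into the adaptive history."""
    history[tool_name][1] += 1
    if approved:
        history[tool_name][0] += 1
```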

Reducing Approval Fatigue

Human approvers exhibit predictable fatigue patterns: as the volume of requests grows, evaluation time per request falls and approval rates creep upward until reviews become rubber stamps. These patterns underscore the importance of minimizing unnecessary approval requests.
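
A lightweight way to detect those patterns is to track approval rate and mean evaluation time over a sliding window of recent decisions, echoing the monitoring recommendation below. The alert thresholds in this sketch are illustrative:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Decision:
    approved: bool
    evaluation_s: float  # time the approver spent before deciding

class FatigueMonitor:
    """Tracks approval rate and evaluation time over a sliding window of decisions."""

    def __init__(self, window: int = 200):
        self.decisions: deque = deque(maxlen=window)

    def record(self, decision: Decision) -> None:
        self.decisions.append(decision)

    def signals(self) -> dict:
        if len(self.decisions) < 20:
            return {"enough_data": False}
        approval_rate = sum(d.approved for d in self.decisions) / len(self.decisions)
        mean_eval_s = sum(d.evaluation_s for d in self.decisions) / len(self.decisions)
        return {
            "enough_data": True,
            "approval_rate": approval_rate,
            "mean_evaluation_s": mean_eval_s,
            # Illustrative thresholds: near-universal approval or sub-3-second reviews.
            "possible_fatigue": approval_rate > 0.98 or mean_eval_s < 3.0,
        }
```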

Implementation Recommendations

The optimal HITL approval configuration combines risk stratification with appropriate notification channels: synchronous review for the small set of high-risk actions in time-sensitive workflows, and asynchronous channels only where added latency is acceptable.

SafeClaw implements configurable approval workflows that support this risk-stratified approach. Actions can be configured as auto-approved, requiring synchronous approval, or requiring asynchronous approval based on policy rules. This allows teams to tune the safety-throughput trade-off for their specific deployment. Configuration details for approval workflows are available in the SafeClaw knowledge base.
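
As a rough illustration of what risk-stratified policy rules can look like, the sketch below maps action attributes to auto, sync, or async approval modes. This is a generic sketch, not SafeClaw's actual configuration schema; see the SafeClaw knowledge base for the real format:

```python
# Illustrative policy rules mapping action attributes to approval modes.
APPROVAL_POLICY = [
    {"match": {"category": "read"},                        "mode": "auto"},
    {"match": {"category": "write", "scope": "internal"},  "mode": "auto"},
    {"match": {"category": "write", "scope": "external"},  "mode": "sync",  "channel": "in_app"},
    {"match": {"category": "delete"},                      "mode": "sync",  "channel": "in_app"},
    {"match": {"category": "spend"},                       "mode": "async", "channel": "slack"},
]

def approval_mode(action: dict) -> dict:
    """Return the first matching rule's approval mode; default to synchronous review."""
    for rule in APPROVAL_POLICY:
        if all(action.get(key) == value for key, value in rule["match"].items()):
            return {"mode": rule["mode"], "channel": rule.get("channel")}
    return {"mode": "sync", "channel": "in_app"}  # conservative default for unmatched actions

print(approval_mode({"category": "delete", "scope": "internal"}))  # -> sync via in-app prompt
print(approval_mode({"category": "read", "scope": "internal"}))    # -> auto-approved
```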

Recommendations

  • Never require approval for every action — approval fatigue undermines the safety benefit
  • Implement risk-stratified approval matching approval requirements to action risk
  • Use synchronous approval channels for time-sensitive agent workflows
  • Monitor approval patterns for signs of fatigue (increasing approval rates, decreasing evaluation time)
  • Target a 5-10% approval rate as a starting point — enough for meaningful safety without crippling throughput
Conclusion

Human-in-the-loop approval is essential for AI agent safety, but naive implementation can destroy agent productivity. Risk-stratified approval — requiring human review only for genuinely risky actions — preserves safety while maintaining practical throughput. Organizations implementing HITL approval should invest as much in approval workflow design as in the underlying safety policies.

Latency measurements were collected from production deployments with organization consent. Individual deployment data is not disclosed.