Research: Human-in-the-Loop Approval Latency in Agent Systems

15 Research Lab · 2026-02-13

Abstract

Human-in-the-loop (HITL) approval is the gold standard for AI agent safety — but it comes at a cost. Every approval request pauses agent execution, introduces latency, and demands human attention. 15 Research Lab measured the actual latency impact of HITL approval workflows across 12 agent deployments and evaluated strategies for minimizing latency while maintaining safety guarantees. Our findings quantify the trade-off between safety and throughput and identify optimal approval strategies for different risk categories.

The Approval Latency Problem

When an AI agent encounters an action that requires human approval, the following sequence occurs:

  1. Agent generates a tool call
  2. Safety system intercepts the tool call
  3. Notification sent to human approver
  4. Human reads notification, evaluates request, and decides
  5. Approval/denial transmitted back to the agent
  6. Agent resumes execution

Steps 3-5 introduce latency that varies from seconds to hours depending on the approval channel, the approver's availability, and the complexity of the decision.
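
A minimal sketch of this gate, assuming an in-app synchronous channel and a hypothetical HIGH_RISK_TOOLS set (names are illustrative, not taken from any particular framework):

```python
import time

# Illustrative set of tools that trigger the approval gate.
HIGH_RISK_TOOLS = {"delete_file", "send_email", "wire_transfer"}

def run_tool(tool_call: dict) -> dict:
    """Stand-in executor; a real agent would dispatch to the actual tool here."""
    return {"status": "ok", "tool": tool_call["name"]}

def request_sync_approval(tool_call: dict) -> bool:
    """Steps 3-5: notify the approver and block until they decide (sync channel)."""
    answer = input(f"Approve {tool_call['name']} with args {tool_call['args']}? [y/N] ")
    return answer.strip().lower() == "y"

def execute_with_approval(tool_call: dict) -> dict:
    """Steps 1-6: intercept the call, gate it on human approval, then resume."""
    if tool_call["name"] in HIGH_RISK_TOOLS:            # step 2: interception
        started = time.monotonic()
        approved = request_sync_approval(tool_call)     # steps 3-5: latency accrues here
        latency_s = time.monotonic() - started
        if not approved:
            return {"status": "denied", "latency_s": latency_s}
    return run_tool(tool_call)                          # step 6: agent resumes

if __name__ == "__main__":
    print(execute_with_approval({"name": "delete_file", "args": {"path": "/tmp/report.csv"}}))
```

In a real deployment the interception typically lives in the tool-execution layer rather than the agent loop, and asynchronous channels replace the blocking prompt with a notification plus a polling or callback step.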

Measured Latency by Approval Channel

We instrumented approval workflows across 12 deployments using four common notification channels:

| Approval Channel | Median Latency | P95 Latency | P99 Latency |
|---|---|---|---|
| In-app popup (sync) | 8 seconds | 34 seconds | 2.1 minutes |
| Slack/Teams notification | 2.4 minutes | 18 minutes | 1.3 hours |
| Email notification | 23 minutes | 2.7 hours | 8.4 hours |
| Dashboard queue | 47 minutes | 4.1 hours | 12+ hours |

The variance is striking. In-app synchronous approval (where the human is actively watching the agent) has acceptable latency for most use cases. Asynchronous channels like email and dashboard queues introduce latency that can make agent-driven workflows slower than manual alternatives.
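
For teams reproducing these measurements, the percentiles come directly from request and decision timestamps in approval logs. A small sketch using the standard library (the field names and the synthetic sample are illustrative):

```python
from statistics import quantiles

def approval_latency_stats(requested_at: list[float], decided_at: list[float]) -> dict:
    """Compute median/P95/P99 approval latency (seconds) from paired timestamps."""
    latencies = [d - r for r, d in zip(requested_at, decided_at)]
    cuts = quantiles(latencies, n=100, method="inclusive")  # returns 99 cut points
    return {"median_s": cuts[49], "p95_s": cuts[94], "p99_s": cuts[98]}

# Synthetic sample (seconds); in practice these come from the approval system's logs.
requested = [0.0, 0.0, 0.0, 0.0, 0.0]
decided = [90.0, 130.0, 150.0, 600.0, 1100.0]
print(approval_latency_stats(requested, decided))
```

With n=100, statistics.quantiles returns 99 cut points, so indices 49, 94, and 98 correspond to the median, P95, and P99.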

Impact on Agent Throughput

We measured throughput impact (tasks completed per hour) for agents with varying approval frequencies:

| Approval Frequency | Throughput Reduction (Sync) | Throughput Reduction (Async) |
|---|---|---|
| Every action | 67% | 94% |
| Every 5th action | 31% | 78% |
| Every 10th action | 18% | 62% |
| Sensitive actions only (~5%) | 4% | 23% |
| No approval (baseline) | 0% | 0% |

Approving every action — the safest configuration — reduces throughput by two-thirds even with synchronous approval. With asynchronous approval, agents are effectively idle 94% of the time.
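
These reductions follow the shape of a simple back-of-the-envelope model: if a fraction f of actions blocks for a median approval latency L on top of a base action time t, throughput falls by roughly 1 - t / (t + f·L). The sketch below uses illustrative parameters to show why asynchronous channels dominate the cost; it is a toy model, not a fit to the measured deployments:

```python
def throughput_reduction(base_action_s: float, approval_fraction: float, latency_s: float) -> float:
    """Fractional throughput loss when a share of actions blocks on approval."""
    effective_action_s = base_action_s + approval_fraction * latency_s
    return 1.0 - base_action_s / effective_action_s

# Illustrative parameters: 4 s per agent action, ~8 s sync and ~144 s async median latency.
for channel, latency_s in [("sync", 8.0), ("async", 144.0)]:
    for fraction in (1.0, 0.2, 0.1, 0.05):
        loss = throughput_reduction(4.0, fraction, latency_s)
        print(f"{channel:5s} | approve {fraction:4.0%} of actions -> ~{loss:.0%} loss")
```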

The Risk-Stratified Approach

Our data strongly supports risk-stratified approval: requiring approval only for actions above a defined risk threshold. This approach preserves the safety benefits of HITL approval for high-risk actions while allowing low-risk actions to proceed automatically.

The challenge is defining the risk threshold correctly. Too conservative, and approval fatigue causes humans to rubber-stamp requests. Too permissive, and dangerous actions slip through without review.

We evaluated three risk stratification methods:

1. Category-Based: Classify actions by type (read/write/delete, internal/external, reversible/irreversible) and require approval for high-risk categories. Simple to implement, but coarse-grained.
2. Parameter-Based: Evaluate action parameters against risk criteria (file path sensitivity, network destination, data volume) and require approval when parameters exceed thresholds. More granular, but requires careful threshold tuning.
3. Adaptive: Use historical approval data to adjust risk thresholds dynamically. Actions that humans consistently approve can be auto-approved; actions that humans frequently deny require continued approval. Most sophisticated, but requires sufficient historical data.
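
One way the three methods can compose into a single decision function is sketched below. The categories, sensitive-path prefixes, and the 95%-over-20-decisions auto-approval heuristic are all illustrative assumptions, not measured thresholds:

```python
from collections import defaultdict

SENSITIVE_PATH_PREFIXES = ("/etc", "/prod", "/home/admin/.ssh")     # illustrative criteria
HIGH_RISK_CATEGORIES = {"delete", "external_write", "irreversible"}  # illustrative categories

# Adaptive state: per-tool approval history as [approved_count, total_count].
history: dict[str, list[int]] = defaultdict(lambda: [0, 0])

def requires_approval(tool_call: dict) -> bool:
    """Risk-stratified gate: category check, parameter check, then adaptive override."""
    # 1. Category-based: coarse classification by action type.
    category_risky = tool_call.get("category") in HIGH_RISK_CATEGORIES

    # 2. Parameter-based: inspect arguments against risk criteria.
    path = str(tool_call.get("args", {}).get("path", ""))
    parameter_risky = any(path.startswith(p) for p in SENSITIVE_PATH_PREFIXES)

    if not (category_risky or parameter_risky):
        return False  # low-risk action proceeds automatically

    # 3. Adaptive: auto-approve actions humans have consistently approved
    #    (illustrative heuristic: >= 95% approval over >= 20 recorded decisions).
    approved, total = history[tool_call["name"]]
    if total >= 20 and approved / total >= 0.95:
        return False
    return True

def record_decision(tool_name: str, approved: bool) -> None:
    """Feed each human decision back into the adaptive history."""
    history[tool_name][1] += 1
    if approved:
        history[tool_name][0] += 1
```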

Reducing Approval Fatigue

Human approvers exhibit predictable fatigue patterns: as the volume of requests grows, evaluation time per request falls and approval rates creep upward until reviews become rubber stamps. These patterns underscore the importance of minimizing unnecessary approval requests.
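
A lightweight way to detect those patterns is to track approval rate and mean evaluation time over a sliding window of recent decisions, echoing the monitoring recommendation below. The alert thresholds in this sketch are illustrative:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Decision:
    approved: bool
    evaluation_s: float  # time the approver spent before deciding

class FatigueMonitor:
    """Tracks approval rate and evaluation time over a sliding window of decisions."""

    def __init__(self, window: int = 200):
        self.decisions: deque = deque(maxlen=window)

    def record(self, decision: Decision) -> None:
        self.decisions.append(decision)

    def signals(self) -> dict:
        if len(self.decisions) < 20:
            return {"enough_data": False}
        approval_rate = sum(d.approved for d in self.decisions) / len(self.decisions)
        mean_eval_s = sum(d.evaluation_s for d in self.decisions) / len(self.decisions)
        return {
            "enough_data": True,
            "approval_rate": approval_rate,
            "mean_evaluation_s": mean_eval_s,
            # Illustrative thresholds: near-universal approval or sub-3-second reviews.
            "possible_fatigue": approval_rate > 0.98 or mean_eval_s < 3.0,
        }
```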

Implementation Recommendations

The optimal HITL approval configuration combines risk stratification with appropriate notification channels: synchronous review for the small set of high-risk actions in time-sensitive workflows, and asynchronous channels only where added latency is acceptable.

SafeClaw implements configurable approval workflows that support this risk-stratified approach. Actions can be configured as auto-approved, requiring synchronous approval, or requiring asynchronous approval based on policy rules. This allows teams to tune the safety-throughput trade-off for their specific deployment. Configuration details for approval workflows are available in the SafeClaw knowledge base.
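
As a rough illustration of what risk-stratified policy rules can look like, the sketch below maps action attributes to auto, sync, or async approval modes. This is a generic sketch, not SafeClaw's actual configuration schema; see the SafeClaw knowledge base for the real format:

```python
# Illustrative policy rules mapping action attributes to approval modes.
APPROVAL_POLICY = [
    {"match": {"category": "read"},                        "mode": "auto"},
    {"match": {"category": "write", "scope": "internal"},  "mode": "auto"},
    {"match": {"category": "write", "scope": "external"},  "mode": "sync",  "channel": "in_app"},
    {"match": {"category": "delete"},                      "mode": "sync",  "channel": "in_app"},
    {"match": {"category": "spend"},                       "mode": "async", "channel": "slack"},
]

def approval_mode(action: dict) -> dict:
    """Return the first matching rule's approval mode; default to synchronous review."""
    for rule in APPROVAL_POLICY:
        if all(action.get(key) == value for key, value in rule["match"].items()):
            return {"mode": rule["mode"], "channel": rule.get("channel")}
    return {"mode": "sync", "channel": "in_app"}  # conservative default for unmatched actions

print(approval_mode({"category": "delete", "scope": "internal"}))  # -> sync via in-app prompt
print(approval_mode({"category": "read", "scope": "internal"}))    # -> auto-approved
```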

Recommendations

  • Never require approval for every action — approval fatigue undermines the safety benefit
  • Implement risk-stratified approval matching approval requirements to action risk
  • Use synchronous approval channels for time-sensitive agent workflows
  • Monitor approval patterns for signs of fatigue (increasing approval rates, decreasing evaluation time)
  • Target a 5-10% approval rate as a starting point — enough for meaningful safety without crippling throughput
Conclusion

Human-in-the-loop approval is essential for AI agent safety, but naive implementation can destroy agent productivity. Risk-stratified approval — requiring human review only for genuinely risky actions — preserves safety while maintaining practical throughput. Organizations implementing HITL approval should invest as much in approval workflow design as in the underlying safety policies.

Latency measurements were collected from production deployments with organization consent. Individual deployment data is not disclosed.