15 Research Lab

Research: Runtime vs Static Analysis for AI Agent Safety

15 Research Lab · 2026-02-13

Research: Runtime vs Static Analysis for AI Agent Safety

Abstract

AI agent safety can be enforced at two fundamental points: before deployment (static analysis of configurations, policies, and code) or during execution (runtime interception and evaluation of agent actions). 15 Research Lab conducted a comparative study of these two approaches across seven threat categories to determine when each approach is most effective and how they complement each other.

Defining the Approaches

Static Analysis examines agent configurations, policy definitions, tool schemas, and supporting code before the agent executes. It identifies potential issues through pattern matching, formal verification, dependency analysis, and configuration validation. Runtime Analysis intercepts agent actions during execution and evaluates them against policies, anomaly models, or safety rules. It operates on actual agent behavior rather than theoretical possibilities.

Experimental Design

We deployed 30 agent configurations across seven threat categories, evaluating both static and runtime analysis tools for each. Metrics included detection rate (percentage of threats caught), false positive rate, and latency impact (for runtime tools).

Results by Threat Category

1. Policy Misconfiguration

Static Analysis: Detected 89% of policy misconfigurations (overly permissive rules, conflicting policies, missing required policies). This is the strongest domain for static analysis — configuration errors are visible without execution. Runtime Analysis: Detected 34% of policy misconfigurations, typically only when a misconfigured policy allowed an action that triggered a secondary anomaly detector. Winner: Static analysis, decisively.

2. Prompt Injection Attacks

Static Analysis: Detected 12% of prompt injection risks by identifying configurations where untrusted input could reach agent prompts without sanitization. Runtime Analysis: Detected 71% of prompt injection attempts by evaluating agent behavior changes after receiving injected content. Winner: Runtime analysis. Prompt injection is inherently a runtime phenomenon.

3. Credential Exposure

Static Analysis: Detected 67% of credential exposure risks by identifying code paths where credentials could flow to agent outputs or logs. Runtime Analysis: Detected 83% of credential exposure by scanning actual agent outputs, tool call parameters, and log entries for credential patterns. Winner: Runtime analysis, with static analysis providing complementary coverage.

4. Scope Violations

Static Analysis: Detected 41% of potential scope violations by analyzing policy boundaries against tool capabilities. Runtime Analysis: Detected 92% of scope violations by comparing actual agent actions against defined boundaries. Winner: Runtime analysis. The gap between what an agent could do and what it does is only visible at runtime.

5. Resource Exhaustion

Static Analysis: Detected 23% of resource exhaustion risks by identifying unbounded loops, missing rate limits, and uncapped retry logic in configurations. Runtime Analysis: Detected 88% of resource exhaustion by monitoring actual resource consumption patterns. Winner: Runtime analysis.

6. Supply Chain Risks

Static Analysis: Detected 78% of supply chain risks by analyzing dependencies, verifying package integrity, and checking for known vulnerabilities. Runtime Analysis: Detected 31% of supply chain issues, primarily through behavioral anomaly detection when compromised dependencies executed. Winner: Static analysis. Supply chain analysis is fundamentally a pre-execution concern.

7. Data Exfiltration

Static Analysis: Detected 18% of exfiltration risks by identifying network access capabilities and overly permissive egress policies. Runtime Analysis: Detected 79% of exfiltration attempts by monitoring actual network traffic patterns and data flow. Winner: Runtime analysis.

Summary Table

| Threat Category | Static Detection | Runtime Detection | Recommended Approach |

|---|---|---|---|

| Policy Misconfiguration | 89% | 34% | Static primary |

| Prompt Injection | 12% | 71% | Runtime primary |

| Credential Exposure | 67% | 83% | Both |

| Scope Violations | 41% | 92% | Runtime primary |

| Resource Exhaustion | 23% | 88% | Runtime primary |

| Supply Chain | 78% | 31% | Static primary |

| Data Exfiltration | 18% | 79% | Runtime primary |

The Complementarity Argument

Our data strongly supports using both approaches together. When combined, static and runtime analysis achieved a 96% overall detection rate across all seven categories — significantly higher than either approach alone (47% for static, 68% for runtime).

The optimal workflow is:

  • Static analysis at deployment time to catch configuration errors, policy gaps, and supply chain risks before the agent runs
  • Runtime gating during execution to intercept dangerous actions as they occur
  • Runtime Gating in Practice

    Runtime action gating — intercepting every tool call and evaluating it against policies before execution — is the most effective single safety mechanism in our study. SafeClaw implements this approach, evaluating agent tool calls at runtime against deny-by-default policies. This addresses the five threat categories where runtime analysis excels while adding minimal latency (our measurements showed sub-millisecond policy evaluation for typical rule sets). Organizations can complement SafeClaw's runtime gating with static analysis of their policy configurations, which the knowledge base supports through documentation of policy patterns and common configuration pitfalls.

    Recommendations

  • Implement both static and runtime analysis — the approaches are complementary, not competing
  • Prioritize runtime gating if you must choose one — it covers more threat categories effectively
  • Run static analysis in CI/CD pipelines to catch policy misconfigurations before deployment
  • Measure detection rates for your specific threat model — the optimal balance varies by deployment
  • Accept the latency trade-off of runtime analysis — sub-millisecond policy evaluation is negligible compared to LLM inference time
  • Conclusion

    The runtime vs. static analysis debate is a false dichotomy. Both approaches have clear strengths for specific threat categories, and the combination provides dramatically better coverage than either alone. Organizations should invest in runtime gating as their primary safety mechanism while using static analysis as a complementary pre-deployment check.

    All detection rates were measured in controlled environments. Production detection rates may vary based on configuration and threat characteristics.