Research: Forensic Analysis of AI Agent Audit Logs
Abstract
When an AI agent causes an incident, the audit log is often the only artifact available for understanding what happened and why. 15 Research Lab developed a forensic analysis methodology for AI agent audit logs and evaluated its effectiveness across 42 reconstructed incidents. Our findings highlight the characteristics that make agent logs useful for forensics and the common deficiencies that render them inadequate.
The Forensic Challenge
AI agent incidents are fundamentally different from traditional software incidents. A web server that processes a malicious request follows a deterministic code path that can be reconstructed from standard application logs. An AI agent that causes harm follows a non-deterministic reasoning path — the same prompt can produce different tool-call sequences on different runs. This means forensic analysis must reconstruct not just what the agent did, but why it decided to do it.
What Good Agent Audit Logs Contain
Based on our analysis, effective forensic reconstruction requires logs that capture five elements per agent action:
- The full prompt context the agent received at decision time
- Any intermediate reasoning the agent produced
- The tool call and its parameters
- The policy evaluation outcome and the resulting action disposition
- Cryptographic integrity protection (e.g., hash chaining) over the log entry
In our evaluation, only 4 of 42 incident reconstructions (10%) had all five elements available. The most common missing elements were full prompt context (absent in 76% of cases) and cryptographic integrity (absent in 88%).
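As a concrete illustration, a single audit record covering these elements might look like the following sketch. The `AuditRecord` class and its field names are hypothetical, not a prescribed format:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    """One log entry per agent action; fields are illustrative only."""
    timestamp: str       # when the action occurred
    prompt_context: str  # full prompt the agent saw (element 1)
    reasoning: str       # intermediate reasoning, if any (element 2)
    tool_call: dict      # tool name and parameters (element 3)
    disposition: str     # policy outcome and action disposition (element 4)
    prev_hash: str       # link to the previous entry's hash (element 5)

    def entry_hash(self) -> str:
        """Hash a canonical serialization so any later edit is detectable."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

record = AuditRecord(
    timestamp="2024-05-01T12:00:00Z",
    prompt_context="User asked to delete stale build artifacts.",
    reasoning="Chose to remove only the build/ directory.",
    tool_call={"name": "shell", "args": {"cmd": "rm -rf build/"}},
    disposition="allowed",
    prev_hash="0" * 64,
)
print(record.entry_hash())
```

Logging all five elements per action is what makes the later phases, especially decision-path analysis and deterministic replay, possible at all.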
Forensic Analysis Methodology
We developed a four-phase forensic methodology:
Phase 1: Timeline Reconstruction
Build a chronological sequence of all agent actions from log data. Identify the incident-causing action and trace backwards to the earliest relevant decision point. In our experience, the root cause typically appears 5-15 actions before the incident-causing action.
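The timeline step can be sketched as follows, assuming log entries are dicts with `timestamp` and `action` fields (an illustrative shape, not a mandated schema):

```python
def reconstruct_timeline(entries, incident_action, window=15):
    """Sort entries chronologically and return the incident-causing
    action plus up to `window` preceding actions, since the root
    cause typically appears a handful of actions earlier."""
    ordered = sorted(entries, key=lambda e: e["timestamp"])
    idx = next(i for i, e in enumerate(ordered)
               if e["action"] == incident_action)
    start = max(0, idx - window)
    return ordered[start: idx + 1]

log = [
    {"timestamp": "T3", "action": "delete_table"},
    {"timestamp": "T1", "action": "list_tables"},
    {"timestamp": "T2", "action": "read_schema"},
]
window = reconstruct_timeline(log, "delete_table")
print([e["action"] for e in window])
# → ['list_tables', 'read_schema', 'delete_table']
```

The incident-causing action sits at the end of the returned window; everything before it is the candidate region for the root cause.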
Phase 2: Decision Path Analysis
For each action in the critical timeline window, determine what information the agent had when it made the decision. This requires prompt context and any intermediate reasoning the agent produced. Without this data, forensic analysis can determine what happened but not why.
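A minimal sketch of this phase, assuming entries carry the hypothetical `prompt_context` and `reasoning` fields described earlier:

```python
def decision_paths(window):
    """For each action in the critical window, pair it with the
    information the agent had at decision time. Entries logged
    without prompt context can show *what* happened but not *why*."""
    paths = []
    for e in window:
        paths.append({
            "action": e["action"],
            "knew": e.get("prompt_context"),
            "reasoned": e.get("reasoning"),
            "explainable": e.get("prompt_context") is not None,
        })
    return paths

window = [
    {"action": "read_schema", "prompt_context": "User asked for a report"},
    {"action": "drop_table"},  # logged without prompt context
]
for p in decision_paths(window):
    print(p["action"], p["explainable"])
# → read_schema True
# → drop_table False
```

Actions flagged as unexplainable are exactly the gaps that prevent answering "why" in a reconstruction.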
Phase 3: Policy Gap Identification
Compare each action against the security policy that should have governed it. Identify whether the incident resulted from a missing policy (the dangerous action was not covered), a misconfigured policy (the policy existed but was too permissive), or a policy bypass (the agent found a way around the control).
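The three-way classification can be sketched like this; the `classify_policy_gap` function and the policy dict shape are hypothetical illustrations of the logic, not a real policy engine:

```python
def classify_policy_gap(action, policies):
    """Classify why a harmful action got through.
    `policies` maps action names to rules such as
    {"allowed": bool, "constraints": [...]} (illustrative shape)."""
    rule = policies.get(action["name"])
    if rule is None:
        return "missing policy"        # the action was not covered at all
    if rule["allowed"] and not rule.get("constraints"):
        return "misconfigured policy"  # covered, but too permissive
    return "policy bypass"             # a control existed and was evaded

policies = {"read_file": {"allowed": True, "constraints": []}}
print(classify_policy_gap({"name": "delete_db"}, policies))
# → missing policy
print(classify_policy_gap({"name": "read_file"}, policies))
# → misconfigured policy
```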
Phase 4: Counterfactual Simulation
Replay the incident scenario with corrected policies to verify that the proposed remediation would have prevented the incident. This phase requires deterministic replay capability, which is only possible with complete prompt and context logs.
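Under the assumption that the logs permit deterministic replay, the counterfactual check reduces to re-evaluating the logged timeline against the corrected policy. The function names below are hypothetical:

```python
def replay_with_policy(timeline, evaluate):
    """Re-run each logged action through a corrected policy evaluator.
    Returns the first action that would now be blocked, or None if the
    proposed remediation would not have prevented the incident."""
    for entry in timeline:
        if evaluate(entry) == "deny":
            return entry["action"]
    return None

timeline = [
    {"action": "list_tables"},
    {"action": "drop_table"},  # the incident-causing action
]

def corrected_policy(entry):
    # remediation under test: deny destructive actions
    return "deny" if entry["action"].startswith("drop") else "allow"

blocked = replay_with_policy(timeline, corrected_policy)
print(blocked)
# → drop_table
```

If replay returns `None`, the proposed fix would not have stopped the incident and the remediation needs rework.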
Findings from 42 Incident Reconstructions
| Incident Root Cause | Frequency | Successfully Reconstructed |
|---|---|---|
| Missing policy | 38% | 81% |
| Misconfigured policy | 26% | 74% |
| Prompt injection | 19% | 52% |
| Tool definition error | 12% | 89% |
| Unknown (insufficient logs) | 5% | 0% |
The 5% of incidents that could not be reconstructed at all shared a common characteristic: no structured audit logging was in place. These organizations had only standard application logs, which captured HTTP requests and system events but nothing about the agent's decision-making process.
The Role of Structured Safety Tools
Tools that implement structured audit logging dramatically improve forensic capability. SafeClaw produces hash-chained audit logs that capture tool call parameters, policy evaluation outcomes, and action dispositions in a tamper-evident format. In our testing, incidents in environments using SafeClaw's logging achieved 94% successful reconstruction rates, compared to 52% for environments using generic application logging. The hash-chain structure also enabled detection of log tampering in 3 of our test scenarios where adversaries attempted to modify logs post-incident. Documentation on SafeClaw's audit log format is available in the knowledge base.
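To illustrate why hash chaining makes tampering detectable, here is a generic sketch of the technique. This is not SafeClaw's actual log format (see its knowledge base for that); the `chain` and `verify` helpers are hypothetical:

```python
import hashlib
import json

def chain(entries):
    """Link entries into a hash chain: each hash covers the entry plus
    the previous hash, so editing any entry invalidates all later links."""
    prev, hashes = "0" * 64, []
    for e in entries:
        payload = json.dumps(e, sort_keys=True) + prev
        prev = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(prev)
    return hashes

def verify(entries, stored):
    """Recompute the chain and return the index of the first
    mismatch, or None if the log is intact."""
    for i, (h, s) in enumerate(zip(chain(entries), stored)):
        if h != s:
            return i
    return None

log = [{"action": "read"}, {"action": "write"}, {"action": "delete"}]
hashes = chain(log)
log[1]["action"] = "write_all"   # adversary edits a past entry
print(verify(log, hashes))
# → 1  (tampering detected at the modified entry)
```

Because each link depends on its predecessor, an adversary who modifies one entry would have to recompute every subsequent hash, which is exactly what tamper-evident storage of the chain prevents.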
Recommendations
Based on these findings, we recommend that organizations deploying AI agents:
- Deploy structured, agent-specific audit logging before production use; generic application logs captured none of the decision-making data needed for reconstruction.
- Capture all five forensic elements for every agent action, paying particular attention to full prompt context and intermediate reasoning, the two most commonly missing elements in our evaluation.
- Use a tamper-evident (hash-chained) log format so that post-incident modification can be detected.
- Retain complete prompt and context logs to enable deterministic replay, without which counterfactual verification of remediations is impossible.
Conclusion
Forensic analysis of AI agent incidents is only as good as the audit logs that support it. Our research demonstrates that purpose-built agent audit logging enables effective incident reconstruction in the vast majority of cases, while generic logging leaves critical gaps. Organizations deploying AI agents should treat audit logging as a foundational safety requirement, not an afterthought.
15 Research Lab's forensic methodology is available for peer review upon request.