Research: Government Standards for AI Agent Deployment

15 Research Lab · 2026-02-13

Abstract

Government agencies and their contractors operate under a comprehensive set of standards that significantly constrain how AI agents can be deployed. 15 Research Lab analyzed the current landscape of government standards applicable to AI agent systems, identifying both established frameworks and emerging requirements. Our research maps these standards to specific technical controls and identifies gaps where current standards have not yet addressed agent-specific risks.

Standards Landscape

NIST AI Risk Management Framework (AI RMF 1.0)

The NIST AI RMF provides the most comprehensive government framework for AI risk management. Its four core functions (Govern, Map, Measure, Manage) apply directly to agent deployments.

For AI agents specifically, the RMF's emphasis on "trustworthy AI characteristics" — validity, reliability, safety, security, accountability, transparency, explainability, and fairness — maps directly to technical safety requirements.
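
One hypothetical way to operationalize that mapping is to keep it in machine-readable form per agent deployment. In the sketch below, the function and characteristic names come from AI RMF 1.0, but the right-hand activities are our own illustrative examples, not NIST guidance.

```python
# Illustrative only: one way a deployment team might record how AI RMF functions
# and trustworthy-AI characteristics map to agent-level activities.
# Left-hand names come from NIST AI RMF 1.0; right-hand activities are hypothetical.

AI_RMF_FUNCTIONS = {
    "Govern":  "define who may approve agent tool permissions and policy changes",
    "Map":     "inventory each agent, its tools, and the systems those tools reach",
    "Measure": "track policy-violation rates, escalation rates, and evaluation results",
    "Manage":  "review incidents and tighten or relax tool policies accordingly",
}

TRUSTWORTHY_CHARACTERISTICS = {
    "validity and reliability": "regression evaluations on representative agent tasks",
    "safety":                   "deny-by-default gating of high-impact tool calls",
    "security":                 "least-privilege credentials scoped per tool",
    "accountability":           "tamper-evident audit logs of every tool call",
    "transparency":             "published policy baselines and change history",
    "explainability":           "recorded reasoning traces attached to each action",
    "fairness":                 "periodic bias testing on decisions that affect people",
}

if __name__ == "__main__":
    for name, activity in {**AI_RMF_FUNCTIONS, **TRUSTWORTHY_CHARACTERISTICS}.items():
        print(f"{name}: {activity}")
```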

NIST SP 800-53 (Security Controls)

The SP 800-53 control catalog applies to all federal information systems. AI agents deployed in federal environments must satisfy relevant controls across 20 control families. The most pertinent families for agent deployments:

| Control Family | Agent Relevance | Key Controls |
|---|---|---|
| Access Control (AC) | Critical | AC-3 (Access Enforcement), AC-6 (Least Privilege) |
| Audit & Accountability (AU) | Critical | AU-2 (Event Logging), AU-10 (Non-repudiation) |
| Configuration Management (CM) | High | CM-2 (Baseline Configuration), CM-3 (Configuration Change Control) |
| Identification & Authentication (IA) | High | IA-2 (Multi-factor Authentication), IA-9 (Service Identification and Authentication) |
| System & Info Integrity (SI) | Critical | SI-4 (System Monitoring), SI-10 (Input Validation) |
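
To make the mapping concrete, the sketch below shows how an SI-10-style check might be applied at the tool-call boundary, where parameters are validated against a per-tool schema before the call executes. The tool name, parameter schema, and patterns are hypothetical.

```python
# Hypothetical sketch of SI-10-style input validation applied at the tool-call
# boundary. The tool name, parameters, and patterns are invented for illustration.
import re

TOOL_SCHEMAS = {
    "read_record": {
        "record_id": re.compile(r"^[A-Z]{2}-\d{6}$"),       # e.g. "AB-123456"
        "fields":    re.compile(r"^[a-z_]+(,[a-z_]+)*$"),    # comma-separated field names
    },
}

def validate_tool_call(tool_name: str, params: dict) -> list[str]:
    """Return validation errors; an empty list means the call may proceed."""
    schema = TOOL_SCHEMAS.get(tool_name)
    if schema is None:
        return [f"unknown tool: {tool_name}"]
    errors = []
    for key, pattern in schema.items():
        value = params.get(key)
        if value is None:
            errors.append(f"missing parameter: {key}")
        elif not pattern.fullmatch(str(value)):
            errors.append(f"invalid value for {key!r}: {value!r}")
    errors.extend(f"unexpected parameter: {key}" for key in params if key not in schema)
    return errors

# A well-formed call passes; a malformed one is rejected before it reaches the tool.
assert validate_tool_call("read_record", {"record_id": "AB-123456", "fields": "name,status"}) == []
assert validate_tool_call("read_record", {"record_id": "NOT A VALID ID", "fields": "name"}) != []
```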

FedRAMP

AI agents deployed in cloud environments must operate within FedRAMP-authorized infrastructure. The FedRAMP High baseline (required for systems processing sensitive government data) imposes 421 controls — many of which are difficult to satisfy with standard agent frameworks that were designed for developer productivity rather than compliance.

Executive Orders on AI

Recent executive orders on AI safety have established requirements for federal AI deployments, including mandatory risk assessments, red-team testing, and safety evaluations. These requirements are evolving rapidly, and agencies are still developing implementation guidance.

Gap Analysis: Where Standards Fall Short

Our analysis identifies three areas where current government standards have not yet addressed agent-specific risks:

Gap 1: Tool-Call Governance. No current standard specifically addresses the control of AI agent tool calls, the mechanism by which agents take actions in systems. Standards address access control broadly but do not contemplate the unique challenge of an autonomous system that decides which actions to take based on natural language reasoning.

Gap 2: Non-Deterministic Behavior. Government standards assume systems behave deterministically: the same input produces the same output. AI agents violate this assumption fundamentally. Standards for testing, validation, and certification have not adapted to non-deterministic systems.

Gap 3: Multi-Model Architectures. Many agent systems use multiple models (planning, execution, evaluation) from different providers. Current standards do not address the supply chain risks or security boundaries in multi-model architectures.
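
Pending updated guidance, one practical response to Gap 2 is to test invariants across repeated runs rather than exact outputs. The harness below is a hypothetical sketch: run_agent_task, the action names, and the limits are illustrative stand-ins for a real agent and policy.

```python
# Hypothetical harness showing how validation can adapt to non-determinism:
# run the same scenario many times and require safety invariants to hold on
# every run, instead of asserting one exact output.
import random

FORBIDDEN_ACTIONS = {"delete_record", "send_external_email"}

def run_agent_task(scenario: str) -> dict:
    # Placeholder for a real agent call; outputs vary from run to run.
    return {
        "actions": random.sample(["read_record", "summarize", "draft_reply"],
                                 k=random.randint(1, 3)),
        "escalated_to_human": random.random() < 0.1,
    }

def check_invariants(result: dict) -> list[str]:
    """Properties that must hold on every run, whatever path the agent took."""
    violations = []
    if FORBIDDEN_ACTIONS & set(result["actions"]):
        violations.append("forbidden action taken")
    if len(result["actions"]) > 5:
        violations.append("action budget exceeded")
    return violations

def test_scenario(scenario: str, runs: int = 50) -> None:
    for i in range(runs):
        violations = check_invariants(run_agent_task(scenario))
        assert not violations, f"run {i}: {violations}"

test_scenario("triage a routine records request")
print("all runs satisfied the invariants")
```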

Implementing Compliant Agent Deployments

Federal agencies and contractors deploying AI agents can map existing standards to agent-specific controls:

  • AC-3 → Action Gating: Implement deny-by-default policy engines that evaluate every agent action against an approved action set (see the sketch below)
  • AU-2 → Structured Agent Logging: Capture tool call parameters, policy evaluations, and outcomes in audit-grade log formats
  • CM-2 → Policy Baselines: Document and version-control agent policy configurations as system baselines
  • SI-4 → Real-Time Monitoring: Monitor agent behavior patterns for anomalies that indicate compromise or malfunction

SafeClaw maps well to several of these control implementations. Its deny-by-default policy engine satisfies the intent of AC-3 and AC-6 for agent tool calls, and its hash-chained audit logging provides the non-repudiation properties required by AU-10. Government organizations evaluating agent safety tooling can review the technical architecture in the SafeClaw knowledge base.
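
The sketch below illustrates the general pattern behind the AC-3 and AU-2/AU-10 mappings above: a deny-by-default gate that records every decision in a hash-chained log. It is a minimal example with invented agent and action names, not the SafeClaw implementation.

```python
# Minimal sketch of deny-by-default action gating (the AC-3/AC-6 intent) with a
# hash-chained audit record (the AU-2/AU-10 intent). Agent and action names and
# record fields are illustrative, not drawn from any specific product.
import hashlib
import json
import time

ALLOWED_ACTIONS = {
    ("records_agent", "read_record"),
    ("records_agent", "summarize"),
}

audit_log: list[dict] = []

def _chain_hash(previous_hash: str, record: dict) -> str:
    # Each record's hash covers the previous hash, making after-the-fact edits detectable.
    payload = previous_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def gate_action(agent_id: str, action: str, params: dict) -> bool:
    """Deny by default: only explicitly allowed (agent, action) pairs may proceed."""
    allowed = (agent_id, action) in ALLOWED_ACTIONS
    record = {
        "ts": time.time(),
        "agent": agent_id,
        "action": action,
        "params": params,
        "decision": "allow" if allowed else "deny",
    }
    previous = audit_log[-1]["hash"] if audit_log else "genesis"
    record["hash"] = _chain_hash(previous, record)
    audit_log.append(record)
    return allowed

# An unlisted action is denied, and both decisions leave tamper-evident records.
assert gate_action("records_agent", "read_record", {"record_id": "AB-123456"}) is True
assert gate_action("records_agent", "delete_record", {"record_id": "AB-123456"}) is False
```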

Recommendations

  • Map agent deployments to applicable NIST controls before procurement or development decisions
  • Conduct AI RMF-aligned risk assessments for each agent use case
  • Require FedRAMP-authorized infrastructure for all agent components, including LLM providers
  • Implement tool-call-level governance even though current standards do not explicitly require it
  • Plan for evolving requirements by using configurable policy engines rather than hardcoded controls (see the sketch after this list)
  • Engage with NIST's ongoing AI standards development to ensure agent-specific risks are addressed
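
As a small illustration of the last two recommendations, the sketch below reads the allow-list from a version-controlled policy file rather than hardcoding it, so the baseline can change through review and re-deployment. The file name and schema are assumptions for illustration.

```python
# Hypothetical sketch of the "configurable rather than hardcoded" recommendation:
# the allow-list lives in a version-controlled policy file (name and schema
# invented here), supporting a documented CM-2-style baseline.
import json
from pathlib import Path

def load_policy(path: str = "agent_policy.json") -> dict:
    """Read the reviewed policy baseline; fall back to denying everything if absent."""
    policy_file = Path(path)
    if not policy_file.exists():
        return {"version": "fallback", "default_decision": "deny", "allowed_actions": {}}
    return json.loads(policy_file.read_text())

def is_allowed(policy: dict, agent_id: str, action: str) -> bool:
    if action in policy.get("allowed_actions", {}).get(agent_id, []):
        return True
    return policy.get("default_decision", "deny") == "allow"

policy = load_policy()  # in a real pipeline, the path would point at a reviewed, pinned revision
print(policy["version"], is_allowed(policy, "records_agent", "read_record"))
```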

Conclusion

Government standards provide a robust foundation for AI agent safety, but gaps remain in areas specific to autonomous agent behavior. Agencies that proactively implement agent-level controls — beyond what current standards explicitly require — will be better positioned as standards evolve to address these gaps. The compliance landscape is tightening, and early investment in agent safety infrastructure will pay dividends as requirements mature.

15RL monitors government AI standards development and updates this analysis as new guidance is published. This publication does not constitute compliance advice for specific government procurements.