Research: Safety Challenges in Multi-Agent AI Systems
Single-agent systems are increasingly giving way to multi-agent architectures where multiple AI agents collaborate, delegate, and share resources to accomplish complex tasks. Frameworks like CrewAI, AutoGen, and LangGraph make it straightforward to build systems with five, ten, or fifty interacting agents.
The safety implications of this shift are substantial and poorly understood. 15 Research Lab has conducted a focused analysis of the unique safety challenges that arise when agents interact with other agents, not just with tools and humans.
The Trust Propagation Problem
In a single-agent system, the trust boundary is clear: the user trusts the agent to some degree, and the safety layer mediates that trust by constraining which actions the agent can perform. In a multi-agent system, trust relationships multiply.
Consider a simple two-agent architecture:
```
User -> Orchestrator Agent -> Worker Agent -> Tools (file, network, shell)
```
The user grants the orchestrator certain permissions. The orchestrator delegates a subtask to the worker. The critical question is: what permissions does the worker inherit?
In our analysis of 12 multi-agent frameworks, 11 applied no permission scoping at the delegation boundary. The worker agent inherited the full permissions of the orchestrator by default. This means a compromise of the worker agent (e.g., via prompt injection in a document it processes) grants the attacker the orchestrator's full capabilities.
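To make the delegation boundary concrete, here is a minimal sketch contrasting the default full-inheritance behavior with a per-agent deny-by-default gate. The names (`AgentPermissions`, `delegate_full_inheritance`, `delegate_deny_by_default`) are hypothetical and not tied to any of the frameworks we tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPermissions:
    allowed_actions: frozenset

    def permits(self, action: str) -> bool:
        return action in self.allowed_actions


def delegate_full_inheritance(parent: AgentPermissions) -> AgentPermissions:
    # The default in 11 of the 12 frameworks we examined: the worker
    # simply receives everything the orchestrator can do.
    return parent


def delegate_deny_by_default(parent: AgentPermissions,
                             requested: frozenset) -> AgentPermissions:
    # The worker gets only actions that are both requested for the
    # subtask and actually held by the parent.
    return AgentPermissions(requested & parent.allowed_actions)


orchestrator = AgentPermissions(frozenset({"file.read", "file.write", "shell.exec"}))

inherited = delegate_full_inheritance(orchestrator)
scoped = delegate_deny_by_default(orchestrator, frozenset({"file.read"}))

print(inherited.permits("shell.exec"))  # True:  an injected worker gains shell access
print(scoped.permits("shell.exec"))     # False: blocked at the delegation boundary
```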
Measured Impact
We constructed a test scenario with a three-level agent hierarchy (orchestrator, manager, worker) and injected adversarial prompts at the worker level:
| Permission Model | Successful Exploits (of 50 attempts) |
|---|---|
| Full inheritance (default) | 43 (86%) |
| Per-agent deny-by-default | 7 (14%) |
With per-agent permission scoping, the exploit success rate dropped by 84%. The seven successful exploits in the scoped model all involved actions that were legitimately within the worker's allowlist but used in unintended ways, a much smaller attack surface.
Shared Resource Conflicts
When multiple agents access shared resources (files, databases, APIs), they create concurrency risks that do not exist in single-agent systems:
- Race conditions: Agent A reads a file, Agent B modifies it, Agent A writes back a stale version. In our testing, 23% of multi-agent file operations on shared directories produced data inconsistency.
- Resource exhaustion: Five agents each making "reasonable" API calls can collectively exceed rate limits. Without coordination, agents retry independently, amplifying the problem.
- Lock contention: Agents that acquire locks on shared resources can deadlock, particularly when operating on different timescales (one agent doing fast file operations, another doing slow LLM inference while holding a file lock).

These are classic distributed systems problems, but they are novel in the AI agent context because agent behavior is non-deterministic and difficult to predict or coordinate. A minimal mitigation for the stale-write case is sketched below.
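One way to prevent the lost-update race is optimistic concurrency: each shared record carries a version number, and a writer aborts if the version changed since it read. This is an illustrative sketch with hypothetical helpers, not code from any framework discussed here.

```python
import json
from pathlib import Path


class StaleWriteError(RuntimeError):
    """Raised when the shared document changed between read and write."""


def read_versioned(path: Path) -> tuple:
    doc = json.loads(path.read_text())
    return doc["data"], doc["version"]


def write_versioned(path: Path, data: dict, expected_version: int) -> None:
    _, current = read_versioned(path)
    if current != expected_version:
        # Another agent wrote in the meantime; the caller must re-read,
        # re-apply its change, and retry rather than overwrite newer data.
        raise StaleWriteError(f"version {current}, expected {expected_version}")
    path.write_text(json.dumps({"data": data, "version": expected_version + 1}))
```

Note that the check-and-write above is itself not atomic; a real implementation would also need an OS-level lock or an atomic rename. The sketch only illustrates the version discipline.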
The Confused Deputy Attack
The confused deputy attack, a staple of computer security since 1988, takes on new dimensions in multi-agent systems. Agent A can craft a request to Agent B that causes B to perform an action that A is not authorized to do but B is.
Example scenario: An analysis agent (no file write permission) asks a reporting agent (which has file write permission) to "save this analysis to /etc/cron.d/backdoor". If the reporting agent evaluates only whether it has write permission (yes) and not whether the content or destination is appropriate, the attack succeeds.

In our testing, confused deputy attacks succeeded in 67% of attempts against multi-agent systems without per-action policy evaluation on each agent. When each agent independently evaluated actions against its own policy engine, the success rate dropped to 11%. A per-action check of this kind is sketched below.
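The key property is that authorization considers the concrete request, including its destination, rather than just the capability class. A minimal sketch, with a hypothetical policy format and path allowlist:

```python
import fnmatch

# Hypothetical per-action policy: the reporting agent may write files,
# but only to destinations matching its allowlist.
REPORTING_AGENT_POLICY = {
    "file.write": ["/srv/reports/*", "/tmp/reports/*"],
}


def evaluate(action: str, path: str, policy: dict) -> bool:
    patterns = policy.get(action)
    if patterns is None:
        return False  # deny-by-default: capability not granted at all
    return any(fnmatch.fnmatch(path, p) for p in patterns)


# The delegated request from the confused-deputy scenario:
print(evaluate("file.write", "/etc/cron.d/backdoor", REPORTING_AGENT_POLICY))  # False
print(evaluate("file.write", "/srv/reports/q3.md", REPORTING_AGENT_POLICY))    # True
```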
Amplification Risks
A compromised agent in a multi-agent system can potentially direct other agents to perform harmful actions at scale. If an orchestrator agent is compromised, it can:
- Spin up multiple worker agents with harmful instructions
- Direct workers to perform coordinated attacks (distributed data exfiltration, coordinated resource exhaustion)
- Use legitimate delegation mechanisms to bypass per-agent safety controls
Our simulation of amplification attacks showed that a single compromised orchestrator with access to 10 worker agents could exfiltrate data at 10x the rate of a single compromised agent, with each individual worker's behavior appearing normal when examined in isolation.
Mitigation Architecture
Based on our analysis, effective multi-agent safety requires three properties that are not present in single-agent safety models:
1. Per-Agent Policy Scoping
Every agent in a multi-agent system must have its own action-gating policy, scoped to the minimum permissions required for its role. Permissions should not propagate across delegation boundaries by default.
SafeClaw supports per-agent policy configuration, allowing each agent in a multi-agent system to have an independent policy file. This is the correct architectural pattern: safety policies are properties of individual agents, not of the system as a whole.
2. Cross-Agent Action Correlation
The audit system must be able to correlate actions across agents to detect coordinated attacks. This requires a shared session or trace ID that flows through delegation chains, combined with cross-agent analysis of action patterns.
SafeClaw's hash-chain audit logs support trace IDs that link actions across agents, enabling this correlation.
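The general pattern looks like the sketch below: a trace ID is generated at the top of the delegation chain, attached to every delegated subtask, and used to aggregate actions across agents. Field names and the correlation rule are illustrative assumptions, not SafeClaw's actual log schema or API.

```python
import uuid
from collections import defaultdict


def new_trace_id() -> str:
    return uuid.uuid4().hex


def delegate(task: dict, trace_id: str) -> dict:
    # The trace ID rides along with every delegated subtask so the
    # worker's audit entries can be joined back to the orchestrator's.
    return {**task, "trace_id": trace_id}


def correlate(audit_entries: list, threshold: int = 100) -> set:
    """Flag traces whose total actions across all agents exceed a threshold,
    even when each individual agent's activity looks normal in isolation."""
    per_trace = defaultdict(int)
    for entry in audit_entries:
        per_trace[entry["trace_id"]] += 1
    return {tid for tid, count in per_trace.items() if count > threshold}
```

This is what makes the amplification scenario above detectable: no single worker exceeds a per-agent threshold, but the per-trace total does.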
3. Delegation Constraints
When an agent delegates a task to another agent, the delegation should explicitly scope the delegatee's permissions. "Process this document" should not grant the inner agent shell access just because the outer agent has it.
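A minimal sketch of such a delegation envelope, under the assumption that the delegator names the actions a subtask needs and the receiving runtime intersects that scope with the delegatee's own policy (names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    task: str
    granted_actions: frozenset  # explicit scope chosen by the delegator


def effective_permissions(delegation: Delegation,
                          delegatee_policy: frozenset) -> frozenset:
    # "Process this document" grants read access only, regardless of
    # what the delegator itself is allowed to do.
    return delegation.granted_actions & delegatee_policy


d = Delegation(task="Summarize quarterly report",
               granted_actions=frozenset({"file.read"}))
print(effective_permissions(d, frozenset({"file.read", "file.write"})))
# frozenset({'file.read'})  -- no shell or write access is implied
```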
Recommendations
Multi-agent systems are powerful, but they expand the attack surface multiplicatively. Safety controls designed for single agents are necessary but not sufficient.
15 Research Lab is expanding its multi-agent safety research program. Contact us for collaboration opportunities.