- Developer accountability — responsible for the model's base behavior, safety training, and known failure modes. If the model is fundamentally unsafe this layer owns it.
- Operator accountability — the organization that deployed the agent, configured permissions, set task scope, and chose when to go live. Most enterprise incidents land here.
- User accountability — the person who initiated the task. Diminished if the system's behavior was opaque or the user had no reason to anticipate the failure.
- Causal accountability — harder with non-deterministic models but not impossible. You can audit inputs, context, tool calls, and outputs even without replaying internal reasoning.
- Structural accountability — entirely tractable. Did the operator test edge cases? Were guardrails in place? These are auditable facts.
Was there meaningful human control at the point things went wrong? Not “was a human in the loop?” but did the human have: sufficient information to understand what the agent was about to do, the ability to intervene before irreversible action, and a reasonable expectation the action was in scope? If any of those three are missing, accountability shifts to the operator and developer.
Someone will argue holding operators accountable chills AI adoption. The rebuttal: no clear accountability creates more risk aversion because organizations cannot price the liability. Clear frameworks enable deployment by making risk calculable — the same argument that made product liability law net-positive for innovation.
- EU AI Act — assigns obligations to deployers of high-risk systems: documentation, human oversight, incident reporting.
- NIST AI RMF — the Govern function addresses accountability structures explicitly.
- FTC — signaled it will hold deploying organizations accountable for AI-caused consumer harm under existing unfair/deceptive practices authority.
- [1]NIST AI Risk Management Frameworknist.gov
- [2]EU Artificial Intelligence Act (2024/1689)eur-lex.europa.eu
- [3]Meaningful Human Control — Article 36 Technical Reportarticle36.org
- [4]Anthropic Usage Policy — Operator and User Frameworkanthropic.com
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Audit Logging | Immutable agent action logs | Azure Monitor Log Analytics | AWS CloudTrail (integrity validation) | Cloud Audit Logs (Data Access) | OpenTelemetry + Loki |
| LLM Observability | Trace reasoning and tool calls | Azure AI Foundry Tracing | Amazon Bedrock invocation logging | Vertex AI Experiments + logging | Langfuse / LangSmith / Arize Phoenix |
| Accountability Frameworks | Developer / Operator / User tiers | Microsoft Purview AI Hub | AWS AI Service Cards | Google Model Cards | MLflow + responsibility docs |
| Decision Records | Structured AI decision documentation | Azure ML Model Cards | Amazon SageMaker Model Cards | Vertex AI Model Registry metadata | MLflow / DVC |
| Explainability | AI decision explanation | Azure Responsible AI dashboard | Amazon SageMaker Clarify | Vertex Explainable AI | SHAP / LIME / AI Fairness 360 |
- Direct injection: malicious instructions in a webpage, PDF, or email the agent retrieves. Classic demo: white-on-white text saying “ignore previous instructions.”
- Indirect injection via search: attacker poisons a publicly accessible page knowing it will appear in the agent's results for a predictable query.
- Email-borne injection: demonstrated against early Copilot and AutoGPT systems — a malicious email instructs the inbox-processing agent to exfiltrate data.
- Passive injection — malicious content sitting in a data source waiting to be retrieved
- Active injection — attacker anticipates retrieval pattern and plants content specifically for that agent
- Persistent injection — content written into a memory store influencing future agent behavior across sessions
An attacker who can write to a shared knowledge base can plant content retrieved and treated as authoritative context. The injected content doesn't need to look like an instruction — framed as a “policy document” it gets incorporated naturally. Demonstrated against enterprise RAG systems with insufficiently permissioned write access.
If the attacker controls or compromises a tool endpoint, they return a response containing injected instructions alongside legitimate data. Demonstrated against LangChain configurations where tool output wasn't sanitized before re-ingestion.
A low-privilege agent is compromised then issues instructions to a higher-privilege orchestrating agent — laundering the attacker's intent through a trusted internal channel. Demonstrated conceptually against AutoGPT and multi-agent LangChain pipelines; increasingly a concern in production Copilot Studio environments.
- Crescendo (Microsoft AI Red Team): multi-turn manipulation where each step appears benign but cumulative effect is significant behavioral deviation. Hard to detect — no single turn looks anomalous.
- Skeleton Key: attacks that extract the agent's system prompt, enabling targeted follow-on injection that navigates precisely around known constraints.
- [1]OWASP Top 10 for LLM Applicationsowasp.org
- [2]MITRE ATLAS: Adversarial Threat Landscape for AI Systemsatlas.mitre.org
- [3]Not What You’ve Signed Up For: Indirect Prompt Injections (arXiv:2302.12173)arxiv.org
- [4]The Crescendo Multi-Turn Attack — Microsoft AI Red Team (arXiv:2404.01833)arxiv.org
- [5]Skeleton Key Jailbreak Technique — Microsoft Security Blogmicrosoft.com
- [6]The Confused Deputy Problem — Hardy (1988)en.wikipedia.org
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Prompt Injection Defense | Prompt shield and input scanning | Azure AI Content Safety (Prompt Shields) | Amazon Bedrock Guardrails | Vertex AI Model Armor | LLM Guard / NeMo Guardrails / Rebuff |
| RAG / Memory Security | Vector store access control | Azure AI Search (RBAC + private endpoints) | Amazon Kendra + OpenSearch Security | Vertex AI Vector Search IAM | Weaviate (RBAC) / Qdrant |
| Tool Registry Integrity | Signed and verified tool catalog | Azure APIM (policy enforcement) | AWS API Gateway + Lambda authorizers | Apigee + Cloud Endpoints | Open Policy Agent |
| Red Teaming | Adversarial testing | Azure PyRIT | Amazon Bedrock model evaluation | Google DeepMind safety eval | Garak / PromptBench |
| Multi-Agent Trust | Signed inter-agent communication | Microsoft Entra Agent ID | AWS IAM role chaining with trust policies | Workload Identity Federation | SPIFFE / SPIRE |
| Output Filtering | Response scanning | Azure AI Content Safety | Amazon Comprehend + Bedrock Guardrails | Vertex AI Content Safety | Presidio / LLM Guard |
Unlike traditional software where a binary can be checksummed, AI model provenance is deeply opaque. A model's behavior is determined by its training data, training process, fine-tuning, RLHF feedback, and quantization — each step a potential point of tampering.
- Training data poisoning: if an attacker influences even a small percentage of the corpus they can embed backdoor behaviors that activate under specific trigger inputs. Demonstrated in academic research against both classification models and LLMs.
- Fine-tuning attacks: if the fine-tuning pipeline is compromised, or the base model already contains a backdoor, the resulting model may behave unpredictably in production.
- Quantization risks: converting a model to INT8 or INT4 is not a neutral operation — it changes behavior in ways difficult to characterize and can expose latent vulnerabilities.
- Model weights downloaded from public repositories have no cryptographic integrity guarantees by default.
- Malicious actors have already published poisoned model variants on public hubs that pass basic capability evals but contain embedded backdoors.
- Unlike closed API models where the provider maintains integrity, with open-weight deployments the enterprise is the last line of defense.
- AI Bill of Materials (AI-BOM): structured record of a model's lineage — base model, fine-tuning datasets, training infrastructure, known limitations. NIST is developing standards for this.
- Cryptographic model signing: hash-based integrity verification for model weights, analogous to code signing for software.
- Behavioral red-teaming at intake: before deploying any externally sourced model, run a structured battery of adversarial probes to detect backdoor triggers.
- Isolated eval environments: never run an unverified model in a production-connected environment.
Beyond the model itself, the agent framework, orchestration library (LangChain, AutoGen, CrewAI), vector database, and embedding model are all part of the supply chain. A compromised dependency at any layer can undermine the entire agentic system and most organizations have no visibility into this dependency graph.
- [1]NIST AI Risk Management Frameworknist.gov
- [2]CISA Software Bill of Materials (SBOM) Resourcescisa.gov
- [3]Backdoor Attacks and Defenses in ML — Goldblum et al. (arXiv:2012.10544)arxiv.org
- [4]The Update Framework (TUF) — Secure Software Updatestheupdateframework.io
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Model Integrity / Signing | Cryptographic model verification | Azure ML (model signing via Key Vault) | Amazon SageMaker + AWS Signer | Google Artifact Registry (signed artifacts) | Sigstore / Cosign / TUF |
| AI Bill of Materials | Model lineage documentation | Microsoft Purview Data Lineage | AWS SageMaker Model Cards | Vertex AI Model Registry | CycloneDX AI BOM / SPDX |
| Dependency Scanning | Framework and library vulnerability scanning | Microsoft Defender for DevOps | Amazon Inspector + CodeGuru | Google Artifact Analysis | Trivy / Grype / Dependabot |
| Isolated Model Evaluation | Sandboxed intake testing | Azure ML compute clusters (isolated) | Amazon SageMaker Studio (isolated domain) | Vertex AI Workbench (VPC-SC) | Podman / gVisor sandboxing |
| Backdoor / Trojan Detection | Model behavioral scanning | Azure Responsible AI dashboard | Amazon SageMaker Clarify | Vertex Explainable AI | TrojanZoo / IBM ART |
- HIPAA — requires audit trails for every PHI access. An agent that incidentally reads protected health information while completing a task creates access events that may not be captured by traditional audit infrastructure.
- SOX — requires controls over financial reporting. If an AI agent participates in financial data processing, the control framework must demonstrate bounded, tested, auditable behavior.
- NERC CIP — critical infrastructure standards assume manual change management. An agent that autonomously modifies OT environment configurations is a risk category NERC CIP was not designed to address.
- GDPR / CCPA — the right to explanation for automated decisions is structurally difficult when the explanation involves a transformer attention mechanism.
- Clinical decision support agents influencing care pathways create liability exposure when acting on stale, hallucinated, or out-of-distribution data.
- PHI exfiltration risk is amplified when agents have broad EHR access — a single prompt injection could cause an agent to exfiltrate patient records.
- FDA is developing guidance for AI/ML-based Software as a Medical Device (SaMD) — agentic systems in clinical settings will likely require pre-market review pathways.
- Agents with trading or transaction authority create market manipulation risk through adversarial manipulation or emergent behavior at scale.
- Model risk management (SR 11-7 guidance) requires validation, ongoing monitoring, and independent review — standards that do not map cleanly onto foundation model deployments.
- AML and KYC processes incorporating AI agents must demonstrate to regulators that decisions are explainable and human oversight is genuine, not performative.
- Tiered data access: agents should never have standing access to the full regulated data environment. Access granted per-task, per-data-class, with explicit justification.
- Deterministic audit wrappers: even if the agent's internal reasoning is non-deterministic, every externally visible action can be logged in a tamper-evident audit record.
- Human-in-the-loop at regulatory boundaries: any agent action crossing a regulatory threshold should require explicit human approval, not just logging.
- [1]GDPR Article 22 — Automated Individual Decision-Makinggdpr.eu
- [2]HIPAA Security Rule — Audit Controls (HHS)hhs.gov
- [3]PCAOB AS 2201 — IT General Controlspcaobus.org
- [4]SR 11-7: Model Risk Management Guidance — Federal Reservefederalreserve.gov
- [5]NERC Critical Infrastructure Protection Standardsnerc.com
- [6]FDA AI/ML Software as a Medical Device Action Planfda.gov
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Healthcare (HIPAA) | HIPAA-compliant AI infrastructure | Azure Health Data Services (HIPAA BAA) | AWS HIPAA-eligible services | Google Cloud Healthcare API | OpenMRS / HAPI FHIR (on-prem) |
| Financial (SR 11-7) | Model risk management validation | Azure ML (model validation workflows) | Amazon SageMaker Model Monitor | Vertex AI Model Evaluation | MLflow + Great Expectations |
| Immutable Audit Trail | Tamper-evident logging for regulators | Azure Immutable Blob Storage + Monitor | AWS CloudTrail + S3 Object Lock | Cloud Audit Logs + GCS WORM | Hyperledger Fabric (audit chain) |
| Data Residency and Sovereignty | Geographic data controls | Azure Sovereign Clouds (Gov, China) | AWS GovCloud + data residency controls | Google Assured Workloads | MinIO (on-prem) / Nextcloud |
| Compliance Reporting | Automated evidence collection | Microsoft Purview Compliance Manager | AWS Audit Manager | Google Security Command Center | OpenSCAP / Prowler / Steampipe |
| Critical Infrastructure (NERC CIP) | OT/ICS AI access controls | Microsoft Defender for IoT | AWS IoT Greengrass + GuardDuty | Google Distributed Cloud Edge | Claroty / Dragos community resources |
- Traditional exfiltration requires a human actor to deliberately seek out and move data. An agent can be instructed to exfiltrate through a prompt injection with no human intent involved.
- Agents accumulate context across tool calls — an agent tasked with summarizing emails may incidentally read sensitive financial or strategic content that was never the intended target.
- The blast radius of a single compromised agent session can be enormous: an enterprise agent with broad CRM, email, and file system access touches more sensitive data in one task than a typical employee accesses in a week.
- Output exfiltration: an injected instruction causes the agent to include sensitive data in its visible output, which may then be sent to an attacker-controlled destination.
- Covert channel via tool calls: an agent makes API calls to an external service using sensitive data as parameters — disguising exfiltration as legitimate tool use.
- Steganographic exfiltration: demonstrated in research — sensitive data encoded into seemingly innocuous agent output that can be decoded by an attacker-controlled receiver.
- Memory store exfiltration: an agent writes sensitive data to a shared memory store the attacker has read access to, without triggering traditional DLP controls.
- Output inspection: apply DLP-style pattern matching to agent outputs before they are returned to users or written to external systems.
- Tool call parameter auditing: log and inspect the parameters of every outbound tool call, not just the fact that a call was made.
- Context scoping: limit what data enters the agent's context window. Retrieval systems should enforce need-to-know at the retrieval layer.
- Egress filtering: agent infrastructure should have explicit egress allowlists — outbound calls only to approved endpoints with anomaly detection for new destinations.
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| DLP for Agent Outputs | Output content inspection | Microsoft Purview DLP + AI Content Safety | Amazon Macie + Bedrock Guardrails | Google DLP API + Vertex Content Safety | Presidio (Microsoft OSS) / LLM Guard |
| Egress Filtering | Outbound network allowlisting | Azure Firewall + Private Endpoints | AWS Network Firewall + VPC Endpoints | Google VPC Service Controls + Cloud Armor | Cilium / OPA network policy |
| Tool Call Auditing | Log all outbound API parameters | Azure APIM + Monitor (request logging) | AWS API Gateway Access Logging + CloudTrail | Apigee Analytics + Cloud Audit Logs | OpenTelemetry / Tyk |
| Context Scoping / Need-to-Know | Retrieval-layer access control | Azure AI Search (security trimming) | Amazon Kendra (ACL-based retrieval) | Vertex AI Search (IAM-based) | Weaviate RBAC / Qdrant filtering |
| Secrets Detection | Scan outputs for leaked credentials | Microsoft Defender for Cloud (secret scanning) | Amazon CodeGuru Reviewer + Macie | Google Cloud Secret Manager + DLP | TruffleHog / Gitleaks / Detect-secrets |
In multi-agent systems, delegation semantics are poorly defined:
- Does a sub-agent inherit the full permissions of the orchestrating agent, a subset, or only task-specific permissions?
- Can a sub-agent further delegate? To whom? With what constraints?
- If a sub-agent's action is harmful, is the orchestrating agent that spawned it accountable?
None of these questions have standardized answers in current agentic frameworks.
Trust should attenuate, not amplify, at every delegation hop.
- An orchestrating agent should only be able to grant a sub-agent a subset of its own permissions — never more.
- Each delegation hop should carry a signed context header proving the chain of authority: who originated the task, what permissions were granted at each step, what constraints apply.
- Sub-agents should verify the legitimacy of instructions they receive — not just trust that because a message arrived in the right channel it must be legitimate.
- Instruction provenance checking: before executing an instruction from another agent, verify that the instruction source has the authority to issue it.
- Capability tokens: rather than inheriting ambient permissions, sub-agents receive explicit capability tokens for specific actions they are authorized to take on the current task.
- Anomaly detection on agent-to-agent traffic: instructions that deviate from established patterns or request permissions outside the task envelope should trigger alerts.
When an incident occurs in a multi-agent system, reconstructing the instruction chain is essential for accountability. This requires every agent-to-agent instruction to be logged with full provenance — source identity, timestamp, task context, and permissions granted. Without this, incident investigation becomes intractable.
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Agent Identity and Attestation | Workload identity per agent | Microsoft Entra Agent ID | AWS IAM Roles + OIDC (per agent) | Workload Identity Federation | SPIFFE / SPIRE |
| Capability Tokens / Scoped Auth | Task-scoped token issuance | Microsoft Entra (scoped OAuth tokens) | AWS STS AssumeRole (scoped) | Google IAM service account impersonation | HashiCorp Vault (AppRole / entity tokens) |
| Agent-to-Agent Communication | Signed inter-agent messages | Azure Service Bus + Entra managed identity | Amazon SQS + IAM signed requests | Google Pub/Sub + Workload Identity | SPIFFE SVID + mTLS / NATS with JWTs |
| Delegation Policy Enforcement | Attenuation at each hop | Azure APIM policies + Entra token claims | AWS IAM permission boundaries | Google IAM conditions + org policy | Open Policy Agent (delegation rules) |
| Multi-Agent Orchestration | Agent workflow with trust controls | Azure AI Foundry Agent Service | Amazon Bedrock Multi-Agent Collaboration | Vertex AI Agent Builder | LangGraph / AutoGen / CrewAI |
Most production LLM systems have an implicit instruction hierarchy: system prompt > user message > tool output. Security posture depends on how strictly this hierarchy is enforced:
- A model that allows user messages to override system prompt constraints is fundamentally insecure for agentic deployments.
- A model that treats tool output as having the same trust level as system prompt instructions is vulnerable to tool-based injection.
- Explicit, well-defined hierarchy — and a model that reliably respects it — is a prerequisite for production-grade agent security.
- Explicit scope definition: state clearly what the agent is and is not permitted to do. Vague permissions are exploitable; specific permissions are defensible.
- Explicit denial of override: include instructions that the system prompt cannot be overridden by user messages or externally retrieved content. This raises the bar for injection attacks.
- Data-instruction separation: use structural markers (XML tags, delimiters) to distinguish between content to process as data versus instructions to follow.
- Minimal capability declaration: only describe capabilities the agent actually needs. Enumerating broad capabilities the agent could theoretically use invites an attacker to invoke them.
System prompts should be treated as security-critical configuration artifacts — versioned, reviewed, and subject to change management. A prompt change that expands agent permissions should require the same scrutiny as a firewall rule change. In practice, most organizations have no formal process for prompt review or approval.
In multi-agent systems, orchestrating agents often generate system prompts for sub-agents dynamically. If an attacker can influence what the orchestrating agent writes into a sub-agent's system prompt, they can shape the sub-agent's security posture. Dynamic prompt generation should be treated with the same rigor as any code that touches security boundaries.
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Prompt Management and Versioning | System prompt version control | Azure AI Foundry (prompt management) | Amazon Bedrock Prompt Management | Vertex AI Prompt Optimizer | Langfuse / PromptFlow / Promptly |
| Prompt Injection Hardening | Input sanitization and content safety | Azure AI Content Safety (Prompt Shields) | Amazon Bedrock Guardrails | Vertex AI Model Armor | LLM Guard / NeMo Guardrails |
| Instruction Hierarchy Enforcement | System > user > tool trust levels | Azure OpenAI system prompt controls | Amazon Bedrock system prompt config | Gemini system instruction API | LangChain (message type hierarchy) |
| Meta-Prompt Security | Secure dynamic prompt generation | Azure APIM (prompt templating + validation) | AWS Step Functions (prompt pipeline) | Vertex AI Pipelines | Guidance (Microsoft OSS) / LMQL |
| Prompt Audit Logging | Log all prompts as config artifacts | Azure Monitor + AI Foundry logging | Amazon Bedrock invocation logging | Vertex AI logging | Langfuse / MLflow prompt tracking |
- Traditional SIEM rules are pattern-based — they look for specific sequences of events. Agent behavior produces variable event sequences for the same logical task, making static rules ineffective.
- The signal-to-noise problem is acute: agents generate high volumes of tool calls, API requests, and data accesses that are indistinguishable from legitimate behavior without semantic context.
- Traditional UEBA assumes a stable behavioral baseline. Agents do not have one — their behavior varies with every task and context window.
- Task-conditioned baselines: maintain baselines per task type, not per agent. An email summarization agent has a different expected behavior profile than a data analysis agent, even if they are the same underlying model.
- Trajectory analysis: analyze the sequence and composition of tool calls across a task. A task that starts with reading emails and ends with making outbound API calls to an unknown endpoint is anomalous regardless of whether each individual action looks benign.
- Privilege usage analytics: track which permissions each agent actually uses versus which it holds. Sudden use of previously unused permissions warrants investigation.
- Every tool call: name, parameters, response, latency, success/failure
- Data access events: what was read, at what sensitivity level, in what task context
- Output events: what was written, to where, of what size and type
- Agent spawning events: what sub-agents were created, with what permissions
- External network calls: destination, protocol, payload size, frequency
- Exfiltration volume anomalies: agent reads significantly more data than it writes output — a signal that data may be accumulating in context for extraction.
- Permission boundary probing: multiple failed permission checks in rapid succession suggests something is exploring the permission landscape.
- Instruction-action mismatch: agent's actions are inconsistent with its declared task — increasingly tractable with LLM-based detection.
- Novel endpoint calls: agent makes calls to endpoints not seen in its task history — high-fidelity signal when combined with egress allowlisting.
- [1]User and Entity Behavior Analytics — NIST Glossarycsrc.nist.gov
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| LLM / Agent Tracing | End-to-end agent call tracing | Azure AI Foundry Tracing + App Insights | Amazon Bedrock CloudWatch + X-Ray | Vertex AI Experiments + Cloud Trace | LangSmith / Langfuse / Arize Phoenix |
| Behavioral Anomaly Detection | Task-conditioned behavior monitoring | Microsoft Sentinel UEBA + AI analytics | Amazon GuardDuty ML + Detective | Google Security Operations (UEBA) | Elastic SIEM / Falco / Prometheus |
| SIEM Integration | Agent events to SIEM pipeline | Microsoft Sentinel (native connectors) | Amazon Security Lake + OpenSearch | Google Chronicle SIEM | OpenSearch Security Analytics / Wazuh |
| Metrics and Dashboards | Agent performance and security dashboards | Azure Monitor Dashboards + Workbooks | Amazon CloudWatch Dashboards | Google Cloud Monitoring + Looker | Grafana + Prometheus |
| Privilege Usage Analytics | Track permissions used vs granted | Microsoft Entra Access Reviews | AWS IAM Access Analyzer | Google IAM Recommender | Cloudsplaining / Cartography |
- Traditional pentests target deterministic systems. Agentic AI requires probabilistic testing — an attack that fails 9 times out of 10 is still viable if the consequences of the 10th success are severe enough.
- The attack surface is dynamic: an agent's effective attack surface changes with every task, context window, and set of tools it is given access to.
- Traditional test scoping does not translate cleanly when the application can spawn sub-agents, call arbitrary APIs, and read from data sources not known at test design time.
An AI-specific threat model should enumerate:
- Trust boundaries: where does the agent receive inputs from untrusted sources? What are all the data channels into the agent's context window?
- Action inventory: what is every action the agent can take? What are the worst-case consequences of each?
- Delegation graph: what agents can this agent spawn or instruct? What agents can instruct this agent?
- Data sensitivity map: what is the most sensitive data the agent can access, and what are the paths by which it could be extracted?
- Injection corpus testing: build a library of injection payloads and test them systematically against every data input channel.
- Permission boundary testing: attempt to get the agent to take actions outside its defined scope through instruction, social engineering, and multi-step manipulation.
- Multi-turn adversarial scenarios: test for Crescendo-style attacks — sequences of individually benign interactions that cumulatively steer toward harmful behavior.
- Sub-agent injection: in multi-agent systems, test whether a compromised sub-agent can influence the behavior of the orchestrating agent.
- Exfiltration path testing: attempt to cause the agent to include sensitive data in outputs, tool call parameters, or written artifacts using all available injection vectors.
- [1]OWASP Top 10 for LLM Applicationsowasp.org
- [2]MITRE ATLAS: Adversarial Threat Landscape for AI Systemsatlas.mitre.org
- [3]The Crescendo Multi-Turn Attack — Microsoft AI Red Team (arXiv:2404.01833)arxiv.org
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| AI Red Teaming | Automated adversarial testing | Azure PyRIT | Amazon Bedrock model evaluation | Google DeepMind safety evaluation | Garak / PromptBench / PyRIT (OSS) |
| Threat Modeling | AI-specific threat modeling | Microsoft Threat Modeling Tool | AWS Threat Composer | Google Cloud Security threat modeling | MITRE ATLAS / OWASP Threat Dragon |
| Injection Corpus Testing | Systematic prompt injection testing | Azure AI Content Safety evaluation | Amazon Bedrock prompt evaluation | Vertex AI evaluation SDK | Garak / PromptInject / HarmBench |
| Multi-Turn Adversarial Testing | Crescendo-style scenario testing | Azure PyRIT (multi-turn orchestration) | Amazon Bedrock automated red teaming | Google AI Safety red team tools | Garak (multi-turn probes) / CyberSecEval |
| Compliance Benchmark Testing | Safety benchmark evaluation | Azure AI Foundry evaluations | Amazon Bedrock model evaluation jobs | Vertex AI model eval (safety metrics) | EleutherAI LM Eval Harness / HELM |
- A traditional insider threat is bounded by the individual's own access rights and the manual effort required to exfiltrate data. An agent multiplies both: the agent may have broader access than the user, and the user can instruct the agent to perform in minutes what would take hours manually.
- The agent acts as a force multiplier and an abstraction layer simultaneously — the insider's actions are mediated by the agent, making attribution harder.
- Plausible deniability: “I didn't do that, the agent did” becomes a defense that may be difficult to refute without detailed instruction logs.
- Instruction laundering: a malicious insider crafts prompts that instruct the agent to perform actions the insider could not do directly — bypassing access controls that do not apply to the agent.
- Scope creep exploitation: agents operating with broad permissions can be instructed to access data far outside the nominal task scope.
- Timing attacks: instructing an agent to perform sensitive actions at times when oversight is reduced — after hours, during high-volume periods when alerts are more likely to be buried.
- Credential harvesting via agent: instructing an agent to access systems and capture credentials or session tokens in its output, which the insider then uses for direct access.
- Instruction auditing: the human's instructions to the agent must be logged with the same rigor as the agent's actions. Without this, you can see what the agent did but not why.
- Behavioral correlation: correlate agent behavior with the instructing user's historical patterns. An agent suddenly accessing data categories the user has never previously touched is a signal worth investigating.
- Dual-approval for high-sensitivity tasks: tasks involving highly sensitive data should require a second human to authorize the agent's instructions, not just the agent's actions.
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Instruction Auditing | Log human to agent instructions | Azure Monitor + AI Foundry logging | Amazon Bedrock invocation logs + CloudTrail | Vertex AI logging + Cloud Audit Logs | Langfuse / OpenTelemetry |
| User Behavior Analytics | Correlate user and agent behavior | Microsoft Sentinel UEBA | Amazon GuardDuty + Detective | Google Security Operations (UEBA) | Elastic SIEM / Wazuh |
| Privileged Access Management | Control what users can instruct agents to do | Microsoft Entra PIM | AWS IAM + Access Analyzer | Google BeyondCorp PAM | CyberArk Conjur (OSS) / HashiCorp Vault |
| Data Loss Prevention | Detect sensitive data in agent instructions | Microsoft Purview DLP | Amazon Macie + Comprehend | Google DLP API | Presidio / OpenDLP |
| Dual Approval Workflows | Multi-person auth for sensitive agent tasks | Microsoft Entra Verified ID + Approvals | AWS IAM MFA conditions + Access Approval | Google Access Approval | OPA (multi-approver policy) / Teleport |
When two organizations' agents interact, they bring different:
- Security policies and guardrails that may be incompatible or in conflict
- Data classification schemes — what one organization considers public another may consider confidential
- Permission models — an agent authorized by Organization A may be performing actions on systems owned by Organization B
- Incident response capabilities — if something goes wrong, which organization's IR team responds, and do they have visibility into the other's agent behavior?
- Existing SLAs and data processing agreements were not written with autonomous agent interaction in mind. An agent that calls a partner's API and causes an incident falls outside most contractual frameworks for fault allocation.
- The question of who is responsible when Agent A (Org A) instructs Agent B (Org B) to take an action that causes harm is genuinely unresolved in current legal frameworks.
- Data residency and sovereignty requirements may be violated when agents route data through cross-organizational interactions not anticipated in the original compliance architecture.
- Agent API contracts: formal specifications of what actions a partner's agent is permitted to request, what data it will receive, and what audit information will be shared in case of incidents.
- Mutual attestation: before an agent-to-agent interaction, both agents exchange signed attestations of their current permission scope, security posture, and data handling commitments.
- Federated audit logs: cross-organizational agent interactions should produce audit records that both organizations can access, enabling joint incident investigation.
- Sandboxed interaction zones: cross-org agent interactions should occur in isolated environments that prevent the partner agent from accessing internal systems not explicitly shared.
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Cross-Org Agent Authentication | Inter-organization workload identity | Microsoft Entra B2B + Workload Identity | AWS IAM cross-account + OIDC | Workload Identity Federation (cross-org) | SPIFFE Federation / SPIRE |
| API Security and Contracts | Agent API access control | Azure APIM (cross-tenant policies) | AWS API Gateway + resource-based policies | Apigee (cross-org API management) | Kong Gateway / Tyk |
| Mutual Attestation | Runtime security posture exchange | Microsoft Entra Verified ID | AWS Artifact + OIDC attestation | Google Assured Workloads attestation | SPIFFE SVID + mTLS |
| Federated Audit Logs | Shared cross-org audit trail | Azure Monitor (cross-tenant queries) | AWS CloudTrail + S3 cross-account | Cloud Audit Logs + Log Sink (cross-project) | OpenTelemetry (federated collector) |
| Data Classification and Sharing | Controlled data exchange | Microsoft Purview (cross-tenant sharing) | AWS Lake Formation cross-account | Google Analytics Hub | Apache Atlas / OpenMetadata |
- An agent is deployed for a specific project. The project ends but the agent — along with its credentials, permissions, and integrations — is never formally decommissioned.
- Team turnover: the person who deployed the agent leaves, and no one else has visibility into its existence or operation.
- Shadow AI deployment: an individual team deploys an agent without formal IT or security involvement. When that individual leaves, the agent becomes invisible to governance processes.
- Automated agent spawning: in complex multi-agent systems, sub-agents may be created dynamically and persist beyond the lifecycle of the task that created them.
- Stale credentials: ghost agents often hold long-lived credentials that were never rotated. If those credentials are compromised, the attacker gains access with no one actively monitoring the agent's behavior.
- Outdated models: a ghost agent running an older model version may lack safety mitigations introduced in subsequent versions, making it more vulnerable to known attack patterns.
- Unpatched dependencies: the orchestration libraries, tool integrations, and supporting infrastructure of a ghost agent accumulate security debt with no one maintaining them.
- Invisible blast radius: if a ghost agent is compromised, the organization has no incident response playbook for an agent it did not know existed.
- Agent registry as a control: every deployed agent must be registered with owner, creation date, purpose, permissions, model version, and last-reviewed date. Unregistered agents are treated as unauthorized.
- Mandatory expiration: agent deployments should have explicit expiration dates requiring active renewal. Default-to-expired is safer than default-to-permanent.
- Credential TTL enforcement: agent credentials should have short TTLs enforced at the infrastructure level. An agent that cannot refresh its credentials automatically decommissions itself.
- Periodic access reviews: include agent access in the same periodic access review processes applied to human users.
| Area / Topic | Name | Azure | AWS | Open Source | |
|---|---|---|---|---|---|
| Agent Inventory / Registry | Catalog all deployed agents | Azure AI Foundry (model/agent catalog) | Amazon SageMaker Model Registry | Vertex AI Model Registry | MLflow Registry / Backstage |
| Credential TTL Enforcement | Short-lived auto-expiring credentials | Microsoft Entra (token lifetime policies) | AWS STS session duration limits | Google IAM (short-lived service account keys) | HashiCorp Vault (TTL leases) |
| Access Reviews | Periodic agent permission reviews | Microsoft Entra Access Reviews | AWS IAM Access Analyzer + Access Advisor | Google IAM Recommender | Cartography / Cloudsplaining |
| Shadow AI Discovery | Detect unauthorized agent deployments | Microsoft Defender for Cloud Apps | Amazon Macie + Config | Google Security Command Center | Prowler / Steampipe |
| Lifecycle Automation | Auto-decommission on expiry | Azure Policy (auto-remediation) | AWS Config Rules + Lambda remediation | Google Cloud Asset Inventory + Policy | Ansible / Terraform lifecycle rules |