CIO/CISO Los Angeles Think Tank, May 19, 2026
Securing the Rise of Agentic AI
0 / 10
Deep Dive · Q3
Accountability When an AI Agent Does Something Wrong
Accountability has to be assigned, not discovered. Traditional chains work because there is a person at the end who made a choice. With agentic AI that chain gets murky, so the framework has to change.
The three-layer accountability model
  • Developer accountability — responsible for the model's base behavior, safety training, and known failure modes. If the model is fundamentally unsafe this layer owns it.
  • Operator accountability — the organization that deployed the agent, configured permissions, set task scope, and chose when to go live. Most enterprise incidents land here.
  • User accountability — the person who initiated the task. Diminished if the system's behavior was opaque or the user had no reason to anticipate the failure.
The non-determinism problem
  • Causal accountability — harder with non-deterministic models but not impossible. You can audit inputs, context, tool calls, and outputs even without replaying internal reasoning.
  • Structural accountability — entirely tractable. Did the operator test edge cases? Were guardrails in place? These are auditable facts.
Analogy worth using on panel
We don't require a pilot to explain every micro-adjustment before determining negligence. We look at the flight data recorder, procedures followed, and whether the system was airworthy. Same logic applies to AI agents.
The meaningful human control test

Was there meaningful human control at the point things went wrong? Not “was a human in the loop?” but did the human have: sufficient information to understand what the agent was about to do, the ability to intervene before irreversible action, and a reasonable expectation the action was in scope? If any of those three are missing, accountability shifts to the operator and developer.

Anticipate this counterpoint

Someone will argue holding operators accountable chills AI adoption. The rebuttal: no clear accountability creates more risk aversion because organizations cannot price the liability. Clear frameworks enable deployment by making risk calculable — the same argument that made product liability law net-positive for innovation.

Regulatory horizon
  • EU AI Act — assigns obligations to deployers of high-risk systems: documentation, human oversight, incident reporting.
  • NIST AI RMF — the Govern function addresses accountability structures explicitly.
  • FTC — signaled it will hold deploying organizations accountable for AI-caused consumer harm under existing unfair/deceptive practices authority.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Audit LoggingImmutable agent action logsAzure Monitor Log AnalyticsAWS CloudTrail (integrity validation)Cloud Audit Logs (Data Access)OpenTelemetry + Loki
LLM ObservabilityTrace reasoning and tool callsAzure AI Foundry TracingAmazon Bedrock invocation loggingVertex AI Experiments + loggingLangfuse / LangSmith / Arize Phoenix
Accountability FrameworksDeveloper / Operator / User tiersMicrosoft Purview AI HubAWS AI Service CardsGoogle Model CardsMLflow + responsibility docs
Decision RecordsStructured AI decision documentationAzure ML Model CardsAmazon SageMaker Model CardsVertex AI Model Registry metadataMLflow / DVC
ExplainabilityAI decision explanationAzure Responsible AI dashboardAmazon SageMaker ClarifyVertex Explainable AISHAP / LIME / AI Fairness 360
Deep Dive · Q5
Sidechain Attacks on Agentic AI Systems
Every tool you give an agent is a potential attack vector. The documented attack classes represent the current research frontier and the gap between known attacks and deployed defenses is wide.
Prompt injection via retrieved content
  • Direct injection: malicious instructions in a webpage, PDF, or email the agent retrieves. Classic demo: white-on-white text saying “ignore previous instructions.”
  • Indirect injection via search: attacker poisons a publicly accessible page knowing it will appear in the agent's results for a predictable query.
  • Email-borne injection: demonstrated against early Copilot and AutoGPT systems — a malicious email instructs the inbox-processing agent to exfiltrate data.
Greshake et al. 2023 — the foundational paper
  • Passive injection — malicious content sitting in a data source waiting to be retrieved
  • Active injection — attacker anticipates retrieval pattern and plants content specifically for that agent
  • Persistent injection — content written into a memory store influencing future agent behavior across sessions
Memory poisoning / RAG store attacks

An attacker who can write to a shared knowledge base can plant content retrieved and treated as authoritative context. The injected content doesn't need to look like an instruction — framed as a “policy document” it gets incorporated naturally. Demonstrated against enterprise RAG systems with insufficiently permissioned write access.

Tool call hijacking

If the attacker controls or compromises a tool endpoint, they return a response containing injected instructions alongside legitimate data. Demonstrated against LangChain configurations where tool output wasn't sanitized before re-ingestion.

Multi-agent privilege escalation (confused deputy)

A low-privilege agent is compromised then issues instructions to a higher-privilege orchestrating agent — laundering the attacker's intent through a trusted internal channel. Demonstrated conceptually against AutoGPT and multi-agent LangChain pipelines; increasingly a concern in production Copilot Studio environments.

Crescendo and Skeleton Key
  • Crescendo (Microsoft AI Red Team): multi-turn manipulation where each step appears benign but cumulative effect is significant behavioral deviation. Hard to detect — no single turn looks anomalous.
  • Skeleton Key: attacks that extract the agent's system prompt, enabling targeted follow-on injection that navigates precisely around known constraints.
The through-line
Most of these attacks share one root cause — agents fail to maintain structural separation between data they read and instructions they follow. Until that is architecturally solved, injection-class attacks remain the dominant threat.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Prompt Injection DefensePrompt shield and input scanningAzure AI Content Safety (Prompt Shields)Amazon Bedrock GuardrailsVertex AI Model ArmorLLM Guard / NeMo Guardrails / Rebuff
RAG / Memory SecurityVector store access controlAzure AI Search (RBAC + private endpoints)Amazon Kendra + OpenSearch SecurityVertex AI Vector Search IAMWeaviate (RBAC) / Qdrant
Tool Registry IntegritySigned and verified tool catalogAzure APIM (policy enforcement)AWS API Gateway + Lambda authorizersApigee + Cloud EndpointsOpen Policy Agent
Red TeamingAdversarial testingAzure PyRITAmazon Bedrock model evaluationGoogle DeepMind safety evalGarak / PromptBench
Multi-Agent TrustSigned inter-agent communicationMicrosoft Entra Agent IDAWS IAM role chaining with trust policiesWorkload Identity FederationSPIFFE / SPIRE
Output FilteringResponse scanningAzure AI Content SafetyAmazon Comprehend + Bedrock GuardrailsVertex AI Content SafetyPresidio / LLM Guard
Deep Dive · Emerging Topic
Supply Chain & Model Provenance
Where did the model come from, was it tampered with during fine-tuning or quantization, and how do you verify integrity before deployment? This is the AI equivalent of SolarWinds-style supply chain risk and it is largely unaddressed in current enterprise security frameworks.
The provenance problem

Unlike traditional software where a binary can be checksummed, AI model provenance is deeply opaque. A model's behavior is determined by its training data, training process, fine-tuning, RLHF feedback, and quantization — each step a potential point of tampering.

  • Training data poisoning: if an attacker influences even a small percentage of the corpus they can embed backdoor behaviors that activate under specific trigger inputs. Demonstrated in academic research against both classification models and LLMs.
  • Fine-tuning attacks: if the fine-tuning pipeline is compromised, or the base model already contains a backdoor, the resulting model may behave unpredictably in production.
  • Quantization risks: converting a model to INT8 or INT4 is not a neutral operation — it changes behavior in ways difficult to characterize and can expose latent vulnerabilities.
The open-weight model risk
  • Model weights downloaded from public repositories have no cryptographic integrity guarantees by default.
  • Malicious actors have already published poisoned model variants on public hubs that pass basic capability evals but contain embedded backdoors.
  • Unlike closed API models where the provider maintains integrity, with open-weight deployments the enterprise is the last line of defense.
Emerging defenses
  • AI Bill of Materials (AI-BOM): structured record of a model's lineage — base model, fine-tuning datasets, training infrastructure, known limitations. NIST is developing standards for this.
  • Cryptographic model signing: hash-based integrity verification for model weights, analogous to code signing for software.
  • Behavioral red-teaming at intake: before deploying any externally sourced model, run a structured battery of adversarial probes to detect backdoor triggers.
  • Isolated eval environments: never run an unverified model in a production-connected environment.
The dependency chain extends further

Beyond the model itself, the agent framework, orchestration library (LangChain, AutoGen, CrewAI), vector database, and embedding model are all part of the supply chain. A compromised dependency at any layer can undermine the entire agentic system and most organizations have no visibility into this dependency graph.

Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Model Integrity / SigningCryptographic model verificationAzure ML (model signing via Key Vault)Amazon SageMaker + AWS SignerGoogle Artifact Registry (signed artifacts)Sigstore / Cosign / TUF
AI Bill of MaterialsModel lineage documentationMicrosoft Purview Data LineageAWS SageMaker Model CardsVertex AI Model RegistryCycloneDX AI BOM / SPDX
Dependency ScanningFramework and library vulnerability scanningMicrosoft Defender for DevOpsAmazon Inspector + CodeGuruGoogle Artifact AnalysisTrivy / Grype / Dependabot
Isolated Model EvaluationSandboxed intake testingAzure ML compute clusters (isolated)Amazon SageMaker Studio (isolated domain)Vertex AI Workbench (VPC-SC)Podman / gVisor sandboxing
Backdoor / Trojan DetectionModel behavioral scanningAzure Responsible AI dashboardAmazon SageMaker ClarifyVertex Explainable AITrojanZoo / IBM ART
Deep Dive · Emerging Topic
Agentic AI in Regulated Industries
Healthcare, finance, and critical infrastructure have compliance obligations that collide awkwardly with non-deterministic agents. The gap between what regulators expect and what agentic systems can currently guarantee requires new architectural patterns, not just policy.
The compliance collision
  • HIPAA — requires audit trails for every PHI access. An agent that incidentally reads protected health information while completing a task creates access events that may not be captured by traditional audit infrastructure.
  • SOX — requires controls over financial reporting. If an AI agent participates in financial data processing, the control framework must demonstrate bounded, tested, auditable behavior.
  • NERC CIP — critical infrastructure standards assume manual change management. An agent that autonomously modifies OT environment configurations is a risk category NERC CIP was not designed to address.
  • GDPR / CCPA — the right to explanation for automated decisions is structurally difficult when the explanation involves a transformer attention mechanism.
Healthcare-specific risks
  • Clinical decision support agents influencing care pathways create liability exposure when acting on stale, hallucinated, or out-of-distribution data.
  • PHI exfiltration risk is amplified when agents have broad EHR access — a single prompt injection could cause an agent to exfiltrate patient records.
  • FDA is developing guidance for AI/ML-based Software as a Medical Device (SaMD) — agentic systems in clinical settings will likely require pre-market review pathways.
Financial services-specific risks
  • Agents with trading or transaction authority create market manipulation risk through adversarial manipulation or emergent behavior at scale.
  • Model risk management (SR 11-7 guidance) requires validation, ongoing monitoring, and independent review — standards that do not map cleanly onto foundation model deployments.
  • AML and KYC processes incorporating AI agents must demonstrate to regulators that decisions are explainable and human oversight is genuine, not performative.
Architectural responses
  • Tiered data access: agents should never have standing access to the full regulated data environment. Access granted per-task, per-data-class, with explicit justification.
  • Deterministic audit wrappers: even if the agent's internal reasoning is non-deterministic, every externally visible action can be logged in a tamper-evident audit record.
  • Human-in-the-loop at regulatory boundaries: any agent action crossing a regulatory threshold should require explicit human approval, not just logging.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Healthcare (HIPAA)HIPAA-compliant AI infrastructureAzure Health Data Services (HIPAA BAA)AWS HIPAA-eligible servicesGoogle Cloud Healthcare APIOpenMRS / HAPI FHIR (on-prem)
Financial (SR 11-7)Model risk management validationAzure ML (model validation workflows)Amazon SageMaker Model MonitorVertex AI Model EvaluationMLflow + Great Expectations
Immutable Audit TrailTamper-evident logging for regulatorsAzure Immutable Blob Storage + MonitorAWS CloudTrail + S3 Object LockCloud Audit Logs + GCS WORMHyperledger Fabric (audit chain)
Data Residency and SovereigntyGeographic data controlsAzure Sovereign Clouds (Gov, China)AWS GovCloud + data residency controlsGoogle Assured WorkloadsMinIO (on-prem) / Nextcloud
Compliance ReportingAutomated evidence collectionMicrosoft Purview Compliance ManagerAWS Audit ManagerGoogle Security Command CenterOpenSCAP / Prowler / Steampipe
Critical Infrastructure (NERC CIP)OT/ICS AI access controlsMicrosoft Defender for IoTAWS IoT Greengrass + GuardDutyGoogle Distributed Cloud EdgeClaroty / Dragos community resources
Deep Dive · Emerging Topic
Data Exfiltration as a First-Class Threat
Agents with broad data access create a fundamentally different exfiltration risk profile than traditional insider threat models. The data gravity problem: agents naturally accumulate context that may include sensitive data far beyond their task scope.
Why agents change the exfiltration calculus
  • Traditional exfiltration requires a human actor to deliberately seek out and move data. An agent can be instructed to exfiltrate through a prompt injection with no human intent involved.
  • Agents accumulate context across tool calls — an agent tasked with summarizing emails may incidentally read sensitive financial or strategic content that was never the intended target.
  • The blast radius of a single compromised agent session can be enormous: an enterprise agent with broad CRM, email, and file system access touches more sensitive data in one task than a typical employee accesses in a week.
Exfiltration channels unique to agents
  • Output exfiltration: an injected instruction causes the agent to include sensitive data in its visible output, which may then be sent to an attacker-controlled destination.
  • Covert channel via tool calls: an agent makes API calls to an external service using sensitive data as parameters — disguising exfiltration as legitimate tool use.
  • Steganographic exfiltration: demonstrated in research — sensitive data encoded into seemingly innocuous agent output that can be decoded by an attacker-controlled receiver.
  • Memory store exfiltration: an agent writes sensitive data to a shared memory store the attacker has read access to, without triggering traditional DLP controls.
Defenses
  • Output inspection: apply DLP-style pattern matching to agent outputs before they are returned to users or written to external systems.
  • Tool call parameter auditing: log and inspect the parameters of every outbound tool call, not just the fact that a call was made.
  • Context scoping: limit what data enters the agent's context window. Retrieval systems should enforce need-to-know at the retrieval layer.
  • Egress filtering: agent infrastructure should have explicit egress allowlists — outbound calls only to approved endpoints with anomaly detection for new destinations.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
DLP for Agent OutputsOutput content inspectionMicrosoft Purview DLP + AI Content SafetyAmazon Macie + Bedrock GuardrailsGoogle DLP API + Vertex Content SafetyPresidio (Microsoft OSS) / LLM Guard
Egress FilteringOutbound network allowlistingAzure Firewall + Private EndpointsAWS Network Firewall + VPC EndpointsGoogle VPC Service Controls + Cloud ArmorCilium / OPA network policy
Tool Call AuditingLog all outbound API parametersAzure APIM + Monitor (request logging)AWS API Gateway Access Logging + CloudTrailApigee Analytics + Cloud Audit LogsOpenTelemetry / Tyk
Context Scoping / Need-to-KnowRetrieval-layer access controlAzure AI Search (security trimming)Amazon Kendra (ACL-based retrieval)Vertex AI Search (IAM-based)Weaviate RBAC / Qdrant filtering
Secrets DetectionScan outputs for leaked credentialsMicrosoft Defender for Cloud (secret scanning)Amazon CodeGuru Reviewer + MacieGoogle Cloud Secret Manager + DLPTruffleHog / Gitleaks / Detect-secrets
Deep Dive · Emerging Topic
Trust Hierarchies in Multi-Agent Systems
When agents spawn other agents, or when a human delegates to an agent that delegates further, trust propagation becomes a first-class security problem requiring runtime verification, not just design-time policy.
The delegation semantics problem

In multi-agent systems, delegation semantics are poorly defined:

  • Does a sub-agent inherit the full permissions of the orchestrating agent, a subset, or only task-specific permissions?
  • Can a sub-agent further delegate? To whom? With what constraints?
  • If a sub-agent's action is harmful, is the orchestrating agent that spawned it accountable?

None of these questions have standardized answers in current agentic frameworks.

The attenuation principle

Trust should attenuate, not amplify, at every delegation hop.

  • An orchestrating agent should only be able to grant a sub-agent a subset of its own permissions — never more.
  • Each delegation hop should carry a signed context header proving the chain of authority: who originated the task, what permissions were granted at each step, what constraints apply.
  • Sub-agents should verify the legitimacy of instructions they receive — not just trust that because a message arrived in the right channel it must be legitimate.
Runtime trust verification
  • Instruction provenance checking: before executing an instruction from another agent, verify that the instruction source has the authority to issue it.
  • Capability tokens: rather than inheriting ambient permissions, sub-agents receive explicit capability tokens for specific actions they are authorized to take on the current task.
  • Anomaly detection on agent-to-agent traffic: instructions that deviate from established patterns or request permissions outside the task envelope should trigger alerts.
The who-told-you-to-do-this problem

When an incident occurs in a multi-agent system, reconstructing the instruction chain is essential for accountability. This requires every agent-to-agent instruction to be logged with full provenance — source identity, timestamp, task context, and permissions granted. Without this, incident investigation becomes intractable.

Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Agent Identity and AttestationWorkload identity per agentMicrosoft Entra Agent IDAWS IAM Roles + OIDC (per agent)Workload Identity FederationSPIFFE / SPIRE
Capability Tokens / Scoped AuthTask-scoped token issuanceMicrosoft Entra (scoped OAuth tokens)AWS STS AssumeRole (scoped)Google IAM service account impersonationHashiCorp Vault (AppRole / entity tokens)
Agent-to-Agent CommunicationSigned inter-agent messagesAzure Service Bus + Entra managed identityAmazon SQS + IAM signed requestsGoogle Pub/Sub + Workload IdentitySPIFFE SVID + mTLS / NATS with JWTs
Delegation Policy EnforcementAttenuation at each hopAzure APIM policies + Entra token claimsAWS IAM permission boundariesGoogle IAM conditions + org policyOpen Policy Agent (delegation rules)
Multi-Agent OrchestrationAgent workflow with trust controlsAzure AI Foundry Agent ServiceAmazon Bedrock Multi-Agent CollaborationVertex AI Agent BuilderLangGraph / AutoGen / CrewAI
Deep Dive · Emerging Topic
Secure Prompt & System Design
Prompt engineering is a security discipline. How you structure a system prompt materially affects an agent's attack surface — and vague or overly permissive system prompts are a security liability practitioners can address immediately.
The instruction hierarchy

Most production LLM systems have an implicit instruction hierarchy: system prompt > user message > tool output. Security posture depends on how strictly this hierarchy is enforced:

  • A model that allows user messages to override system prompt constraints is fundamentally insecure for agentic deployments.
  • A model that treats tool output as having the same trust level as system prompt instructions is vulnerable to tool-based injection.
  • Explicit, well-defined hierarchy — and a model that reliably respects it — is a prerequisite for production-grade agent security.
System prompt hardening principles
  • Explicit scope definition: state clearly what the agent is and is not permitted to do. Vague permissions are exploitable; specific permissions are defensible.
  • Explicit denial of override: include instructions that the system prompt cannot be overridden by user messages or externally retrieved content. This raises the bar for injection attacks.
  • Data-instruction separation: use structural markers (XML tags, delimiters) to distinguish between content to process as data versus instructions to follow.
  • Minimal capability declaration: only describe capabilities the agent actually needs. Enumerating broad capabilities the agent could theoretically use invites an attacker to invoke them.
Prompt as an audit artifact

System prompts should be treated as security-critical configuration artifacts — versioned, reviewed, and subject to change management. A prompt change that expands agent permissions should require the same scrutiny as a firewall rule change. In practice, most organizations have no formal process for prompt review or approval.

The meta-prompt attack surface

In multi-agent systems, orchestrating agents often generate system prompts for sub-agents dynamically. If an attacker can influence what the orchestrating agent writes into a sub-agent's system prompt, they can shape the sub-agent's security posture. Dynamic prompt generation should be treated with the same rigor as any code that touches security boundaries.

Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Prompt Management and VersioningSystem prompt version controlAzure AI Foundry (prompt management)Amazon Bedrock Prompt ManagementVertex AI Prompt OptimizerLangfuse / PromptFlow / Promptly
Prompt Injection HardeningInput sanitization and content safetyAzure AI Content Safety (Prompt Shields)Amazon Bedrock GuardrailsVertex AI Model ArmorLLM Guard / NeMo Guardrails
Instruction Hierarchy EnforcementSystem > user > tool trust levelsAzure OpenAI system prompt controlsAmazon Bedrock system prompt configGemini system instruction APILangChain (message type hierarchy)
Meta-Prompt SecuritySecure dynamic prompt generationAzure APIM (prompt templating + validation)AWS Step Functions (prompt pipeline)Vertex AI PipelinesGuidance (Microsoft OSS) / LMQL
Prompt Audit LoggingLog all prompts as config artifactsAzure Monitor + AI Foundry loggingAmazon Bedrock invocation loggingVertex AI loggingLangfuse / MLflow prompt tracking
Deep Dive · Emerging Topic
Observability & Detection Engineering for AI
The SOC does not know how to write detection rules for agent behavior yet. What does anomaly detection look like when the normal behavior of an agent is inherently variable? This bridges AI security into the existing SecOps world in a concrete way.
Why traditional detection breaks
  • Traditional SIEM rules are pattern-based — they look for specific sequences of events. Agent behavior produces variable event sequences for the same logical task, making static rules ineffective.
  • The signal-to-noise problem is acute: agents generate high volumes of tool calls, API requests, and data accesses that are indistinguishable from legitimate behavior without semantic context.
  • Traditional UEBA assumes a stable behavioral baseline. Agents do not have one — their behavior varies with every task and context window.
Building a behavioral baseline for agents
  • Task-conditioned baselines: maintain baselines per task type, not per agent. An email summarization agent has a different expected behavior profile than a data analysis agent, even if they are the same underlying model.
  • Trajectory analysis: analyze the sequence and composition of tool calls across a task. A task that starts with reading emails and ends with making outbound API calls to an unknown endpoint is anomalous regardless of whether each individual action looks benign.
  • Privilege usage analytics: track which permissions each agent actually uses versus which it holds. Sudden use of previously unused permissions warrants investigation.
What to instrument
  • Every tool call: name, parameters, response, latency, success/failure
  • Data access events: what was read, at what sensitivity level, in what task context
  • Output events: what was written, to where, of what size and type
  • Agent spawning events: what sub-agents were created, with what permissions
  • External network calls: destination, protocol, payload size, frequency
Detection patterns that work now
  • Exfiltration volume anomalies: agent reads significantly more data than it writes output — a signal that data may be accumulating in context for extraction.
  • Permission boundary probing: multiple failed permission checks in rapid succession suggests something is exploring the permission landscape.
  • Instruction-action mismatch: agent's actions are inconsistent with its declared task — increasingly tractable with LLM-based detection.
  • Novel endpoint calls: agent makes calls to endpoints not seen in its task history — high-fidelity signal when combined with egress allowlisting.
References & Further Reading
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
LLM / Agent TracingEnd-to-end agent call tracingAzure AI Foundry Tracing + App InsightsAmazon Bedrock CloudWatch + X-RayVertex AI Experiments + Cloud TraceLangSmith / Langfuse / Arize Phoenix
Behavioral Anomaly DetectionTask-conditioned behavior monitoringMicrosoft Sentinel UEBA + AI analyticsAmazon GuardDuty ML + DetectiveGoogle Security Operations (UEBA)Elastic SIEM / Falco / Prometheus
SIEM IntegrationAgent events to SIEM pipelineMicrosoft Sentinel (native connectors)Amazon Security Lake + OpenSearchGoogle Chronicle SIEMOpenSearch Security Analytics / Wazuh
Metrics and DashboardsAgent performance and security dashboardsAzure Monitor Dashboards + WorkbooksAmazon CloudWatch DashboardsGoogle Cloud Monitoring + LookerGrafana + Prometheus
Privilege Usage AnalyticsTrack permissions used vs grantedMicrosoft Entra Access ReviewsAWS IAM Access AnalyzerGoogle IAM RecommenderCloudsplaining / Cartography
Deep Dive · Emerging Topic
Security Testing Methodology for Agentic AI
Red-teaming and pentesting as applied to agentic systems is a nascent field. MITRE ATLAS and OWASP LLM Top 10 give frameworks but not methodology. The gap between having a framework and knowing how to test against it is significant.
Why traditional pentest methodology falls short
  • Traditional pentests target deterministic systems. Agentic AI requires probabilistic testing — an attack that fails 9 times out of 10 is still viable if the consequences of the 10th success are severe enough.
  • The attack surface is dynamic: an agent's effective attack surface changes with every task, context window, and set of tools it is given access to.
  • Traditional test scoping does not translate cleanly when the application can spawn sub-agents, call arbitrary APIs, and read from data sources not known at test design time.
Building an AI threat model

An AI-specific threat model should enumerate:

  • Trust boundaries: where does the agent receive inputs from untrusted sources? What are all the data channels into the agent's context window?
  • Action inventory: what is every action the agent can take? What are the worst-case consequences of each?
  • Delegation graph: what agents can this agent spawn or instruct? What agents can instruct this agent?
  • Data sensitivity map: what is the most sensitive data the agent can access, and what are the paths by which it could be extracted?
Testing approaches that work
  • Injection corpus testing: build a library of injection payloads and test them systematically against every data input channel.
  • Permission boundary testing: attempt to get the agent to take actions outside its defined scope through instruction, social engineering, and multi-step manipulation.
  • Multi-turn adversarial scenarios: test for Crescendo-style attacks — sequences of individually benign interactions that cumulatively steer toward harmful behavior.
  • Sub-agent injection: in multi-agent systems, test whether a compromised sub-agent can influence the behavior of the orchestrating agent.
  • Exfiltration path testing: attempt to cause the agent to include sensitive data in outputs, tool call parameters, or written artifacts using all available injection vectors.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
AI Red TeamingAutomated adversarial testingAzure PyRITAmazon Bedrock model evaluationGoogle DeepMind safety evaluationGarak / PromptBench / PyRIT (OSS)
Threat ModelingAI-specific threat modelingMicrosoft Threat Modeling ToolAWS Threat ComposerGoogle Cloud Security threat modelingMITRE ATLAS / OWASP Threat Dragon
Injection Corpus TestingSystematic prompt injection testingAzure AI Content Safety evaluationAmazon Bedrock prompt evaluationVertex AI evaluation SDKGarak / PromptInject / HarmBench
Multi-Turn Adversarial TestingCrescendo-style scenario testingAzure PyRIT (multi-turn orchestration)Amazon Bedrock automated red teamingGoogle AI Safety red team toolsGarak (multi-turn probes) / CyberSecEval
Compliance Benchmark TestingSafety benchmark evaluationAzure AI Foundry evaluationsAmazon Bedrock model evaluation jobsVertex AI model eval (safety metrics)EleutherAI LM Eval Harness / HELM
Deep Dive · Emerging Topic
Insider Threat in the Age of AI Agents
Agents acting on behalf of users dramatically amplify insider threat scenarios. A malicious insider no longer needs direct system access — they can craft inputs to an agent that cause it to act on their behalf while obscuring their intent.
The amplification problem
  • A traditional insider threat is bounded by the individual's own access rights and the manual effort required to exfiltrate data. An agent multiplies both: the agent may have broader access than the user, and the user can instruct the agent to perform in minutes what would take hours manually.
  • The agent acts as a force multiplier and an abstraction layer simultaneously — the insider's actions are mediated by the agent, making attribution harder.
  • Plausible deniability: “I didn't do that, the agent did” becomes a defense that may be difficult to refute without detailed instruction logs.
New insider threat patterns
  • Instruction laundering: a malicious insider crafts prompts that instruct the agent to perform actions the insider could not do directly — bypassing access controls that do not apply to the agent.
  • Scope creep exploitation: agents operating with broad permissions can be instructed to access data far outside the nominal task scope.
  • Timing attacks: instructing an agent to perform sensitive actions at times when oversight is reduced — after hours, during high-volume periods when alerts are more likely to be buried.
  • Credential harvesting via agent: instructing an agent to access systems and capture credentials or session tokens in its output, which the insider then uses for direct access.
Detection and mitigation
  • Instruction auditing: the human's instructions to the agent must be logged with the same rigor as the agent's actions. Without this, you can see what the agent did but not why.
  • Behavioral correlation: correlate agent behavior with the instructing user's historical patterns. An agent suddenly accessing data categories the user has never previously touched is a signal worth investigating.
  • Dual-approval for high-sensitivity tasks: tasks involving highly sensitive data should require a second human to authorize the agent's instructions, not just the agent's actions.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Instruction AuditingLog human to agent instructionsAzure Monitor + AI Foundry loggingAmazon Bedrock invocation logs + CloudTrailVertex AI logging + Cloud Audit LogsLangfuse / OpenTelemetry
User Behavior AnalyticsCorrelate user and agent behaviorMicrosoft Sentinel UEBAAmazon GuardDuty + DetectiveGoogle Security Operations (UEBA)Elastic SIEM / Wazuh
Privileged Access ManagementControl what users can instruct agents to doMicrosoft Entra PIMAWS IAM + Access AnalyzerGoogle BeyondCorp PAMCyberArk Conjur (OSS) / HashiCorp Vault
Data Loss PreventionDetect sensitive data in agent instructionsMicrosoft Purview DLPAmazon Macie + ComprehendGoogle DLP APIPresidio / OpenDLP
Dual Approval WorkflowsMulti-person auth for sensitive agent tasksMicrosoft Entra Verified ID + ApprovalsAWS IAM MFA conditions + Access ApprovalGoogle Access ApprovalOPA (multi-approver policy) / Teleport
Deep Dive · Emerging Topic
Cross-Organizational Agent Interaction
Enterprises are beginning to expose agent-to-agent APIs — one company's agent calling another's. This creates inter-organizational trust problems that no existing security framework handles well.
The inter-org trust gap

When two organizations' agents interact, they bring different:

  • Security policies and guardrails that may be incompatible or in conflict
  • Data classification schemes — what one organization considers public another may consider confidential
  • Permission models — an agent authorized by Organization A may be performing actions on systems owned by Organization B
  • Incident response capabilities — if something goes wrong, which organization's IR team responds, and do they have visibility into the other's agent behavior?
Liability and contractual gaps
  • Existing SLAs and data processing agreements were not written with autonomous agent interaction in mind. An agent that calls a partner's API and causes an incident falls outside most contractual frameworks for fault allocation.
  • The question of who is responsible when Agent A (Org A) instructs Agent B (Org B) to take an action that causes harm is genuinely unresolved in current legal frameworks.
  • Data residency and sovereignty requirements may be violated when agents route data through cross-organizational interactions not anticipated in the original compliance architecture.
Emerging patterns
  • Agent API contracts: formal specifications of what actions a partner's agent is permitted to request, what data it will receive, and what audit information will be shared in case of incidents.
  • Mutual attestation: before an agent-to-agent interaction, both agents exchange signed attestations of their current permission scope, security posture, and data handling commitments.
  • Federated audit logs: cross-organizational agent interactions should produce audit records that both organizations can access, enabling joint incident investigation.
  • Sandboxed interaction zones: cross-org agent interactions should occur in isolated environments that prevent the partner agent from accessing internal systems not explicitly shared.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Cross-Org Agent AuthenticationInter-organization workload identityMicrosoft Entra B2B + Workload IdentityAWS IAM cross-account + OIDCWorkload Identity Federation (cross-org)SPIFFE Federation / SPIRE
API Security and ContractsAgent API access controlAzure APIM (cross-tenant policies)AWS API Gateway + resource-based policiesApigee (cross-org API management)Kong Gateway / Tyk
Mutual AttestationRuntime security posture exchangeMicrosoft Entra Verified IDAWS Artifact + OIDC attestationGoogle Assured Workloads attestationSPIFFE SVID + mTLS
Federated Audit LogsShared cross-org audit trailAzure Monitor (cross-tenant queries)AWS CloudTrail + S3 cross-accountCloud Audit Logs + Log Sink (cross-project)OpenTelemetry (federated collector)
Data Classification and SharingControlled data exchangeMicrosoft Purview (cross-tenant sharing)AWS Lake Formation cross-accountGoogle Analytics HubApache Atlas / OpenMetadata
Deep Dive · Emerging Topic
The Ghost Agent Problem
Agents that were deployed, forgotten, and are still running with stale credentials and outdated models. The lifecycle management problem is already a real operational issue at early-adopter organizations and will become a major audit finding category.
How ghost agents emerge
  • An agent is deployed for a specific project. The project ends but the agent — along with its credentials, permissions, and integrations — is never formally decommissioned.
  • Team turnover: the person who deployed the agent leaves, and no one else has visibility into its existence or operation.
  • Shadow AI deployment: an individual team deploys an agent without formal IT or security involvement. When that individual leaves, the agent becomes invisible to governance processes.
  • Automated agent spawning: in complex multi-agent systems, sub-agents may be created dynamically and persist beyond the lifecycle of the task that created them.
Why ghost agents are a serious risk
  • Stale credentials: ghost agents often hold long-lived credentials that were never rotated. If those credentials are compromised, the attacker gains access with no one actively monitoring the agent's behavior.
  • Outdated models: a ghost agent running an older model version may lack safety mitigations introduced in subsequent versions, making it more vulnerable to known attack patterns.
  • Unpatched dependencies: the orchestration libraries, tool integrations, and supporting infrastructure of a ghost agent accumulate security debt with no one maintaining them.
  • Invisible blast radius: if a ghost agent is compromised, the organization has no incident response playbook for an agent it did not know existed.
Lifecycle governance
  • Agent registry as a control: every deployed agent must be registered with owner, creation date, purpose, permissions, model version, and last-reviewed date. Unregistered agents are treated as unauthorized.
  • Mandatory expiration: agent deployments should have explicit expiration dates requiring active renewal. Default-to-expired is safer than default-to-permanent.
  • Credential TTL enforcement: agent credentials should have short TTLs enforced at the infrastructure level. An agent that cannot refresh its credentials automatically decommissions itself.
  • Periodic access reviews: include agent access in the same periodic access review processes applied to human users.
Solutions & Platforms
Area / TopicNameAzureAWSGoogleOpen Source
Agent Inventory / RegistryCatalog all deployed agentsAzure AI Foundry (model/agent catalog)Amazon SageMaker Model RegistryVertex AI Model RegistryMLflow Registry / Backstage
Credential TTL EnforcementShort-lived auto-expiring credentialsMicrosoft Entra (token lifetime policies)AWS STS session duration limitsGoogle IAM (short-lived service account keys)HashiCorp Vault (TTL leases)
Access ReviewsPeriodic agent permission reviewsMicrosoft Entra Access ReviewsAWS IAM Access Analyzer + Access AdvisorGoogle IAM RecommenderCartography / Cloudsplaining
Shadow AI DiscoveryDetect unauthorized agent deploymentsMicrosoft Defender for Cloud AppsAmazon Macie + ConfigGoogle Security Command CenterProwler / Steampipe
Lifecycle AutomationAuto-decommission on expiryAzure Policy (auto-remediation)AWS Config Rules + Lambda remediationGoogle Cloud Asset Inventory + PolicyAnsible / Terraform lifecycle rules