Securing Agentic AI — Panel Prep

Deep Dive · Q3

Accountability When an AI Agent Does Something Wrong

Accountability has to be assigned, not discovered. Traditional chains work because there is a person at the end who made a choice. With agentic AI that chain gets murky, so the framework has to change.

The three-layer accountability model

Developer accountability — responsible for the model's base behavior, safety training, and known failure modes. If the model is fundamentally unsafe this layer owns it.
Operator accountability — the organization that deployed the agent, configured permissions, set task scope, and chose when to go live. Most enterprise incidents land here.
User accountability — the person who initiated the task. Diminished if the system's behavior was opaque or the user had no reason to anticipate the failure.

The non-determinism problem

Causal accountability — harder with non-deterministic models but not impossible. You can audit inputs, context, tool calls, and outputs even without replaying internal reasoning.
Structural accountability — entirely tractable. Did the operator test edge cases? Were guardrails in place? These are auditable facts.

Analogy worth using on panel

We don't require a pilot to explain every micro-adjustment before determining negligence. We look at the flight data recorder, procedures followed, and whether the system was airworthy. Same logic applies to AI agents.

The meaningful human control test

Was there meaningful human control at the point things went wrong? Not “was a human in the loop?” but did the human have: sufficient information to understand what the agent was about to do, the ability to intervene before irreversible action, and a reasonable expectation the action was in scope? If any of those three are missing, accountability shifts to the operator and developer.

Anticipate this counterpoint

Someone will argue holding operators accountable chills AI adoption. The rebuttal: no clear accountability creates more risk aversion because organizations cannot price the liability. Clear frameworks enable deployment by making risk calculable — the same argument that made product liability law net-positive for innovation.

Regulatory horizon

EU AI Act — assigns obligations to deployers of high-risk systems: documentation, human oversight, incident reporting.
NIST AI RMF — the Govern function addresses accountability structures explicitly.
FTC — signaled it will hold deploying organizations accountable for AI-caused consumer harm under existing unfair/deceptive practices authority.

Key frameworks

3-tier accountability ↗Meaningful human control ↗EU AI Act ↗NIST AI RMF ↗Causal vs. structural accountability ↗

References & Further Reading

[1]NIST AI Risk Management Frameworknist.gov
[2]EU Artificial Intelligence Act (2024/1689)eur-lex.europa.eu
[3]Meaningful Human Control — Article 36 Technical Reportarticle36.org
[4]Anthropic Usage Policy — Operator and User Frameworkanthropic.com

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Audit Logging	Immutable agent action logs	Azure Monitor Log Analytics	AWS CloudTrail (integrity validation)	Cloud Audit Logs (Data Access)	OpenTelemetry + Loki
LLM Observability	Trace reasoning and tool calls	Azure AI Foundry Tracing	Amazon Bedrock invocation logging	Vertex AI Experiments + logging	Langfuse / LangSmith / Arize Phoenix
Accountability Frameworks	Developer / Operator / User tiers	Microsoft Purview AI Hub	AWS AI Service Cards	Google Model Cards	MLflow + responsibility docs
Decision Records	Structured AI decision documentation	Azure ML Model Cards	Amazon SageMaker Model Cards	Vertex AI Model Registry metadata	MLflow / DVC
Explainability	AI decision explanation	Azure Responsible AI dashboard	Amazon SageMaker Clarify	Vertex Explainable AI	SHAP / LIME / AI Fairness 360

Deep Dive · Q5

Sidechain Attacks on Agentic AI Systems

Every tool you give an agent is a potential attack vector. The documented attack classes represent the current research frontier and the gap between known attacks and deployed defenses is wide.

Prompt injection via retrieved content

Direct injection: malicious instructions in a webpage, PDF, or email the agent retrieves. Classic demo: white-on-white text saying “ignore previous instructions.”
Indirect injection via search: attacker poisons a publicly accessible page knowing it will appear in the agent's results for a predictable query.
Email-borne injection: demonstrated against early Copilot and AutoGPT systems — a malicious email instructs the inbox-processing agent to exfiltrate data.

Greshake et al. 2023 — the foundational paper

Passive injection — malicious content sitting in a data source waiting to be retrieved
Active injection — attacker anticipates retrieval pattern and plants content specifically for that agent
Persistent injection — content written into a memory store influencing future agent behavior across sessions

Memory poisoning / RAG store attacks

An attacker who can write to a shared knowledge base can plant content retrieved and treated as authoritative context. The injected content doesn't need to look like an instruction — framed as a “policy document” it gets incorporated naturally. Demonstrated against enterprise RAG systems with insufficiently permissioned write access.

Tool call hijacking

If the attacker controls or compromises a tool endpoint, they return a response containing injected instructions alongside legitimate data. Demonstrated against LangChain configurations where tool output wasn't sanitized before re-ingestion.

Multi-agent privilege escalation (confused deputy)

A low-privilege agent is compromised then issues instructions to a higher-privilege orchestrating agent — laundering the attacker's intent through a trusted internal channel. Demonstrated conceptually against AutoGPT and multi-agent LangChain pipelines; increasingly a concern in production Copilot Studio environments.

Crescendo and Skeleton Key

Crescendo (Microsoft AI Red Team): multi-turn manipulation where each step appears benign but cumulative effect is significant behavioral deviation. Hard to detect — no single turn looks anomalous.
Skeleton Key: attacks that extract the agent's system prompt, enabling targeted follow-on injection that navigates precisely around known constraints.

The through-line

Most of these attacks share one root cause — agents fail to maintain structural separation between data they read and instructions they follow. Until that is architecturally solved, injection-class attacks remain the dominant threat.

Key frameworks

Greshake et al. 2023 ↗OWASP LLM Top 10 ↗MITRE ATLAS ↗Confused deputy ↗Crescendo ↗Skeleton Key ↗Content isolation ↗

References & Further Reading

[1]OWASP Top 10 for LLM Applicationsowasp.org
[2]MITRE ATLAS: Adversarial Threat Landscape for AI Systemsatlas.mitre.org
[3]Not What You’ve Signed Up For: Indirect Prompt Injections (arXiv:2302.12173)arxiv.org
[4]The Crescendo Multi-Turn Attack — Microsoft AI Red Team (arXiv:2404.01833)arxiv.org
[5]Skeleton Key Jailbreak Technique — Microsoft Security Blogmicrosoft.com
[6]The Confused Deputy Problem — Hardy (1988)en.wikipedia.org

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Prompt Injection Defense	Prompt shield and input scanning	Azure AI Content Safety (Prompt Shields)	Amazon Bedrock Guardrails	Vertex AI Model Armor	LLM Guard / NeMo Guardrails / Rebuff
RAG / Memory Security	Vector store access control	Azure AI Search (RBAC + private endpoints)	Amazon Kendra + OpenSearch Security	Vertex AI Vector Search IAM	Weaviate (RBAC) / Qdrant
Tool Registry Integrity	Signed and verified tool catalog	Azure APIM (policy enforcement)	AWS API Gateway + Lambda authorizers	Apigee + Cloud Endpoints	Open Policy Agent
Red Teaming	Adversarial testing	Azure PyRIT	Amazon Bedrock model evaluation	Google DeepMind safety eval	Garak / PromptBench
Multi-Agent Trust	Signed inter-agent communication	Microsoft Entra Agent ID	AWS IAM role chaining with trust policies	Workload Identity Federation	SPIFFE / SPIRE
Output Filtering	Response scanning	Azure AI Content Safety	Amazon Comprehend + Bedrock Guardrails	Vertex AI Content Safety	Presidio / LLM Guard

Deep Dive · Emerging Topic

Supply Chain & Model Provenance

Where did the model come from, was it tampered with during fine-tuning or quantization, and how do you verify integrity before deployment? This is the AI equivalent of SolarWinds-style supply chain risk and it is largely unaddressed in current enterprise security frameworks.

The provenance problem

Unlike traditional software where a binary can be checksummed, AI model provenance is deeply opaque. A model's behavior is determined by its training data, training process, fine-tuning, RLHF feedback, and quantization — each step a potential point of tampering.

Training data poisoning: if an attacker influences even a small percentage of the corpus they can embed backdoor behaviors that activate under specific trigger inputs. Demonstrated in academic research against both classification models and LLMs.
Fine-tuning attacks: if the fine-tuning pipeline is compromised, or the base model already contains a backdoor, the resulting model may behave unpredictably in production.
Quantization risks: converting a model to INT8 or INT4 is not a neutral operation — it changes behavior in ways difficult to characterize and can expose latent vulnerabilities.

The open-weight model risk

Model weights downloaded from public repositories have no cryptographic integrity guarantees by default.
Malicious actors have already published poisoned model variants on public hubs that pass basic capability evals but contain embedded backdoors.
Unlike closed API models where the provider maintains integrity, with open-weight deployments the enterprise is the last line of defense.

Emerging defenses

AI Bill of Materials (AI-BOM): structured record of a model's lineage — base model, fine-tuning datasets, training infrastructure, known limitations. NIST is developing standards for this.
Cryptographic model signing: hash-based integrity verification for model weights, analogous to code signing for software.
Behavioral red-teaming at intake: before deploying any externally sourced model, run a structured battery of adversarial probes to detect backdoor triggers.
Isolated eval environments: never run an unverified model in a production-connected environment.

The dependency chain extends further

Beyond the model itself, the agent framework, orchestration library (LangChain, AutoGen, CrewAI), vector database, and embedding model are all part of the supply chain. A compromised dependency at any layer can undermine the entire agentic system and most organizations have no visibility into this dependency graph.

Key frameworks

AI-BOM ↗Model signing ↗Training data poisoning ↗Backdoor detection ↗NIST AI RMF ↗Intake red-teaming ↗

References & Further Reading

[1]NIST AI Risk Management Frameworknist.gov
[2]CISA Software Bill of Materials (SBOM) Resourcescisa.gov
[3]Backdoor Attacks and Defenses in ML — Goldblum et al. (arXiv:2012.10544)arxiv.org
[4]The Update Framework (TUF) — Secure Software Updatestheupdateframework.io

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Model Integrity / Signing	Cryptographic model verification	Azure ML (model signing via Key Vault)	Amazon SageMaker + AWS Signer	Google Artifact Registry (signed artifacts)	Sigstore / Cosign / TUF
AI Bill of Materials	Model lineage documentation	Microsoft Purview Data Lineage	AWS SageMaker Model Cards	Vertex AI Model Registry	CycloneDX AI BOM / SPDX
Dependency Scanning	Framework and library vulnerability scanning	Microsoft Defender for DevOps	Amazon Inspector + CodeGuru	Google Artifact Analysis	Trivy / Grype / Dependabot
Isolated Model Evaluation	Sandboxed intake testing	Azure ML compute clusters (isolated)	Amazon SageMaker Studio (isolated domain)	Vertex AI Workbench (VPC-SC)	Podman / gVisor sandboxing
Backdoor / Trojan Detection	Model behavioral scanning	Azure Responsible AI dashboard	Amazon SageMaker Clarify	Vertex Explainable AI	TrojanZoo / IBM ART

Deep Dive · Emerging Topic

Agentic AI in Regulated Industries

Healthcare, finance, and critical infrastructure have compliance obligations that collide awkwardly with non-deterministic agents. The gap between what regulators expect and what agentic systems can currently guarantee requires new architectural patterns, not just policy.

The compliance collision

HIPAA — requires audit trails for every PHI access. An agent that incidentally reads protected health information while completing a task creates access events that may not be captured by traditional audit infrastructure.
SOX — requires controls over financial reporting. If an AI agent participates in financial data processing, the control framework must demonstrate bounded, tested, auditable behavior.
NERC CIP — critical infrastructure standards assume manual change management. An agent that autonomously modifies OT environment configurations is a risk category NERC CIP was not designed to address.
GDPR / CCPA — the right to explanation for automated decisions is structurally difficult when the explanation involves a transformer attention mechanism.

Healthcare-specific risks

Clinical decision support agents influencing care pathways create liability exposure when acting on stale, hallucinated, or out-of-distribution data.
PHI exfiltration risk is amplified when agents have broad EHR access — a single prompt injection could cause an agent to exfiltrate patient records.
FDA is developing guidance for AI/ML-based Software as a Medical Device (SaMD) — agentic systems in clinical settings will likely require pre-market review pathways.

Financial services-specific risks

Agents with trading or transaction authority create market manipulation risk through adversarial manipulation or emergent behavior at scale.
Model risk management (SR 11-7 guidance) requires validation, ongoing monitoring, and independent review — standards that do not map cleanly onto foundation model deployments.
AML and KYC processes incorporating AI agents must demonstrate to regulators that decisions are explainable and human oversight is genuine, not performative.

Architectural responses

Tiered data access: agents should never have standing access to the full regulated data environment. Access granted per-task, per-data-class, with explicit justification.
Deterministic audit wrappers: even if the agent's internal reasoning is non-deterministic, every externally visible action can be logged in a tamper-evident audit record.
Human-in-the-loop at regulatory boundaries: any agent action crossing a regulatory threshold should require explicit human approval, not just logging.

Key frameworks

HIPAA audit trails ↗SOX IT controls ↗SR 11-7 ↗FDA SaMD guidance ↗NERC CIP ↗GDPR Art. 22 ↗

References & Further Reading

[1]GDPR Article 22 — Automated Individual Decision-Makinggdpr.eu
[2]HIPAA Security Rule — Audit Controls (HHS)hhs.gov
[3]PCAOB AS 2201 — IT General Controlspcaobus.org
[4]SR 11-7: Model Risk Management Guidance — Federal Reservefederalreserve.gov
[5]NERC Critical Infrastructure Protection Standardsnerc.com
[6]FDA AI/ML Software as a Medical Device Action Planfda.gov

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Healthcare (HIPAA)	HIPAA-compliant AI infrastructure	Azure Health Data Services (HIPAA BAA)	AWS HIPAA-eligible services	Google Cloud Healthcare API	OpenMRS / HAPI FHIR (on-prem)
Financial (SR 11-7)	Model risk management validation	Azure ML (model validation workflows)	Amazon SageMaker Model Monitor	Vertex AI Model Evaluation	MLflow + Great Expectations
Immutable Audit Trail	Tamper-evident logging for regulators	Azure Immutable Blob Storage + Monitor	AWS CloudTrail + S3 Object Lock	Cloud Audit Logs + GCS WORM	Hyperledger Fabric (audit chain)
Data Residency and Sovereignty	Geographic data controls	Azure Sovereign Clouds (Gov, China)	AWS GovCloud + data residency controls	Google Assured Workloads	MinIO (on-prem) / Nextcloud
Compliance Reporting	Automated evidence collection	Microsoft Purview Compliance Manager	AWS Audit Manager	Google Security Command Center	OpenSCAP / Prowler / Steampipe
Critical Infrastructure (NERC CIP)	OT/ICS AI access controls	Microsoft Defender for IoT	AWS IoT Greengrass + GuardDuty	Google Distributed Cloud Edge	Claroty / Dragos community resources

Deep Dive · Emerging Topic

Data Exfiltration as a First-Class Threat

Agents with broad data access create a fundamentally different exfiltration risk profile than traditional insider threat models. The data gravity problem: agents naturally accumulate context that may include sensitive data far beyond their task scope.

Why agents change the exfiltration calculus

Traditional exfiltration requires a human actor to deliberately seek out and move data. An agent can be instructed to exfiltrate through a prompt injection with no human intent involved.
Agents accumulate context across tool calls — an agent tasked with summarizing emails may incidentally read sensitive financial or strategic content that was never the intended target.
The blast radius of a single compromised agent session can be enormous: an enterprise agent with broad CRM, email, and file system access touches more sensitive data in one task than a typical employee accesses in a week.

Exfiltration channels unique to agents

Output exfiltration: an injected instruction causes the agent to include sensitive data in its visible output, which may then be sent to an attacker-controlled destination.
Covert channel via tool calls: an agent makes API calls to an external service using sensitive data as parameters — disguising exfiltration as legitimate tool use.
Steganographic exfiltration: demonstrated in research — sensitive data encoded into seemingly innocuous agent output that can be decoded by an attacker-controlled receiver.
Memory store exfiltration: an agent writes sensitive data to a shared memory store the attacker has read access to, without triggering traditional DLP controls.

Defenses

Output inspection: apply DLP-style pattern matching to agent outputs before they are returned to users or written to external systems.
Tool call parameter auditing: log and inspect the parameters of every outbound tool call, not just the fact that a call was made.
Context scoping: limit what data enters the agent's context window. Retrieval systems should enforce need-to-know at the retrieval layer.
Egress filtering: agent infrastructure should have explicit egress allowlists — outbound calls only to approved endpoints with anomaly detection for new destinations.

Key frameworks

DLP for agent outputs ↗Tool call parameter auditing ↗Context scoping ↗Egress allowlisting ↗Data gravity ↗

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
DLP for Agent Outputs	Output content inspection	Microsoft Purview DLP + AI Content Safety	Amazon Macie + Bedrock Guardrails	Google DLP API + Vertex Content Safety	Presidio (Microsoft OSS) / LLM Guard
Egress Filtering	Outbound network allowlisting	Azure Firewall + Private Endpoints	AWS Network Firewall + VPC Endpoints	Google VPC Service Controls + Cloud Armor	Cilium / OPA network policy
Tool Call Auditing	Log all outbound API parameters	Azure APIM + Monitor (request logging)	AWS API Gateway Access Logging + CloudTrail	Apigee Analytics + Cloud Audit Logs	OpenTelemetry / Tyk
Context Scoping / Need-to-Know	Retrieval-layer access control	Azure AI Search (security trimming)	Amazon Kendra (ACL-based retrieval)	Vertex AI Search (IAM-based)	Weaviate RBAC / Qdrant filtering
Secrets Detection	Scan outputs for leaked credentials	Microsoft Defender for Cloud (secret scanning)	Amazon CodeGuru Reviewer + Macie	Google Cloud Secret Manager + DLP	TruffleHog / Gitleaks / Detect-secrets

Deep Dive · Emerging Topic

Trust Hierarchies in Multi-Agent Systems

When agents spawn other agents, or when a human delegates to an agent that delegates further, trust propagation becomes a first-class security problem requiring runtime verification, not just design-time policy.

The delegation semantics problem

In multi-agent systems, delegation semantics are poorly defined:

Does a sub-agent inherit the full permissions of the orchestrating agent, a subset, or only task-specific permissions?
Can a sub-agent further delegate? To whom? With what constraints?
If a sub-agent's action is harmful, is the orchestrating agent that spawned it accountable?

None of these questions have standardized answers in current agentic frameworks.

The attenuation principle

Trust should attenuate, not amplify, at every delegation hop.

An orchestrating agent should only be able to grant a sub-agent a subset of its own permissions — never more.
Each delegation hop should carry a signed context header proving the chain of authority: who originated the task, what permissions were granted at each step, what constraints apply.
Sub-agents should verify the legitimacy of instructions they receive — not just trust that because a message arrived in the right channel it must be legitimate.

Runtime trust verification

Instruction provenance checking: before executing an instruction from another agent, verify that the instruction source has the authority to issue it.
Capability tokens: rather than inheriting ambient permissions, sub-agents receive explicit capability tokens for specific actions they are authorized to take on the current task.
Anomaly detection on agent-to-agent traffic: instructions that deviate from established patterns or request permissions outside the task envelope should trigger alerts.

The who-told-you-to-do-this problem

When an incident occurs in a multi-agent system, reconstructing the instruction chain is essential for accountability. This requires every agent-to-agent instruction to be logged with full provenance — source identity, timestamp, task context, and permissions granted. Without this, incident investigation becomes intractable.

Key frameworks

Attenuation principle ↗Capability tokens ↗Signed instruction provenance ↗Agent-to-agent anomaly detection ↗Delegation semantics ↗

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Agent Identity and Attestation	Workload identity per agent	Microsoft Entra Agent ID	AWS IAM Roles + OIDC (per agent)	Workload Identity Federation	SPIFFE / SPIRE
Capability Tokens / Scoped Auth	Task-scoped token issuance	Microsoft Entra (scoped OAuth tokens)	AWS STS AssumeRole (scoped)	Google IAM service account impersonation	HashiCorp Vault (AppRole / entity tokens)
Agent-to-Agent Communication	Signed inter-agent messages	Azure Service Bus + Entra managed identity	Amazon SQS + IAM signed requests	Google Pub/Sub + Workload Identity	SPIFFE SVID + mTLS / NATS with JWTs
Delegation Policy Enforcement	Attenuation at each hop	Azure APIM policies + Entra token claims	AWS IAM permission boundaries	Google IAM conditions + org policy	Open Policy Agent (delegation rules)
Multi-Agent Orchestration	Agent workflow with trust controls	Azure AI Foundry Agent Service	Amazon Bedrock Multi-Agent Collaboration	Vertex AI Agent Builder	LangGraph / AutoGen / CrewAI

Deep Dive · Emerging Topic

Secure Prompt & System Design

Prompt engineering is a security discipline. How you structure a system prompt materially affects an agent's attack surface — and vague or overly permissive system prompts are a security liability practitioners can address immediately.

The instruction hierarchy

Most production LLM systems have an implicit instruction hierarchy: system prompt > user message > tool output. Security posture depends on how strictly this hierarchy is enforced:

A model that allows user messages to override system prompt constraints is fundamentally insecure for agentic deployments.
A model that treats tool output as having the same trust level as system prompt instructions is vulnerable to tool-based injection.
Explicit, well-defined hierarchy — and a model that reliably respects it — is a prerequisite for production-grade agent security.

System prompt hardening principles

Explicit scope definition: state clearly what the agent is and is not permitted to do. Vague permissions are exploitable; specific permissions are defensible.
Explicit denial of override: include instructions that the system prompt cannot be overridden by user messages or externally retrieved content. This raises the bar for injection attacks.
Data-instruction separation: use structural markers (XML tags, delimiters) to distinguish between content to process as data versus instructions to follow.
Minimal capability declaration: only describe capabilities the agent actually needs. Enumerating broad capabilities the agent could theoretically use invites an attacker to invoke them.

Prompt as an audit artifact

System prompts should be treated as security-critical configuration artifacts — versioned, reviewed, and subject to change management. A prompt change that expands agent permissions should require the same scrutiny as a firewall rule change. In practice, most organizations have no formal process for prompt review or approval.

The meta-prompt attack surface

In multi-agent systems, orchestrating agents often generate system prompts for sub-agents dynamically. If an attacker can influence what the orchestrating agent writes into a sub-agent's system prompt, they can shape the sub-agent's security posture. Dynamic prompt generation should be treated with the same rigor as any code that touches security boundaries.

Key frameworks

Instruction hierarchy ↗Prompt hardening ↗Data-instruction separation ↗Prompt version control ↗Meta-prompt attack surface ↗

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Prompt Management and Versioning	System prompt version control	Azure AI Foundry (prompt management)	Amazon Bedrock Prompt Management	Vertex AI Prompt Optimizer	Langfuse / PromptFlow / Promptly
Prompt Injection Hardening	Input sanitization and content safety	Azure AI Content Safety (Prompt Shields)	Amazon Bedrock Guardrails	Vertex AI Model Armor	LLM Guard / NeMo Guardrails
Instruction Hierarchy Enforcement	System > user > tool trust levels	Azure OpenAI system prompt controls	Amazon Bedrock system prompt config	Gemini system instruction API	LangChain (message type hierarchy)
Meta-Prompt Security	Secure dynamic prompt generation	Azure APIM (prompt templating + validation)	AWS Step Functions (prompt pipeline)	Vertex AI Pipelines	Guidance (Microsoft OSS) / LMQL
Prompt Audit Logging	Log all prompts as config artifacts	Azure Monitor + AI Foundry logging	Amazon Bedrock invocation logging	Vertex AI logging	Langfuse / MLflow prompt tracking

Deep Dive · Emerging Topic

Observability & Detection Engineering for AI

The SOC does not know how to write detection rules for agent behavior yet. What does anomaly detection look like when the normal behavior of an agent is inherently variable? This bridges AI security into the existing SecOps world in a concrete way.

Why traditional detection breaks

Traditional SIEM rules are pattern-based — they look for specific sequences of events. Agent behavior produces variable event sequences for the same logical task, making static rules ineffective.
The signal-to-noise problem is acute: agents generate high volumes of tool calls, API requests, and data accesses that are indistinguishable from legitimate behavior without semantic context.
Traditional UEBA assumes a stable behavioral baseline. Agents do not have one — their behavior varies with every task and context window.

Building a behavioral baseline for agents

Task-conditioned baselines: maintain baselines per task type, not per agent. An email summarization agent has a different expected behavior profile than a data analysis agent, even if they are the same underlying model.
Trajectory analysis: analyze the sequence and composition of tool calls across a task. A task that starts with reading emails and ends with making outbound API calls to an unknown endpoint is anomalous regardless of whether each individual action looks benign.
Privilege usage analytics: track which permissions each agent actually uses versus which it holds. Sudden use of previously unused permissions warrants investigation.

What to instrument

Every tool call: name, parameters, response, latency, success/failure
Data access events: what was read, at what sensitivity level, in what task context
Output events: what was written, to where, of what size and type
Agent spawning events: what sub-agents were created, with what permissions
External network calls: destination, protocol, payload size, frequency

Detection patterns that work now

Exfiltration volume anomalies: agent reads significantly more data than it writes output — a signal that data may be accumulating in context for extraction.
Permission boundary probing: multiple failed permission checks in rapid succession suggests something is exploring the permission landscape.
Instruction-action mismatch: agent's actions are inconsistent with its declared task — increasingly tractable with LLM-based detection.
Novel endpoint calls: agent makes calls to endpoints not seen in its task history — high-fidelity signal when combined with egress allowlisting.

Key frameworks

Task-conditioned baselines ↗Trajectory analysis ↗Agent behavior analytics ↗LLM-based detection ↗Privilege usage analytics ↗

References & Further Reading

[1]User and Entity Behavior Analytics — NIST Glossarycsrc.nist.gov

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
LLM / Agent Tracing	End-to-end agent call tracing	Azure AI Foundry Tracing + App Insights	Amazon Bedrock CloudWatch + X-Ray	Vertex AI Experiments + Cloud Trace	LangSmith / Langfuse / Arize Phoenix
Behavioral Anomaly Detection	Task-conditioned behavior monitoring	Microsoft Sentinel UEBA + AI analytics	Amazon GuardDuty ML + Detective	Google Security Operations (UEBA)	Elastic SIEM / Falco / Prometheus
SIEM Integration	Agent events to SIEM pipeline	Microsoft Sentinel (native connectors)	Amazon Security Lake + OpenSearch	Google Chronicle SIEM	OpenSearch Security Analytics / Wazuh
Metrics and Dashboards	Agent performance and security dashboards	Azure Monitor Dashboards + Workbooks	Amazon CloudWatch Dashboards	Google Cloud Monitoring + Looker	Grafana + Prometheus
Privilege Usage Analytics	Track permissions used vs granted	Microsoft Entra Access Reviews	AWS IAM Access Analyzer	Google IAM Recommender	Cloudsplaining / Cartography

Deep Dive · Emerging Topic

Security Testing Methodology for Agentic AI

Red-teaming and pentesting as applied to agentic systems is a nascent field. MITRE ATLAS and OWASP LLM Top 10 give frameworks but not methodology. The gap between having a framework and knowing how to test against it is significant.

Why traditional pentest methodology falls short

Traditional pentests target deterministic systems. Agentic AI requires probabilistic testing — an attack that fails 9 times out of 10 is still viable if the consequences of the 10th success are severe enough.
The attack surface is dynamic: an agent's effective attack surface changes with every task, context window, and set of tools it is given access to.
Traditional test scoping does not translate cleanly when the application can spawn sub-agents, call arbitrary APIs, and read from data sources not known at test design time.

Building an AI threat model

An AI-specific threat model should enumerate:

Trust boundaries: where does the agent receive inputs from untrusted sources? What are all the data channels into the agent's context window?
Action inventory: what is every action the agent can take? What are the worst-case consequences of each?
Delegation graph: what agents can this agent spawn or instruct? What agents can instruct this agent?
Data sensitivity map: what is the most sensitive data the agent can access, and what are the paths by which it could be extracted?

Testing approaches that work

Injection corpus testing: build a library of injection payloads and test them systematically against every data input channel.
Permission boundary testing: attempt to get the agent to take actions outside its defined scope through instruction, social engineering, and multi-step manipulation.
Multi-turn adversarial scenarios: test for Crescendo-style attacks — sequences of individually benign interactions that cumulatively steer toward harmful behavior.
Sub-agent injection: in multi-agent systems, test whether a compromised sub-agent can influence the behavior of the orchestrating agent.
Exfiltration path testing: attempt to cause the agent to include sensitive data in outputs, tool call parameters, or written artifacts using all available injection vectors.

Key frameworks

MITRE ATLAS ↗OWASP LLM Top 10 ↗AI threat modeling ↗Injection corpus ↗Probabilistic testing ↗Multi-turn adversarial ↗

References & Further Reading

[1]OWASP Top 10 for LLM Applicationsowasp.org
[2]MITRE ATLAS: Adversarial Threat Landscape for AI Systemsatlas.mitre.org
[3]The Crescendo Multi-Turn Attack — Microsoft AI Red Team (arXiv:2404.01833)arxiv.org

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
AI Red Teaming	Automated adversarial testing	Azure PyRIT	Amazon Bedrock model evaluation	Google DeepMind safety evaluation	Garak / PromptBench / PyRIT (OSS)
Threat Modeling	AI-specific threat modeling	Microsoft Threat Modeling Tool	AWS Threat Composer	Google Cloud Security threat modeling	MITRE ATLAS / OWASP Threat Dragon
Injection Corpus Testing	Systematic prompt injection testing	Azure AI Content Safety evaluation	Amazon Bedrock prompt evaluation	Vertex AI evaluation SDK	Garak / PromptInject / HarmBench
Multi-Turn Adversarial Testing	Crescendo-style scenario testing	Azure PyRIT (multi-turn orchestration)	Amazon Bedrock automated red teaming	Google AI Safety red team tools	Garak (multi-turn probes) / CyberSecEval
Compliance Benchmark Testing	Safety benchmark evaluation	Azure AI Foundry evaluations	Amazon Bedrock model evaluation jobs	Vertex AI model eval (safety metrics)	EleutherAI LM Eval Harness / HELM

Deep Dive · Emerging Topic

Insider Threat in the Age of AI Agents

Agents acting on behalf of users dramatically amplify insider threat scenarios. A malicious insider no longer needs direct system access — they can craft inputs to an agent that cause it to act on their behalf while obscuring their intent.

The amplification problem

A traditional insider threat is bounded by the individual's own access rights and the manual effort required to exfiltrate data. An agent multiplies both: the agent may have broader access than the user, and the user can instruct the agent to perform in minutes what would take hours manually.
The agent acts as a force multiplier and an abstraction layer simultaneously — the insider's actions are mediated by the agent, making attribution harder.
Plausible deniability: “I didn't do that, the agent did” becomes a defense that may be difficult to refute without detailed instruction logs.

New insider threat patterns

Instruction laundering: a malicious insider crafts prompts that instruct the agent to perform actions the insider could not do directly — bypassing access controls that do not apply to the agent.
Scope creep exploitation: agents operating with broad permissions can be instructed to access data far outside the nominal task scope.
Timing attacks: instructing an agent to perform sensitive actions at times when oversight is reduced — after hours, during high-volume periods when alerts are more likely to be buried.
Credential harvesting via agent: instructing an agent to access systems and capture credentials or session tokens in its output, which the insider then uses for direct access.

Detection and mitigation

Instruction auditing: the human's instructions to the agent must be logged with the same rigor as the agent's actions. Without this, you can see what the agent did but not why.
Behavioral correlation: correlate agent behavior with the instructing user's historical patterns. An agent suddenly accessing data categories the user has never previously touched is a signal worth investigating.
Dual-approval for high-sensitivity tasks: tasks involving highly sensitive data should require a second human to authorize the agent's instructions, not just the agent's actions.

Key frameworks

Instruction laundering ↗Instruction auditing ↗Behavioral correlation ↗Dual-approval patterns ↗UEBA for agent instructions ↗

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Instruction Auditing	Log human to agent instructions	Azure Monitor + AI Foundry logging	Amazon Bedrock invocation logs + CloudTrail	Vertex AI logging + Cloud Audit Logs	Langfuse / OpenTelemetry
User Behavior Analytics	Correlate user and agent behavior	Microsoft Sentinel UEBA	Amazon GuardDuty + Detective	Google Security Operations (UEBA)	Elastic SIEM / Wazuh
Privileged Access Management	Control what users can instruct agents to do	Microsoft Entra PIM	AWS IAM + Access Analyzer	Google BeyondCorp PAM	CyberArk Conjur (OSS) / HashiCorp Vault
Data Loss Prevention	Detect sensitive data in agent instructions	Microsoft Purview DLP	Amazon Macie + Comprehend	Google DLP API	Presidio / OpenDLP
Dual Approval Workflows	Multi-person auth for sensitive agent tasks	Microsoft Entra Verified ID + Approvals	AWS IAM MFA conditions + Access Approval	Google Access Approval	OPA (multi-approver policy) / Teleport

Deep Dive · Emerging Topic

Cross-Organizational Agent Interaction

Enterprises are beginning to expose agent-to-agent APIs — one company's agent calling another's. This creates inter-organizational trust problems that no existing security framework handles well.

The inter-org trust gap

When two organizations' agents interact, they bring different:

Security policies and guardrails that may be incompatible or in conflict
Data classification schemes — what one organization considers public another may consider confidential
Permission models — an agent authorized by Organization A may be performing actions on systems owned by Organization B
Incident response capabilities — if something goes wrong, which organization's IR team responds, and do they have visibility into the other's agent behavior?

Liability and contractual gaps

Existing SLAs and data processing agreements were not written with autonomous agent interaction in mind. An agent that calls a partner's API and causes an incident falls outside most contractual frameworks for fault allocation.
The question of who is responsible when Agent A (Org A) instructs Agent B (Org B) to take an action that causes harm is genuinely unresolved in current legal frameworks.
Data residency and sovereignty requirements may be violated when agents route data through cross-organizational interactions not anticipated in the original compliance architecture.

Emerging patterns

Agent API contracts: formal specifications of what actions a partner's agent is permitted to request, what data it will receive, and what audit information will be shared in case of incidents.
Mutual attestation: before an agent-to-agent interaction, both agents exchange signed attestations of their current permission scope, security posture, and data handling commitments.
Federated audit logs: cross-organizational agent interactions should produce audit records that both organizations can access, enabling joint incident investigation.
Sandboxed interaction zones: cross-org agent interactions should occur in isolated environments that prevent the partner agent from accessing internal systems not explicitly shared.

Key frameworks

Agent API contracts ↗Mutual attestation ↗Federated audit logs ↗Sandboxed interaction zones ↗Cross-org liability frameworks ↗

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Cross-Org Agent Authentication	Inter-organization workload identity	Microsoft Entra B2B + Workload Identity	AWS IAM cross-account + OIDC	Workload Identity Federation (cross-org)	SPIFFE Federation / SPIRE
API Security and Contracts	Agent API access control	Azure APIM (cross-tenant policies)	AWS API Gateway + resource-based policies	Apigee (cross-org API management)	Kong Gateway / Tyk
Mutual Attestation	Runtime security posture exchange	Microsoft Entra Verified ID	AWS Artifact + OIDC attestation	Google Assured Workloads attestation	SPIFFE SVID + mTLS
Federated Audit Logs	Shared cross-org audit trail	Azure Monitor (cross-tenant queries)	AWS CloudTrail + S3 cross-account	Cloud Audit Logs + Log Sink (cross-project)	OpenTelemetry (federated collector)
Data Classification and Sharing	Controlled data exchange	Microsoft Purview (cross-tenant sharing)	AWS Lake Formation cross-account	Google Analytics Hub	Apache Atlas / OpenMetadata

Deep Dive · Emerging Topic

The Ghost Agent Problem

Agents that were deployed, forgotten, and are still running with stale credentials and outdated models. The lifecycle management problem is already a real operational issue at early-adopter organizations and will become a major audit finding category.

How ghost agents emerge

An agent is deployed for a specific project. The project ends but the agent — along with its credentials, permissions, and integrations — is never formally decommissioned.
Team turnover: the person who deployed the agent leaves, and no one else has visibility into its existence or operation.
Shadow AI deployment: an individual team deploys an agent without formal IT or security involvement. When that individual leaves, the agent becomes invisible to governance processes.
Automated agent spawning: in complex multi-agent systems, sub-agents may be created dynamically and persist beyond the lifecycle of the task that created them.

Why ghost agents are a serious risk

Stale credentials: ghost agents often hold long-lived credentials that were never rotated. If those credentials are compromised, the attacker gains access with no one actively monitoring the agent's behavior.
Outdated models: a ghost agent running an older model version may lack safety mitigations introduced in subsequent versions, making it more vulnerable to known attack patterns.
Unpatched dependencies: the orchestration libraries, tool integrations, and supporting infrastructure of a ghost agent accumulate security debt with no one maintaining them.
Invisible blast radius: if a ghost agent is compromised, the organization has no incident response playbook for an agent it did not know existed.

Lifecycle governance

Agent registry as a control: every deployed agent must be registered with owner, creation date, purpose, permissions, model version, and last-reviewed date. Unregistered agents are treated as unauthorized.
Mandatory expiration: agent deployments should have explicit expiration dates requiring active renewal. Default-to-expired is safer than default-to-permanent.
Credential TTL enforcement: agent credentials should have short TTLs enforced at the infrastructure level. An agent that cannot refresh its credentials automatically decommissions itself.
Periodic access reviews: include agent access in the same periodic access review processes applied to human users.

Key frameworks

Agent registry ↗Mandatory expiration ↗Credential TTL enforcement ↗Shadow AI discovery ↗Agent access review ↗

Solutions & Platforms

Area / Topic	Name	Azure	AWS	Google	Open Source
Agent Inventory / Registry	Catalog all deployed agents	Azure AI Foundry (model/agent catalog)	Amazon SageMaker Model Registry	Vertex AI Model Registry	MLflow Registry / Backstage
Credential TTL Enforcement	Short-lived auto-expiring credentials	Microsoft Entra (token lifetime policies)	AWS STS session duration limits	Google IAM (short-lived service account keys)	HashiCorp Vault (TTL leases)
Access Reviews	Periodic agent permission reviews	Microsoft Entra Access Reviews	AWS IAM Access Analyzer + Access Advisor	Google IAM Recommender	Cartography / Cloudsplaining
Shadow AI Discovery	Detect unauthorized agent deployments	Microsoft Defender for Cloud Apps	Amazon Macie + Config	Google Security Command Center	Prowler / Steampipe
Lifecycle Automation	Auto-decommission on expiry	Azure Policy (auto-remediation)	AWS Config Rules + Lambda remediation	Google Cloud Asset Inventory + Policy	Ansible / Terraform lifecycle rules