Threat Modeling LLM Applications: Data Flows, Trust Boundaries, Tool Calls, and Abuse Cases
LLM applications break many lazy threat models. A diagram with a browser, API, database, and third-party model call is not enough. The real security behavior may depend on system prompts, retrieved content, tool descriptions, model outputs, memory, provider routing, and approval workflows.
A user prompt can become a retrieval query. A retrieved document can become model context. A model output can become an API argument. A tool result can become memory. A generated answer can become a customer-facing claim. Every transformation is a place where trust can be lost.
Threat modeling LLM applications means mapping not only where data flows, but where authority flows.
- Core Thesis
LLM threat modeling should map assets, actors, data flows, trust boundaries, prompt assembly, retrieved content, model providers, tool calls, memory, outputs, identities, approvals, logs, and abuse cases. The output should become controls, tests, telemetry requirements, and incident-response assumptions.
This article is written for security architects, AppSec teams, AI red teamers, platform engineers, product security leaders, and technical buyers who need AI security work to produce more than dramatic examples. The goal is to make testing, threat modeling, reporting, and remediation repeatable.
AI security work becomes credible when it can be scoped, reproduced, evidenced, and fixed. That requires structure: authorized targets, safe data, clear attack classes, reliable logs, report templates, severity criteria, and retest procedures.
- Why This Matters
Threat modeling and LLM application security matters because AI systems are now appearing in workflows where weak testing or vague reporting can create real business risk. A finding that cannot be reproduced wastes engineering time. A test that uses unsafe data creates new exposure. A threat model that ignores tool calls misses the actual blast radius. A report that exaggerates conclusions damages trust.
The mature path is to connect adversarial testing to engineering operations. Red-team labs should create reusable tests. Findings should become backlog items. Threat models should create logging requirements. Evidence should support governance claims. Retests should prove remediation.
- Failure Model
The failure model for this domain includes:
- testing without clear authorization;
- using production data when synthetic data would work;
- collecting screenshots without reproducibility;
- rating severity by model weirdness rather than impact;
- ignoring tool calls and downstream actions;
- missing retrieval and memory evidence;
- failing to preserve logs;
- recommending vague remediation;
- making unsupported executive claims;
- failing to retest.
These failures are avoidable when the security process is designed before the engagement begins.
- Start with the System Purpose
The threat model should begin with the business workflow. What is the AI system supposed to help users do? What decisions or actions does it influence? What harm would matter?
The first step is to define the purpose of the work. Is the team testing an application before launch? Validating a remediation? Building a regression suite? Running a buyer assessment? Preparing governance evidence? Investigating an incident? The answer determines scope, tools, evidence, and reporting.
A clear purpose also prevents overreach. AI security work can easily sprawl because models, data, tools, and workflows are interconnected. Scope discipline is a safety control.
- Map Assets
Assets include prompts, outputs, documents, embeddings, user records, credentials, tool results, system instructions, model configurations, logs, memory, and generated decisions.
Authorization should be explicit. If a test touches third-party systems, browser sessions, email, customer data, production APIs, cloud infrastructure, or external model providers, the team needs written boundaries. Those boundaries should define allowed targets, accounts, windows, rate limits, prohibited actions, and emergency contacts.
Good authorization protects both the organization and the testers. It also improves evidence quality because the report can distinguish tested behavior from speculation.
- Map Actors
Actors include users, administrators, tenants, external attackers, malicious insiders, third-party content authors, model providers, tool providers, and compromised accounts.
Data choice matters. Synthetic data is often enough for prompt injection, RAG poisoning, leakage simulation, unsafe output testing, and tool-call validation. Real data should be used only when necessary and approved.
Safe datasets should still be realistic. A leakage test is more useful when fake secrets, fake customers, fake account notes, and fake confidential documents resemble the structure of real materials without exposing real people or business records.
- Map Data Flows
Data flows should include user input, context assembly, retrieval, provider calls, tool calls, output rendering, logging, memory writes, and downstream actions.
Repeatability is the difference between a demo and a control. A harness should record inputs, outputs, versions, retrieved context, tool calls, policy decisions, and expected behavior. It should allow teams to rerun the test after prompt changes, model changes, tool changes, or remediation.
Manual testing still matters. Many AI failures are discovered through exploration. But once a meaningful failure is found, it should become a reproducible test.
- Map Trust Boundaries
Important boundaries include user to app, app to model provider, retrieval service to vector store, untrusted content to prompt context, model to tool broker, and output to renderer.
Evidence should tell the story of the failure without relying on trust. It should show what was asked, what the system saw, what it retrieved, what it produced, what it called, what was approved, and what happened next.
Evidence should also be minimized. Redact secrets, personal data, and unnecessary customer details. Store raw artifacts securely. Use synthetic data where possible.
- Prompt Assembly
Prompt assembly should be modeled explicitly. Which instructions are trusted? Which content is untrusted? What data can enter context? What metadata is preserved?
Severity should be based on impact and exploitability. A funny model answer is not necessarily high severity. A boring output that leaks cross-tenant data may be critical. Tool use, data sensitivity, reversibility, detection, and affected users should matter more than novelty.
The report should clearly identify assumptions and limitations. If testing occurred in staging with synthetic data, say so. If production behavior may differ, say so. Honest limitations make findings more credible.
- Tool Calls
Tool calls are authority boundaries. The threat model should identify what tools exist, what credentials they use, what actions they can take, and what validation occurs.
Remediation should be specific enough for engineering teams to act. “Improve guardrails” is weak. “Enforce authorization before retrieval and add regression tests for cross-tenant document access” is much stronger.
A remediation should identify the control, owner, expected behavior, and validation method. This turns a finding into work.
- Memory and State
Memory changes future behavior. Threat modeling should include who can write memory, what memory stores, how it expires, and whether malicious instructions can persist.
Retesting is where the loop closes. A finding should not be considered resolved only because code changed. It should be retested using the original reproduction steps and nearby variants.
For AI systems, retesting should also check for regressions. A fix for one prompt injection path may not fix indirect injection. A fix for one tool may not apply to another. A fix for one model may not hold after provider routing changes.
- Abuse Cases
Abuse cases should include direct injection, indirect injection, leakage, tool misuse, excessive agency, unsafe output, RAG poisoning, cost exhaustion, and overreliance.
The operating model should connect red team, AppSec, product, platform, SOC, GRC, and engineering. Red-team findings should inform eval suites. Eval failures should inform release gates. Threat models should inform logs. Logs should inform detections. Detections should inform incident playbooks. Incident lessons should update the lab.
This is how AI security becomes a learning system.
- Threat Model Outputs
The final output should be a backlog of controls, tests, logging requirements, approval gates, incident assumptions, and unresolved risks.
The strongest AI security programs treat every assessment as both a point-in-time review and a source of reusable evidence. Payloads, test cases, findings, control mappings, and retest records should improve the next review.
This approach also supports claim-readiness. If the organization says it tests prompt injection or monitors agent tools, it should be able to show the evidence.
- Practical Example
A legal document assistant appears simple: user uploads a contract and asks for a summary. The threat model reveals more: the document may contain indirect instructions, the model output may create legal-sounding claims, summaries are logged, files are stored, and users may paste sensitive terms. Controls include untrusted-content labeling, no legal-advice framing, output caveats, log retention limits, and source citation requirements.
This example shows why structure matters. The same technical behavior can be a weak anecdote or a strong finding depending on scope, evidence, impact, and remediation.
- Tooling Guidance
Relevant tools may include red-team harnesses, prompt eval tools, proxy tools, browser automation frameworks, observability systems, SIEMs, test data generators, and reporting templates. Examples may include PyRIT, garak, promptfoo, Giskard, DeepEval, Ragas, Burp Suite, Playwright, OpenTelemetry, Langfuse, LangSmith, and custom harnesses.
Tool mentions are not endorsements. Tools should be evaluated by whether they support safe scope, repeatability, evidence, integration, and remediation.
- Governance and Trust Caveats
Sponsor support does not influence methodology, scoring, findings, chart outputs, or editorial conclusions.
Job-description intelligence and public hiring signals are directional signals, not proof of internal security maturity.
Psychometric outputs are role-language evidence, not diagnosis.
Avoid accusatory company-level language. Avoid product endorsement language. Use careful phrases such as directional signal, aggregate benchmark, claim-readiness, governance evidence, private benchmark, skills validation, and operating model.
-
Implementation Controls
-
Diagram prompt, retrieval, model, tool, memory, and output flows.
-
Identify trusted and untrusted content sources.
-
Map model-visible data by classification.
-
Map model-influenced actions by blast radius.
-
Document tool credentials and authorization checks.
-
Define abuse cases for each trust boundary.
-
Turn threats into controls and evals.
-
Define telemetry required for incident reconstruction.
-
Document unresolved assumptions and accepted risks.
-
Revisit the threat model after material changes.
-
Common Mistakes
Common mistakes include:
-
treating jailbreaks as complete findings;
-
skipping written authorization;
-
using unsafe production data;
-
ignoring tool calls and retrieval traces;
-
failing to record model and prompt versions;
-
rating severity without business impact;
-
recommending vague guardrail fixes;
-
failing to create regression tests;
-
losing evidence needed for governance;
-
overstating what the test proves.
-
Conclusion
Threat Modeling LLM Applications: Data Flows, Trust Boundaries, Tool Calls, and Abuse Cases is about making AI security work useful. The best teams do not merely discover that AI systems can fail. They discover specific failures, explain why they matter, fix the underlying controls, and prove the fix works.
That is the difference between AI security content and AI Security Engineering.
Implementation Checklist
- Diagram prompt, retrieval, model, tool, memory, and output flows.
- Identify trusted and untrusted content sources.
- Map model-visible data by classification.
- Map model-influenced actions by blast radius.
- Document tool credentials and authorization checks.
- Define abuse cases for each trust boundary.
- Turn threats into controls and evals.
- Define telemetry required for incident reconstruction.
- Document unresolved assumptions and accepted risks.
- Revisit the threat model after material changes.
- Define scope, authorization, and safety rules before testing.
- Preserve evidence needed for reproduction and remediation.
- Turn meaningful findings into regression tests.
- Retest after remediation.
- Store results as governance evidence.
Source Notes Needed
- OWASP Top 10 for LLM Applications.
- NIST AI Risk Management Framework.
- MITRE ATLAS.
- STRIDE references to verify.
- Data-flow diagramming guidance.
Framework Alignment
This practice is mapped to the Identity control objective within our AI security operating model.
Read Methodology →