AI Red Teaming for Product Teams Print Edition

Test LLM, RAG, and agentic systems before users and attackers do.

Course thesis

AI features fail differently than traditional software features. They can follow malicious instructions, retrieve the wrong context, expose sensitive data, call tools unsafely, overtrust generated output, or appear safe in happy-path tests while failing under adversarial conditions.

The goal is not to teach reckless exploitation. The goal is to give product teams a repeatable, public-safe, defensive workflow for finding AI product failures before release.

Audience

This course is for QA engineers, test automation engineers, product security teams, security engineers, DevOps, SecOps, SecEng, AI platform teams, AppSec teams, internal red teams, product managers, engineering managers, and governance teams.

Learning outcomes

Learners will be able to:

map AI product attack surface safely
design prompt injection and instruction-conflict tests
test RAG retrieval boundaries and data leakage risks
evaluate agent tools, permissions, approvals, and action limits
identify sensitive data exposure paths
test guardrail behavior without unsafe live abuse
build AI abuse-case libraries and prompt families
wire AI red-team regression checks into CI/CD
capture evidence, severity, and remediation notes
build an AI abuse-case test plan for product release

\pagebreak

# Module 1: AI Attack Surface for Product Teams

AI product testing starts with attack surface.

Key points

AI features are workflows, not only model calls.
Attack surface includes prompts, retrieved context, tool outputs, vector stores, logs, integrations, and output rendering.
Trust boundaries show where permissions, users, data, and responsibility change.
Safe testing requires authorized systems, synthetic data, controlled environments, and scoped evidence.
Release-relevant failures should become test cases.

Practice

Map the attack surface for a fictional customer-support RAG assistant.

\pagebreak

# Module 2: Prompt Injection and Instruction Conflicts

Prompt injection testing checks whether an AI feature follows the wrong instructions.

Key points

Prompt injection is best understood as instruction conflict.
Test user input, retrieved documents, uploaded files, tickets, web content, and tool output.
Define intended instruction hierarchy.
Retrieved content should usually be treated as data, not control instruction.
Capture expected behavior, actual behavior, evidence, and severity.

Practice

Design five safe instruction-conflict tests for a fictional RAG assistant.

\pagebreak

# Module 3: RAG Leakage and Retrieval Boundary Tests

RAG systems fail when retrieval crosses a boundary.

Key points

Grounded answers are not automatically safe.
Test tenant, workspace, role, document classification, time, and region boundaries.
Use synthetic canary phrases.
Test stale, deleted, revoked, and forbidden documents.
Capture retrieved sources, citations, output, policy decisions, and severity where approved.

Practice

Create a RAG boundary test matrix for a fictional product with two tenants, admin and standard users, public docs, internal docs, finance docs, and revoked documents.

\pagebreak

# Module 4: Agent Tool Abuse and Excessive Agency

Agents create risk because they can act.

Key points

Agent risk is what the system lets the model do.
Test tool permissions, approval gates, user roles, tenant boundaries, retries, and state changes.
Excessive agency occurs when an agent exceeds user authority, task scope, or product policy.
Meaningful approval requires action visibility.
Evidence must distinguish proposed, approved, executed, and blocked actions.

Practice

Build an agent test matrix for a fictional customer support agent.

\pagebreak

# Module 5: Sensitive Data Exposure and Output Handling

AI systems can expose sensitive data through prompts, retrieval, outputs, logs, citations, summaries, and downstream workflows.

Key points

Use synthetic markers, not real sensitive data.
Test prompts, retrieved content, model output, citations, logs, traces, analytics, exports, notifications, and tickets.
Check redaction before storage.
Severity depends on data class, reachability, affected users, logging, and tenant boundaries.

Practice

Create a sensitive data exposure test plan for an AI assistant that summarizes account history.

\pagebreak

# Module 6: Guardrail Evaluation and Regression Testing

Guardrails are useful, but they are not proof.

Key points

Guardrails are controls to test.
Test false positives and false negatives.
Check refusal quality.
Confirm guardrails still work after model, prompt, retrieval, or policy changes.
Every confirmed guardrail failure should become a regression test.

Practice

Build a guardrail regression plan for an AI assistant that must avoid revealing restricted account data.

\pagebreak

# Module 7: Test Case Libraries and Prompt Families

One-off AI tests do not scale.

Key points

A test case library turns ad hoc failures into repeatable evidence.
Prompt families group related tests by failure class.
Expected behavior, pass criteria, fail criteria, evidence, severity, and owner must be explicit.
Use stable synthetic data and canary phrases.
Test libraries need governance.

Practice

Create ten reusable test cases for a fictional AI product.

\pagebreak

# Module 8: CI/CD AI Red-Team Regression Suites

AI red-team findings should not live only in reports.

Key points

Known failures should become regression tests.
AI checks can run at pull request, pre-merge, pre-release, scheduled, and incident follow-up stages.
Deterministic checks are preferred where possible.
Model judges can help but should not be the only release control.
CI runs should produce evidence.

Practice

Design a CI/CD regression suite for a fictional RAG assistant.

\pagebreak

# Module 9: Evidence, Severity, and Remediation Backlogs

A finding is useful only if the team can understand it, reproduce it safely, prioritize it, and fix it.

Key points

Good evidence turns a failure into an engineering task.
Severity depends on impact and likelihood.
Remediation can include access control fixes, retrieval filter changes, prompt policy changes, tool permission reductions, eval additions, and regression tests.
Evidence itself must not become a sensitive data exposure.

Practice

Write a finding for a fictional RAG assistant that reveals a synthetic cross-tenant canary phrase.

\pagebreak

# Module 10: Capstone AI Abuse-Case Test Plan

The final deliverable is a release-ready AI abuse-case test plan.

Required sections

feature inventory
attack surface map
scope and authorization
synthetic test data plan
prompt injection test categories
RAG boundary test matrix
agent tool permission tests
sensitive data exposure tests
guardrail regression tests
CI/CD regression plan
evidence capture plan
severity model
remediation backlog workflow
release decision criteria

Practice

Build a release-ready AI abuse-case test plan for a product with a RAG assistant, summarization feature, agentic support workflow, model gateway, and trust center page.

\pagebreak

# Appendix A: Quick Checklists

Safe test scope

System is owned or explicitly authorized.
Environment is approved.
Data is synthetic.
No real secrets are used.
No real customer records are used.
Destructive actions are disabled or controlled.
Evidence capture is approved.
Owners are identified.

RAG boundary tests

Tenant boundary tested.
Workspace boundary tested.
Role boundary tested.
Classification boundary tested.
Revoked document tested.
Stale index tested.
Forbidden source name not revealed.
Synthetic canary phrases used.

Agent tests

Allowed tools listed.
Forbidden tools listed.
Approval gates listed.
Retry limits tested.
Tool errors tested.
Confused deputy risk tested.
Audit trail captured.

Finding checklist

Clear title.
Affected feature.
Risk category.
Scope.
Expected behavior.
Actual behavior.
Safe evidence.
Impact.
Severity.
Remediation.
Regression test.
Owner.

\pagebreak

# Appendix B: Sample Prompt Templates

AI attack surface map

Create an AI attack surface map for this feature.

Feature: [feature]

Known workflow: [workflow]

Users: [users]

Data: [data classes]

Tools: [tools]

Output locations: [outputs]

Provide:

instruction sources
retrieval sources
tool actions
trust boundaries
sensitive data locations
logs and traces
likely failure modes
recommended test categories

AI abuse-case test plan

Build a release-ready AI abuse-case test plan.

Product: [product]

AI features: [features]

Users: [users]

Data classes: [data]

Tools: [tools]

Controls: [controls]

Output:

feature inventory
attack surface map
safe scope
synthetic data plan
prompt injection tests
RAG boundary matrix
agent tool tests
sensitive data exposure tests
guardrail regression tests
CI/CD regression plan
evidence template
severity model
remediation workflow
release decision criteria

\pagebreak

# Final Message

AI red teaming for product teams is not a performance.

It is a release-readiness discipline.

Test the AI behavior before release. Capture evidence. Fix the product. Add the regression. Then ship with more confidence.