Detection Engineering for AI Systems
Slug: detection-engineering-for-ai-systems
Effective Date: 2026-05-17
Version: v1.0
Author: Alex Eisen
Status: Draft
Minimum Target Length: 2,000 words
AI detection engineering is about making model behavior observable. If the team cannot see prompts, retrieval, tool use, and state changes, it cannot tell the difference between normal work and abuse.
- Why This Matters
Traditional detections miss AI-specific abuse because the attack can start as natural language and end as a side effect, such as a file write or an outbound API call. The control gap is not only in alert logic. It is missing telemetry.
- Core Concept
The goal is to connect prompts, outputs, retrieval, identities, tools, and approvals into one detection model. If the model can act, the SOC needs to see the action path.
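A minimal sketch of what a shared event record could look like. The `AIEvent` class, the `EVENT_KINDS` set, and the field names (`trace_id`, `parent_id`, `model_version`) are hypothetical, not a standard schema. The point is that every prompt, retrieval, tool call, approval, and output carries the same trace identifier and a pointer to the event that caused it, so the action path can be reconstructed.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical event kinds covering the path from prompt to side effect.
EVENT_KINDS = {"prompt", "retrieval", "tool_call", "approval", "output"}


@dataclass
class AIEvent:
    trace_id: str        # shared by every event in one agent run
    kind: str            # one of EVENT_KINDS
    actor: str           # identity the model or agent is acting for
    model_version: str   # model that produced or consumed this event
    detail: dict = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_id: Optional[str] = None  # event that caused this one, if any

    def __post_init__(self) -> None:
        # Reject kinds outside the agreed schema instead of logging them silently.
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind}")

    def to_json(self) -> str:
        # sort_keys gives a stable serialization, which helps diffing and replay.
        return json.dumps(asdict(self), sort_keys=True)


# Example: a prompt and the tool call it led to, linked by parent_id.
prompt = AIEvent("t-1", "prompt", "alice", "model-a", {"text": "summarize the Q3 report"})
call = AIEvent("t-1", "tool_call", "alice", "model-a",
               {"tool": "storage.write"}, parent_id=prompt.event_id)
print(call.to_json())
```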
- Threat Model or Failure Model
- A prompt injection changes a tool call.
- The agent accesses data it should not have seen.
- The system emits a useful answer but hides the path that produced it.
- Cost spikes or unusual call sequences can signal abuse before the damage is obvious.
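Cost spikes can be flagged without any model-specific logic. This sketch assumes per-request token counts are already collected in time order; `cost_spikes`, the five-request window, and the three-standard-deviation threshold are illustrative choices, not tuned values.

```python
import statistics


def cost_spikes(token_counts, window=5, threshold=3.0):
    """Return indices whose token count exceeds the mean of the preceding
    `window` requests by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(token_counts)):
        baseline = token_counts[i - window:i]
        mean = statistics.mean(baseline)
        # Floor the deviation at 1 token so a perfectly flat baseline
        # does not cause a division by zero.
        std = max(statistics.pstdev(baseline), 1.0)
        if (token_counts[i] - mean) / std > threshold:
            flagged.append(i)
    return flagged


usage = [1000, 1100, 950, 1050, 1000, 1020, 9000, 1010]
print(cost_spikes(usage))  # prints [6]
```

One limitation: a spike enters the trailing window and inflates the baseline for the next few requests, which can mask a second spike. Excluding already-flagged points from the baseline is a common refinement.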
- Framework Mapping
Reuse the practices that already drive SIEM content and incident response, then add AI-specific context from OWASP, MITRE ATLAS, and the NIST AI RMF. The point is not new jargon. It is better visibility.
- Engineering Controls
- Log prompts, retrievals, tool calls, and approvals.
- Correlate model versions with behavior changes.
- Create alerts for suspicious sequencing and unusual data access.
- Define a response path for model abuse and agent misuse.
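Correlating model versions with behavior changes can start as simply as comparing per-version rates. This sketch assumes events arrive as `(model_version, kind)` pairs; the function names and the 2x ratio are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict


def tool_call_rate_by_version(events):
    """events: iterable of (model_version, kind) pairs.
    Returns {model_version: tool calls per prompt}."""
    prompts = defaultdict(int)
    calls = defaultdict(int)
    for version, kind in events:
        if kind == "prompt":
            prompts[version] += 1
        elif kind == "tool_call":
            calls[version] += 1
    # Only versions that saw at least one prompt get a rate.
    return {v: calls[v] / n for v, n in prompts.items()}


def shifted_versions(rates, baseline_version, max_ratio=2.0):
    """Return versions whose tool-call rate exceeds max_ratio times the baseline."""
    base = rates[baseline_version]
    flagged = []
    for version, rate in sorted(rates.items()):
        if version == baseline_version:
            continue
        # With a zero baseline, any tool use at all counts as a change.
        if (base == 0 and rate > 0) or (base > 0 and rate / base > max_ratio):
            flagged.append(version)
    return flagged


events = ([("v1", "prompt")] * 10 + [("v1", "tool_call")] * 3
          + [("v2", "prompt")] * 10 + [("v2", "tool_call")] * 9)
print(shifted_versions(tool_call_rate_by_version(events), "v1"))  # prints ['v2']
```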
- Tooling
- Use trace stores, SIEM pipelines, and evaluation logs.
- Keep the event schema stable enough for replay and triage.
- Separate noisy status signals from real security events.
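The tooling points can be sketched as a router that runs before events reach the SIEM. `SCHEMA_VERSION`, `STATUS_KINDS`, `route`, and `partition` are hypothetical names. The design choice worth noting is that events with an unknown schema are quarantined rather than dropped, since dropping them would silently remove the records a replay depends on.

```python
SCHEMA_VERSION = 1                                 # hypothetical current schema
STATUS_KINDS = {"heartbeat", "latency", "health"}  # hypothetical noisy kinds


def route(event):
    """Classify a decoded event dict as 'quarantine', 'status', or 'security'."""
    if event.get("schema_version") != SCHEMA_VERSION:
        # Keep unknown-schema events for review; dropping them breaks replay.
        return "quarantine"
    if event.get("kind") in STATUS_KINDS:
        return "status"
    return "security"


def partition(events):
    """Split a batch of events into the three routing buckets."""
    buckets = {"quarantine": [], "status": [], "security": []}
    for event in events:
        buckets[route(event)].append(event)
    return buckets


batch = [
    {"schema_version": 1, "kind": "heartbeat"},
    {"schema_version": 1, "kind": "tool_call", "tool": "storage.write"},
    {"schema_version": 2, "kind": "tool_call"},
]
print({name: len(items) for name, items in partition(batch).items()})
```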
- Evidence and Observability
- Evidence should show what was seen, what was blocked, and what was alerted.
- Keep the trace and the alert together.
- Use dashboards as context, not proof.
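Keeping the trace and the alert together can be enforced when the alert is written. In this sketch, which assumes events are dicts carrying an `event_id`, a hypothetical `EvidenceBundle` records what was seen, blocked, and alerted, and refuses to validate if it cites events that are not in the attached trace.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EvidenceBundle:
    alert_id: str
    rule: str
    trace: list    # raw events the rule evaluated: what was seen
    blocked: list  # event_ids a control stopped: what was blocked
    alerted: list  # event_ids that fired the rule: what was alerted

    def validate(self):
        """Fail if the bundle cites events that are not in its own trace."""
        seen = {event["event_id"] for event in self.trace}
        missing = [i for i in self.blocked + self.alerted if i not in seen]
        if missing:
            raise ValueError(f"evidence cites events missing from trace: {missing}")
        return self

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


bundle = EvidenceBundle(
    alert_id="a-1",
    rule="untrusted-retrieval-then-storage-write",
    trace=[{"event_id": "e1", "kind": "retrieval"}, {"event_id": "e2", "kind": "tool_call"}],
    blocked=["e2"],
    alerted=["e2"],
).validate()
print(bundle.to_json())
```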
- Operating Model
SOC, platform engineering, and product security need a shared event model. If the team cannot tell which prompt led to which action, the detection program is blind at exactly the layer where AI abuse happens.
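Answering "which prompt led to which action" becomes a graph walk if every event records its parent. This sketch assumes a `parent_id` field on each event, which is a hypothetical schema convention; a dangling parent raises an error because it marks a gap in logging rather than a benign absence.

```python
def lineage(events, event_id):
    """Walk parent_id links from event_id back to the root event.

    events: dict mapping event_id -> event dict with an optional 'parent_id'.
    Returns the chain ordered root first, e.g. prompt, retrieval, tool call.
    """
    chain = []
    seen = set()
    current = event_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at {current}")
        if current not in events:
            # A dangling parent means part of the action path was never logged.
            raise KeyError(f"missing event {current}: logging gap")
        seen.add(current)
        chain.append(events[current])
        current = events[current].get("parent_id")
    return chain[::-1]


events = {
    "p1": {"event_id": "p1", "kind": "prompt"},
    "r1": {"event_id": "r1", "kind": "retrieval", "parent_id": "p1"},
    "t1": {"event_id": "t1", "kind": "tool_call", "parent_id": "r1"},
}
print([e["kind"] for e in lineage(events, "t1")])  # prints ['prompt', 'retrieval', 'tool_call']
```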
- Common Mistakes
- Logging only the final output.
- Alerting on everything and understanding nothing.
- Ignoring retrieval and tool context.
- Treating dashboards as evidence.
- Practical Example
A code assistant begins calling a storage tool after ingesting a document whose embedded text instructs it to do so, a form of indirect prompt injection. Detection engineering should surface the prompt, the tool call, and the policy decision that should have blocked it.
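The scenario can be expressed as a sequencing rule. This is a sketch under several assumptions: events in a trace are time-ordered dicts, retrievals carry a `trusted` flag (a missing flag is treated as untrusted), tool calls carry the `policy` decision that was made, and `storage.write` and `storage.delete` are hypothetical sensitive tool names. The rule surfaces the items the example calls for: the prompt, the tool call, and the policy decision.

```python
SENSITIVE_TOOLS = {"storage.write", "storage.delete"}  # hypothetical tool names


def injection_followed_by_action(trace):
    """Flag sensitive tool calls that follow untrusted retrieved content in
    the same trace. `trace` is a time-ordered list of event dicts."""
    alerts = []
    prompt = None
    untrusted = None
    for event in trace:
        kind = event["kind"]
        if kind == "prompt" and prompt is None:
            prompt = event
        elif kind == "retrieval" and not event.get("trusted", False) and untrusted is None:
            untrusted = event
        elif kind == "tool_call" and event.get("tool") in SENSITIVE_TOOLS and untrusted is not None:
            alerts.append({
                "prompt": prompt["event_id"] if prompt else None,
                "untrusted_source": untrusted["event_id"],
                "tool_call": event["event_id"],
                # "missing" means no policy decision was logged at all.
                "policy_decision": event.get("policy", "missing"),
            })
    return alerts


trace = [
    {"event_id": "p1", "kind": "prompt"},
    {"event_id": "r1", "kind": "retrieval", "trusted": False, "source": "uploaded.docx"},
    {"event_id": "t1", "kind": "tool_call", "tool": "storage.write", "policy": "allow"},
]
print(injection_followed_by_action(trace))
```

A `policy_decision` of `allow` points at the control that should have blocked the call. A `deny` still deserves an alert, because it shows an injection attempt reached the tool layer.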
- Governance and Claim Caveats
- Sponsor support does not influence methodology, scoring, findings, chart outputs, or editorial conclusions.
- Job-description intelligence and public hiring signals are directional signals, not proof of internal security maturity.
- Psychometric outputs are role-language evidence, not diagnosis.
- Avoid accusatory company-level language.
- Avoid product endorsement language.
- Conclusion
AI detection engineering is what makes agent and model behavior observable as it happens and reviewable after the fact. Without it, the team learns about an incident only after the damage is done.
Implementation Checklist
- Define the event schema.
- Log prompts and actions.
- Correlate versions.
- Add abuse alerts.
- Test noisy paths.
- Keep replayability.
- Map to SOC workflow.
- Document alert ownership.
- Track evidence privately.
- Review the caveats.
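The ownership and SOC-mapping items on this checklist can be checked automatically. A minimal sketch, assuming detection rules live in a registry dict; the required fields (`owner`, `response_path`, `soc_queue`) and the rule names are hypothetical and should be replaced with whatever the team's workflow actually uses.

```python
REQUIRED_FIELDS = ("owner", "response_path", "soc_queue")  # hypothetical fields


def registry_gaps(rules):
    """rules: {rule_name: metadata dict}.
    Returns {rule_name: [missing or empty required fields]}."""
    gaps = {}
    for name, meta in sorted(rules.items()):
        missing = [f for f in REQUIRED_FIELDS if not meta.get(f)]
        if missing:
            gaps[name] = missing
    return gaps


rules = {
    "cost-spike": {"owner": "platform-eng", "response_path": "runbook-ai-cost", "soc_queue": "ai"},
    "injection-then-action": {"owner": "soc"},
}
print(registry_gaps(rules))  # prints {'injection-then-action': ['response_path', 'soc_queue']}
```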
Source Notes Needed
- SIEM and detection engineering references.
- NIST AI RMF.
- MITRE ATLAS.
- Agent logging examples.
- Public AI incident writeups.
Framework Alignment
This practice is mapped to the Identity control objective within our AI security operating model.