Detection Engineering for AI Systems

Slug: detection-engineering-for-ai-systems
Effective Date: 2026-05-17
Version: v1.0
Author: Alex Eisen
Status: Draft
Minimum Target Length: 2,000 words

AI detection engineering is about making model behavior observable. If the team cannot see prompts, retrieval, tool use, and state changes, it cannot tell the difference between normal work and abuse.

Why This Matters

Traditional detections miss AI-specific abuse because the action can start as natural-language input and end as a side effect, such as a tool call or a data write. The control gap is not only in alert content. It is in missing telemetry.

Core Concept

The goal is to connect prompts, outputs, retrieval, identities, tools, and approvals into one detection model. If the model can act, the SOC needs to see the action path.
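One way to picture a single detection model is a shared event record keyed by a trace identifier. The sketch below is a minimal illustration under stated assumptions, not a reference schema: the TraceEvent fields, the event kind names, and the action_path helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical event kinds; a real deployment would define its own set.
KINDS = {"prompt", "retrieval", "tool_call", "approval", "output"}


@dataclass(frozen=True)
class TraceEvent:
    trace_id: str   # ties every step of one request together
    seq: int        # order of the step within the trace
    kind: str       # one of KINDS
    identity: str   # user or service principal behind the step
    detail: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unknown kinds early so the schema stays predictable.
        if self.kind not in KINDS:
            raise ValueError(f"unknown event kind: {self.kind}")


def action_path(events: list[TraceEvent], trace_id: str) -> list[str]:
    """Return the ordered event kinds recorded for one trace."""
    steps = sorted(
        (e for e in events if e.trace_id == trace_id),
        key=lambda e: e.seq,
    )
    return [e.kind for e in steps]
```

With events recorded this way, an analyst can ask for the action path of one trace and see whether a tool call followed a prompt directly or passed through retrieval and approval first.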

Threat Model or Failure Model

A prompt injection changes a tool call.
The agent accesses data it should not have seen.
The system emits a useful answer but hides the path that produced it.
Cost spikes or unusual sequencing can signal abuse before the damage is obvious.
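The cost-spike case in the list above can be sketched as a simple baseline comparison. This is an illustrative heuristic only: the is_cost_spike name, the minimum history of five samples, and the default threshold of three standard deviations are assumptions, and a production detector would likely need seasonality and per-tenant baselines.

```python
from statistics import mean, pstdev


def is_cost_spike(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` when it sits more than `threshold` standard
    deviations above the mean of `history`."""
    if len(history) < 5:
        return False  # too little data to form a baseline (assumed minimum)
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Flat baseline: any increase stands out.
        return current > baseline
    return (current - baseline) / spread > threshold
```

The same shape works for other per-session counters, such as tool calls per request or retrieved documents per prompt.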

Framework Mapping

Use the same ideas that drive SIEM and incident response, then add AI-specific context from OWASP guidance for LLM applications, MITRE ATLAS, and the NIST AI RMF. The point is not new jargon. It is better visibility.

Engineering Controls

Log prompts, retrievals, tool calls, and approvals.
Correlate model versions with behavior changes.
Create alerts for suspicious sequencing and unusual data access.
Define a response path for model abuse and agent misuse.
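Correlating model versions with behavior changes can start with a very small comparison. The sketch below assumes each logged event is a dict carrying hypothetical model_version, trace_id, and kind fields; the function names and the 2x ratio are illustrative choices, not a recommended threshold.

```python
from collections import defaultdict


def tool_call_rate_by_version(events: list[dict]) -> dict[str, float]:
    """Return tool calls per request (trace) for each model version."""
    requests: dict[str, set] = defaultdict(set)
    tool_calls: dict[str, int] = defaultdict(int)
    for e in events:
        version = e["model_version"]
        requests[version].add(e["trace_id"])
        if e["kind"] == "tool_call":
            tool_calls[version] += 1
    return {v: tool_calls[v] / len(requests[v]) for v in requests}


def behavior_shift(rates: dict[str, float], old: str, new: str, ratio: float = 2.0) -> bool:
    """True when the new version's rate is at least `ratio` times the old one."""
    old_rate = rates.get(old, 0.0)
    new_rate = rates.get(new, 0.0)
    if old_rate == 0:
        return new_rate > 0  # any tool use is new behavior
    return new_rate / old_rate >= ratio
```

A shift flagged this way is a prompt for review, not proof of abuse: a new version may legitimately use tools more often.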

Tooling

Use trace stores, SIEM pipelines, and evaluation logs.
Keep the event schema stable enough for replay and triage.
Separate noisy status signals from real security events.
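A stable schema and a status/security split can both be enforced at ingestion. The following is a minimal sketch, assuming a hypothetical schema version string, a hypothetical set of required fields, and an invented list of noisy event kinds; none of these names come from a specific SIEM product.

```python
SCHEMA_VERSION = "1"  # hypothetical; change only with a migration plan
REQUIRED_FIELDS = ("schema_version", "trace_id", "kind", "timestamp")
STATUS_KINDS = {"heartbeat", "latency_sample", "cache_hit"}  # assumed noisy kinds


def route(event: dict) -> str:
    """Return 'reject', 'status', or 'security' for one incoming event."""
    missing = [f for f in REQUIRED_FIELDS if f not in event]
    # Events that break the schema cannot be replayed reliably, so reject them.
    if missing or event["schema_version"] != SCHEMA_VERSION:
        return "reject"
    if event["kind"] in STATUS_KINDS:
        return "status"  # kept for context, not sent to the alert queue
    return "security"
```

Rejected events are worth counting in their own right, since a sudden rise usually means a producer changed its output without updating the schema.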

Evidence and Observability

Evidence should show what was seen, what was blocked, and what was alerted.
Keep the trace and the alert together.
Use dashboards as context, not proof.
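Keeping the trace and the alert together can be as simple as storing them in one record with a digest of the trace. This is a sketch under assumptions: the build_alert and trace_intact names and the record layout are hypothetical, and a digest stored next to the data it covers only detects accidental change, not a determined attacker with write access.

```python
import hashlib
import json


def _digest(trace_events: list[dict]) -> str:
    # Canonical JSON so the same trace always hashes to the same value.
    canonical = json.dumps(trace_events, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_alert(rule: str, trace_events: list[dict]) -> dict:
    """Bundle an alert with a snapshot of the trace that triggered it."""
    snapshot = json.loads(json.dumps(trace_events))  # detach from the caller's list
    return {"rule": rule, "trace": snapshot, "trace_sha256": _digest(snapshot)}


def trace_intact(alert: dict) -> bool:
    """Check that the stored trace still matches its recorded digest."""
    return _digest(alert["trace"]) == alert["trace_sha256"]
```

An alert built this way carries its own evidence, so triage does not depend on a dashboard query returning the same rows later.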

Operating Model

SOC, platform engineering, and product security need a shared event model. If the team cannot tell which prompt led to which action, the detection program is blind at the layer that matters most.
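Answering "which prompt led to which action" requires each event to record what caused it. The sketch below assumes every event has hypothetical id, kind, and parent_id fields and that events are indexed by id; it is one possible lineage scheme, not the only one.

```python
from typing import Optional


def originating_prompt(events_by_id: dict[str, dict], action_id: str) -> Optional[str]:
    """Follow parent links from an action back to the prompt that started it."""
    seen: set[str] = set()
    current = events_by_id.get(action_id)
    while current is not None and current["id"] not in seen:
        if current["kind"] == "prompt":
            return current["id"]
        seen.add(current["id"])  # guards against cyclic parent links
        current = events_by_id.get(current.get("parent_id"))
    return None  # broken chain: the action cannot be tied to a prompt
```

A None result is itself a useful signal: it marks an action the shared event model cannot explain.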

Common Mistakes

Logging only the final output.
Alerting on everything and understanding nothing.
Ignoring retrieval and tool context.
Treating dashboards as evidence.

Practical Example

A code assistant begins calling a storage tool after receiving a document that instructs it to do so. Detection engineering should surface the prompt, the document that carried the instruction, the tool call, and the policy decision that should have blocked it.
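The scenario above can be sketched as a sequencing rule over one trace. Everything named here is an assumption for illustration: the event dict fields (id, kind, trusted, tool, policy_decision), the "deny" decision value, and the detect_injected_tool_call function are hypothetical, and the trace is assumed to be in execution order.

```python
def detect_injected_tool_call(trace: list[dict]) -> list[dict]:
    """Return findings for tool calls that follow untrusted content
    and were not denied by policy."""
    findings = []
    prompt_id = None
    untrusted_doc_id = None
    for event in trace:
        kind = event["kind"]
        if kind == "prompt":
            prompt_id = event["id"]
        elif kind == "document" and not event.get("trusted", False):
            untrusted_doc_id = event["id"]
        elif kind == "tool_call" and untrusted_doc_id is not None:
            decision = event.get("policy_decision")  # None if no policy ran
            if decision != "deny":
                findings.append({
                    "prompt_id": prompt_id,
                    "document_id": untrusted_doc_id,
                    "tool": event["tool"],
                    "policy_decision": decision,
                })
    return findings
```

Each finding carries the prompt, the document, the tool, and the policy decision in one record, which is the set of facts a responder needs to judge whether the call should have been blocked.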

Governance and Claim Caveats

Sponsor support does not influence methodology, scoring, findings, chart outputs, or editorial conclusions.
Job-description intelligence and public hiring signals are directional signals, not proof of internal security maturity.
Psychometric outputs are role-language evidence, not diagnosis.
Avoid accusatory company-level language.
Avoid product endorsement language.

Conclusion

AI detection engineering is what makes agent and model behavior reviewable after the fact. Without it, the team learns of an incident only after the damage is done and cannot reconstruct how it happened.

Implementation Checklist

Define the event schema.
Log prompts and actions.
Correlate versions.
Add abuse alerts.
Test noisy paths.
Keep replayability.
Map to SOC workflow.
Document alert ownership.
Track evidence privately.
Review the caveats.

Source Notes Needed

SIEM and detection engineering references.
NIST AI RMF.
MITRE ATLAS.
Agent logging examples.
Public AI incident writeups.


Framework Alignment

This practice is mapped to the Identity control objective within our AI security operating model.
