ConsultingWorkbench-backed AI security engagements — map, attack, defend, and prove your AI systems.
Scope a Review

aisecurity.llc

AIPSA Scorecard — AI Product Security Scorecard

A product-security baseline for AI systems that ship, log, and need evidence.

AI Product Security Scorecard

See whether your AI security program is real, repeatable, and evidence-producing.

A product baseline for AI-enabled SaaS, RAG systems, agents, tool-calling workflows, and enterprise AI products that need evidence, not theater.

CISOs
Product security leaders
AppSec teams
AI platform teams
Security architects
Governance and risk teams
B2B SaaS founders
Enterprise sales engineering teams

Product snapshot

AI Product Security Scorecard

A product-security scorecard for assessing whether AI-enabled systems are inventoried, threat-modeled, permissioned, tested, logged, governed, and supported by evidence.

2026.1.0
Default standard
6 score bands
Overall score
Raw score and capped score
Domain score heatmap
Top gaps
Evidence gaps
30/60/90 day roadmap

Board-style summary

The program has moved beyond ad hoc AI security, but it is not yet fully measured. The priority is to reduce time-to-evidence across retrieval, tool use, agent actions, and incident response. Management should fund the 90-day remediation plan and require a reassessment after evidence gaps are closed.

Overview

A product-grade scorecard with a clear path to action

This surface keeps the useful content, the scoring vocabulary, and the remediation shape. It avoids importing the prototype's full app shell, export stack, or local-storage history into the main repo.

Domains
12

Challenge coverage areas.

Capabilities
36

Operational skills and controls.

Standard questions
72

The main challenge signal.

Roadmap actions
36

Remediation items with owners.

What it is for

Product-security first

The scorecard focuses on AI features as shipped product systems: prompts, retrieval, tools, agents, permissions, logs, incidents, and customer evidence.

Evidence-oriented

Every score asks what can be proven: diagrams, evals, logs, runbooks, risk decisions, owner records, and customer-ready artifacts.

Roadmap-ready

Low scores generate a practical 30/60/90 day remediation plan with owners, acceptance criteria, and evidence produced.

Framework-aware

The scorecard crosswalks to NIST AI RMF, OWASP AI maturity work, CSA AI security maturity, ISO 42001, OWASP LLM Top 10, and MITRE ATLAS without pretending to certify compliance.

Default caveat

This is a product baseline. It helps teams identify gaps, prioritize remediation, and prepare evidence. It is not a certification and does not assert compliance by itself.

Formal certification
Formal audit opinion
Replacement for legal review
Replacement for privacy impact assessment
Replacement for model safety evaluation in regulated high-risk AI systems

Packages

Assessment packages and score bands

This scorecard keeps the depth deliberate: Lite for fast signal, Standard for most teams, Deep for workshops and paid engagements.

lite

Lite

36 q

A fast executive scan for lead generation, first-pass discovery, or a quick posture conversation.

12 min
36 target

Scorecard, top gaps, first roadmap, and high-level framework crosswalk.

standard

Standard

72 q

A serious company baseline for product, security, governance, and engineering leaders.

35 min
84 target

Domain heatmap, capability scores, prioritized gaps, evidence requests, roadmap, and framework crosswalk.

deep

Deep

132 q

A comprehensive evidence-oriented challenge for workshops, paid engagements, and AI product security program design.

75 min
144 target

Full maturity report, evidence gap analysis, control roadmap, framework crosswalk, and implementation backlog.

Level 0

Unseen

0.00–0.74

AI use exists, but the organization cannot reliably inventory, govern, test, or observe it.

Evidence is absent, anecdotal, or unavailable during a customer, incident, or executive review.

Discovery happens after incidents, customer questions, or urgent launch pressure.

Level 1

Ad Hoc

0.75–1.74

Some AI security activity exists, usually hero-driven, inconsistent, and detached from normal product delivery.

Evidence exists in scattered documents, tickets, Slack threads, screenshots, or individual memory.

Security review depends on who happens to be involved.

Level 2

Repeatable

1.75–2.74

Core practices exist for important systems, but coverage is incomplete and evidence is still mostly manual.

Evidence can be assembled, but not quickly, consistently, or across the whole AI surface.

Teams have templates and review rituals, but adoption varies by product area.

Level 3

Managed

2.75–3.74

AI product security controls are embedded into SDLC, architecture review, launch review, logging, and risk acceptance.

Evidence is attached to the way products are built and launched.

AI security is on rails for most production launches.

Level 4

Measured

3.75–4.49

Controls produce measurable evidence, test results, coverage metrics, and executive-ready reporting.

Evidence is queryable, repeatable, and tied to metrics, owners, systems, and risk decisions.

AI product security posture is visible across domains and improves through measurement.

Level 5

Adaptive

4.50–5.00

AI security is continuously monitored, red-teamed, updated, and improved as models, tools, agents, data, and abuse patterns change.

Evidence is continuous, regression-tested, incident-informed, and customer-ready.

Controls evolve with adversary behavior, architecture changes, and business expansion.

Scoring rules

How the scorecard keeps claims bounded

Capped score, not vanity score

Unseen

0.00 - 0.74

Ad Hoc

0.75 - 1.74

Repeatable

1.75 - 2.74

Managed

2.75 - 3.74

Measured

3.75 - 4.49

Adaptive

4.50 - 5.00

Domains

Coverage areas, not a pile of isolated questions

The scorecard is organized around product-security domains, each with capabilities and question volume behind it. The cards below show the shape of the product without exposing the full question bank here.

Inventory

AI Inventory and System Boundaries

critical

Whether the organization knows where AI exists, what systems use models, what data they touch, what tools they can call, and who owns them.

6 questions
3 capabilities
4 framework refs

AI System Registry - A living inventory of AI features, products, internal tools, agents, RAG systems, model providers, and deployment states.

You cannot secure, govern, test, monitor, or explain AI systems that are not inventoried. Inventory is the control that makes every later control possible.

Threat Modeling

AI Product Threat Modeling

critical

Whether AI-specific threat models exist before launch and evolve as models, prompts, retrieval, tools, agents, users, and abuse paths change.

6 questions
3 capabilities
4 framework refs

AI Abuse Case Modeling - Threat models include adversarial users, malicious documents, prompt injection, tool abuse, retrieval poisoning, identity confusion, and business abuse.

AI systems turn product behavior, data flows, prompts, context, and tool calls into security boundaries. Static launch review is not enough.

Prompt Injection

Prompt Injection and Context Manipulation

critical

Whether prompts, retrieved context, user content, browser pages, documents, emails, and tool instructions are treated as potentially hostile input.

6 questions
3 capabilities
3 framework refs

Untrusted Input Boundaries - User content, retrieved documents, web pages, emails, tickets, code, and third-party context are explicitly treated as untrusted instruction-bearing input.

Prompt injection is not a clever demo trick. It is the input-validation and confused-deputy problem of LLM application security.

Agentic Permissions

Agentic Permissions and Tool Safety

critical

Whether agents are scoped, permissioned, approved, monitored, interrupted, and prevented from turning delegated tasks into excessive agency.

6 questions
3 capabilities
3 framework refs

Agent Permission Model - Agents have explicit permissions, scopes, identities, tool grants, action limits, and revocation paths.

An agent with tools is a product actor. Its permissions, identity, approvals, blast radius, and audit trail must be engineered like any other privileged service path.

RAG Authorization

RAG, Data Access, and Authorization

critical

Whether retrieval respects the same authorization, tenancy, provenance, deletion, retention, and confidentiality boundaries as source systems.

6 questions
3 capabilities
3 framework refs

Retrieval Authorization - Retrieved documents and chunks enforce the same user, role, tenant, object-level, and contractual authorization rules as source systems.

RAG turns search, embeddings, chunks, documents, and access control into one product-security boundary. Leakage often happens through retrieval, not generation.

Supply Chain

Model, Dataset, and AI Supply Chain Security

high

Whether model providers, model versions, datasets, fine-tunes, eval sets, dependencies, prompts, and AI infrastructure have provenance, review, and rollback.

6 questions
3 capabilities
3 framework refs

Provider and Model Governance - AI providers, model versions, deployment modes, contractual constraints, data usage terms, and rollback options are reviewed and tracked.

AI products inherit risk from providers, models, data pipelines, prompts, plugins, dependencies, and evaluation artifacts. Supply chain visibility is product risk visibility.

Evaluation

Evaluation, Testing, and Red Teaming

high

Whether AI systems are tested against realistic security, abuse, safety, privacy, and reliability failure modes before and after launch.

6 questions
3 capabilities
3 framework refs

Security Evaluation Suites - The product has repeatable tests for prompt injection, data leakage, tool abuse, insecure output handling, overreliance, harmful actions, and abuse scenarios.

Demo prompts do not prove security. AI security needs repeatable evals, adversarial tests, regression gates, and red-team findings that feed back into engineering.

Logging

Logging, Telemetry, and Forensics

critical

Whether the organization can reconstruct what an AI system saw, retrieved, decided, called, emitted, changed, blocked, escalated, and exposed.

6 questions
3 capabilities
3 framework refs

AI Interaction Traces - Prompts, context, retrieval, model responses, tool calls, approvals, outputs, and policy decisions are logged with privacy-aware controls.

The AI security question customers and executives eventually ask is simple: what happened, how do you know, and how fast can you prove it?

Incident Response

AI Incident Response and Abuse Operations

critical

Whether the organization can detect, triage, contain, investigate, communicate, and learn from AI-specific incidents and abuse patterns.

6 questions
3 capabilities
3 framework refs

AI Incident Runbooks - The organization has incident runbooks for prompt injection, data exposure, malicious tool action, retrieval poisoning, provider outage, model abuse, and harmful automation.

AI incidents include prompt injection, data exposure, unauthorized tool actions, provider failures, model abuse, retrieval poisoning, and harmful automation. Traditional IR needs AI-specific playbooks.

Governance

Governance, Policy, and Risk Acceptance

high

Whether AI governance is connected to product decisions, control ownership, launch gates, exceptions, risk acceptance, and measurable evidence.

6 questions
3 capabilities
3 framework refs

Policy and Operating Model - AI policy defines owners, decision rights, prohibited uses, required controls, risk tiers, evidence expectations, and exception paths.

AI governance that cannot change a backlog, block a launch, approve an exception, or produce evidence is theater.

SDLC

Secure SDLC and Developer Enablement

high

Whether teams have paved paths, templates, design patterns, middleware, tests, checklists, and secure defaults for building AI features.

6 questions
3 capabilities
3 framework refs

Secure Templates and Patterns - Developers have approved implementation patterns for prompts, retrieval, tool calls, logging, model providers, permissions, and safety controls.

The scalable version of AI security is not more meetings. It is secure-by-default paths that developers can actually use.

Customer Trust

Customer Trust, Evidence, and Sales Enablement

medium

Whether AI security work produces customer-ready evidence, assurance artifacts, security answers, executive narratives, and trust materials.

6 questions
3 capabilities
3 framework refs

Customer Evidence Pack - The organization can provide clear AI security evidence to customers without scrambling across engineering, legal, product, and security.

Enterprise customers do not only ask whether AI is secure. They ask what evidence exists, who owns it, how fresh it is, and whether it survives scrutiny.

Highest question volume

AI Inventory and System Boundaries

6 questions

Whether the organization knows where AI exists, what systems use models, what data they touch, what tools they can call, and who owns them.

Lead capability: AI System Registry

Highest question volume

AI Product Threat Modeling

6 questions

Whether AI-specific threat models exist before launch and evolve as models, prompts, retrieval, tools, agents, users, and abuse paths change.

Lead capability: AI Abuse Case Modeling

Highest question volume

Prompt Injection and Context Manipulation

6 questions

Whether prompts, retrieved context, user content, browser pages, documents, emails, and tool instructions are treated as potentially hostile input.

Lead capability: Untrusted Input Boundaries

Highest question volume

Agentic Permissions and Tool Safety

6 questions

Whether agents are scoped, permissioned, approved, monitored, interrupted, and prevented from turning delegated tasks into excessive agency.

Lead capability: Agent Permission Model

Highest question volume

RAG, Data Access, and Authorization

6 questions

Whether retrieval respects the same authorization, tenancy, provenance, deletion, retention, and confidentiality boundaries as source systems.

Lead capability: Retrieval Authorization

Highest question volume

Model, Dataset, and AI Supply Chain Security

6 questions

Whether model providers, model versions, datasets, fine-tunes, eval sets, dependencies, prompts, and AI infrastructure have provenance, review, and rollback.

Lead capability: Provider and Model Governance

Roadmap

The scorecard should point to work, owners, and evidence

Actions are ordered by timeframe, severity, impact, and domain risk.

Timeframe

First 7 days

2 actions

Product Security with Engineering and Product

Create the AI system inventory

First 7 days

Create a living inventory of AI systems, features, internal tools, agents, RAG systems, providers, models, deployment states, owners, data touched, and risk tiers.

critical
medium
high
  • - All known production AI systems are listed
  • - Each system has business, engineering, and security owner fields
  • - Each system identifies provider/model, data touched, deployment state, and customer exposure

Evidence produced: AI system inventory, Model/provider list, Owner map

AI Governance Lead with Product Security

Assign owners and risk tiers to material AI systems

First 7 days

Assign business, engineering, security, privacy, and operational owners to material AI systems and classify each by business criticality and risk tier.

high
small
high
  • - All material AI systems have named owners
  • - Risk tier rubric is defined
  • - Critical systems are flagged for deeper review

Evidence produced: RACI, Risk tiering rubric, Criticality matrix

Timeframe

First 30 days

10 actions

Incident Response Lead with Product Security

Create AI-specific incident runbooks

First 30 days

Create runbooks for prompt injection, data exposure, malicious tool action, retrieval poisoning, provider outage, model abuse, and harmful automation.

critical
medium
high
  • - Top AI incident types have runbooks
  • - Runbooks include evidence queries
  • - Runbooks include containment steps

Evidence produced: AI incident runbook, Severity matrix, Escalation tree

AI Platform Engineering with Product Security

Create the agent permission model

First 30 days

Map agents to users, tenants, tools, scopes, approvals, limits, prohibited actions, identity context, and revocation paths.

critical
medium
high
  • - Agent permission matrix exists
  • - Tools are classified by risk
  • - Permissions are scoped by user, tenant, role, and purpose

Evidence produced: Agent permission matrix, Tool scopes, Identity propagation design

Security Leadership with AI Governance

Create the AI security operating model

First 30 days

Define owners, decision rights, prohibited uses, required controls, risk tiers, launch gates, exception paths, and evidence expectations for AI products.

critical
medium
high
  • - AI product security standard exists
  • - Risk tiers and required controls are defined
  • - Exception path is defined

Evidence produced: AI policy, Operating model, Governance charter

Security Engineering with Platform Engineering

Define AI interaction trace schema

First 30 days

Define privacy-aware logs for prompts, retrieved context, model responses, tool calls, approvals, policy decisions, outputs, errors, and downstream effects.

critical
medium
high
  • - Trace schema is defined
  • - Sensitive fields have redaction or minimization strategy
  • - Trace includes correlation ids

Evidence produced: Interaction trace schema, Sample logs, Privacy redaction design

AI Platform Engineering with Product Security

Define prompt and context trust boundaries

First 30 days

Define trusted instruction layers, untrusted context zones, retrieved content handling, output handling, and reusable safe prompt patterns.

critical
medium
high
  • - System, developer, user, retrieved, and tool instructions are distinguished
  • - Untrusted context handling pattern is documented
  • - Highest-risk AI flows are updated

Evidence produced: Prompt architecture, Context trust labeling, Injection test cases

Security Architecture with Product Engineering

Map AI data flows and trust boundaries

First 30 days

Document prompts, user data, retrieved context, model calls, tool calls, outputs, logs, and downstream systems for the highest-risk AI products.

critical
medium
high
  • - Top AI systems have current architecture and data-flow diagrams
  • - Trust boundaries are labeled
  • - Data classes are identified

Evidence produced: AI architecture diagram, Data flow diagram, Trust boundary diagram

Product Security

Run AI-specific threat models for the highest-risk systems

First 30 days

Threat model prompt injection, malicious context, RAG abuse, tool abuse, identity confusion, tenant leakage, business misuse, provider risk, and incident evidence paths.

critical
medium
high
  • - Highest-risk AI system has an AI-specific threat model
  • - Threat model includes adversarial context, tools, RAG, identity, and data exposure
  • - Findings are converted into backlog, evals, telemetry, or risk acceptance

Evidence produced: AI threat model, Abuse case library, Security backlog

Product Security with Search/RAG Engineering

Test RAG authorization and tenant isolation

First 30 days

Verify that retrieval enforces user, role, tenant, object-level, contractual, and source-system authorization rules.

critical
medium
high
  • - Negative tests exist for unauthorized retrieval
  • - Tenant isolation tests exist
  • - ACL propagation is documented

Evidence produced: Retrieval authorization tests, Tenant isolation tests, ACL propagation design

Product Security with Product Management

Connect AI threat model findings to backlog and risk acceptance

First 30 days

Require AI threat model findings to produce mitigation tickets, eval cases, telemetry requirements, launch decisions, or explicit risk acceptances.

high
small
high
  • - Every high finding has a ticket, control, or risk acceptance
  • - Risk acceptance includes owner, expiry, compensating control, and decision rationale
  • - Threat model output is visible in roadmap review

Evidence produced: Security backlog, Risk acceptance records, Launch decision log

Security Architecture with Procurement and Legal

Create model and provider governance registry

First 30 days

Track AI providers, models, versions, data usage terms, deployment modes, logging behavior, availability obligations, and rollback paths.

high
medium
high
  • - Approved model/provider list exists
  • - Provider data use and retention terms are documented
  • - Model versions and usage locations are tracked

Evidence produced: Model registry, Provider review, Approved model list

Remediation actions trigger when related question score is less than or equal to action.trigger.maxQuestionScore.
Question actions
Actions may also trigger when related domain score is less than or equal to maxDomainScore.
Domain actions
Actions with context flags only trigger when the assessment context matches the required flags.
Context flags
Actions are ordered by timeframe, severity, impact, and domain risk.
Quality bar

Methodology

Public-safe by design

The scorecard carries the caveats and operating principles forward so the public surface stays useful without overstating what the score can prove.

Scope

B2B SaaS products using generative AI
AI-native products
RAG-enabled systems
Agentic systems with tool use or external actions
Internal enterprise AI assistants
Customer-facing AI features requiring security assurance
Security programs preparing AI evidence for enterprise buyers, auditors, boards, or incident response

Not in scope

Formal certification
Formal audit opinion
Replacement for legal review
Replacement for privacy impact assessment
Replacement for model safety evaluation in regulated high-risk AI systems

Reporting principles

Never present the score as certification.
Always show cap rules that reduced the score.
Always distinguish raw score from capped score.
Always show evidence gaps separately from score gaps.
Always include remediation actions with acceptance criteria and evidence produced.
Always show framework mappings as crosswalk, not compliance proof.

Sample Executive Summary

The assessed product is operating at Managed scorecard level overall, but the capped score indicates that foundational evidence gaps remain. The strongest domains are secure SDLC enablement, provider governance, and evaluation coverage. The highest-risk domains are RAG authorization, logging and telemetry, and AI incident response. The next 90 days should focus on proving retrieval authorization, creating incident-ready AI traces, testing containment paths, and turning governance decisions into owner-backed evidence.

The raw score should not be used alone. Cap rules identify where foundational weaknesses prevent a higher score claim.
The biggest gap is not policy. It is time to evidence.
The roadmap converts challenge findings into implementation work with acceptance criteria.
The framework crosswalk supports assurance conversations but does not claim certification.
The next score jump requires better runtime evidence, not more static documents.

Alignment

owasp-aima

OWASP AI Maturity Assessment

Lifecycle maturity framing across strategy, design, implementation, operations, and governance.

csa-aismm

CSA AI Security Scorecard

Enterprise AI security program maturity, operating domains, and security governance framing.

nist-ai-rmf

NIST AI Risk Management Framework

Govern, Map, Measure, Manage executive crosswalk and risk management vocabulary.

iso-42001

ISO/IEC 42001

AI management system evidence, governance, operating rhythm, improvement cycle, and audit-readiness vocabulary.

owasp-ai-exchange

OWASP AI Security and Privacy Guide

Security and privacy control references across AI systems.

owasp-llm-top-10

OWASP Top 10 for LLM Applications

LLM application risk categories including prompt injection, insecure output handling, supply chain, sensitive disclosure, excessive agency, and overreliance.

mitre-atlas

MITRE ATLAS

Adversary technique vocabulary for AI-enabled systems and ML attack surfaces.

Lite questions
36

Fast signal.

Standard questions
72

Default mode.

Deep questions
132

Workshop depth.

Score bands
6

Six-band scale.