aisecurity.llc
AIPSA Scorecard — AI Product Security Scorecard
A product-security baseline for AI systems that ship, log, and need evidence.
AI Product Security Scorecard
See whether your AI security program is real, repeatable, and evidence-producing.
A product baseline for AI-enabled SaaS, RAG systems, agents, tool-calling workflows, and enterprise AI products that need evidence, not theater.
Product snapshot
AI Product Security Scorecard
A product-security scorecard for assessing whether AI-enabled systems are inventoried, threat-modeled, permissioned, tested, logged, governed, and supported by evidence.
Board-style summary
The program has moved beyond ad hoc AI security, but it is not yet fully measured. The priority is to reduce time-to-evidence across retrieval, tool use, agent actions, and incident response. Management should fund the 90-day remediation plan and require a reassessment after evidence gaps are closed.
Overview
A product-grade scorecard with a clear path to action
This surface keeps the useful content, the scoring vocabulary, and the remediation shape. It avoids importing the prototype's full app shell, export stack, or local-storage history into the main repo.
Challenge coverage areas.
Operational skills and controls.
The main challenge signal.
Remediation items with owners.
What it is for
Product-security first
The scorecard focuses on AI features as shipped product systems: prompts, retrieval, tools, agents, permissions, logs, incidents, and customer evidence.
Evidence-oriented
Every score asks what can be proven: diagrams, evals, logs, runbooks, risk decisions, owner records, and customer-ready artifacts.
Roadmap-ready
Low scores generate a practical 30/60/90 day remediation plan with owners, acceptance criteria, and evidence produced.
Framework-aware
The scorecard crosswalks to NIST AI RMF, OWASP AI maturity work, CSA AI security maturity, ISO 42001, OWASP LLM Top 10, and MITRE ATLAS without pretending to certify compliance.
Default caveat
This is a product baseline. It helps teams identify gaps, prioritize remediation, and prepare evidence. It is not a certification and does not assert compliance by itself.
Packages
Assessment packages and score bands
This scorecard keeps the depth deliberate: Lite for fast signal, Standard for most teams, Deep for workshops and paid engagements.
lite
Lite
A fast executive scan for lead generation, first-pass discovery, or a quick posture conversation.
Scorecard, top gaps, first roadmap, and high-level framework crosswalk.
standard
Standard
A serious company baseline for product, security, governance, and engineering leaders.
Domain heatmap, capability scores, prioritized gaps, evidence requests, roadmap, and framework crosswalk.
deep
Deep
A comprehensive evidence-oriented challenge for workshops, paid engagements, and AI product security program design.
Full maturity report, evidence gap analysis, control roadmap, framework crosswalk, and implementation backlog.
Level 0
Unseen
AI use exists, but the organization cannot reliably inventory, govern, test, or observe it.
Evidence is absent, anecdotal, or unavailable during a customer, incident, or executive review.
Discovery happens after incidents, customer questions, or urgent launch pressure.
Level 1
Ad Hoc
Some AI security activity exists, usually hero-driven, inconsistent, and detached from normal product delivery.
Evidence exists in scattered documents, tickets, Slack threads, screenshots, or individual memory.
Security review depends on who happens to be involved.
Level 2
Repeatable
Core practices exist for important systems, but coverage is incomplete and evidence is still mostly manual.
Evidence can be assembled, but not quickly, consistently, or across the whole AI surface.
Teams have templates and review rituals, but adoption varies by product area.
Level 3
Managed
AI product security controls are embedded into SDLC, architecture review, launch review, logging, and risk acceptance.
Evidence is attached to the way products are built and launched.
AI security is on rails for most production launches.
Level 4
Measured
Controls produce measurable evidence, test results, coverage metrics, and executive-ready reporting.
Evidence is queryable, repeatable, and tied to metrics, owners, systems, and risk decisions.
AI product security posture is visible across domains and improves through measurement.
Level 5
Adaptive
AI security is continuously monitored, red-teamed, updated, and improved as models, tools, agents, data, and abuse patterns change.
Evidence is continuous, regression-tested, incident-informed, and customer-ready.
Controls evolve with adversary behavior, architecture changes, and business expansion.
Scoring rules
How the scorecard keeps claims bounded
Unseen
Ad Hoc
Repeatable
Managed
Measured
Adaptive
Domains
Coverage areas, not a pile of isolated questions
The scorecard is organized around product-security domains, each with capabilities and question volume behind it. The cards below show the shape of the product without exposing the full question bank here.
Inventory
AI Inventory and System Boundaries
Whether the organization knows where AI exists, what systems use models, what data they touch, what tools they can call, and who owns them.
AI System Registry - A living inventory of AI features, products, internal tools, agents, RAG systems, model providers, and deployment states.
You cannot secure, govern, test, monitor, or explain AI systems that are not inventoried. Inventory is the control that makes every later control possible.
Threat Modeling
AI Product Threat Modeling
Whether AI-specific threat models exist before launch and evolve as models, prompts, retrieval, tools, agents, users, and abuse paths change.
AI Abuse Case Modeling - Threat models include adversarial users, malicious documents, prompt injection, tool abuse, retrieval poisoning, identity confusion, and business abuse.
AI systems turn product behavior, data flows, prompts, context, and tool calls into security boundaries. Static launch review is not enough.
Prompt Injection
Prompt Injection and Context Manipulation
Whether prompts, retrieved context, user content, browser pages, documents, emails, and tool instructions are treated as potentially hostile input.
Untrusted Input Boundaries - User content, retrieved documents, web pages, emails, tickets, code, and third-party context are explicitly treated as untrusted instruction-bearing input.
Prompt injection is not a clever demo trick. It is the input-validation and confused-deputy problem of LLM application security.
Agentic Permissions
Agentic Permissions and Tool Safety
Whether agents are scoped, permissioned, approved, monitored, interrupted, and prevented from turning delegated tasks into excessive agency.
Agent Permission Model - Agents have explicit permissions, scopes, identities, tool grants, action limits, and revocation paths.
An agent with tools is a product actor. Its permissions, identity, approvals, blast radius, and audit trail must be engineered like any other privileged service path.
RAG Authorization
RAG, Data Access, and Authorization
Whether retrieval respects the same authorization, tenancy, provenance, deletion, retention, and confidentiality boundaries as source systems.
Retrieval Authorization - Retrieved documents and chunks enforce the same user, role, tenant, object-level, and contractual authorization rules as source systems.
RAG turns search, embeddings, chunks, documents, and access control into one product-security boundary. Leakage often happens through retrieval, not generation.
Supply Chain
Model, Dataset, and AI Supply Chain Security
Whether model providers, model versions, datasets, fine-tunes, eval sets, dependencies, prompts, and AI infrastructure have provenance, review, and rollback.
Provider and Model Governance - AI providers, model versions, deployment modes, contractual constraints, data usage terms, and rollback options are reviewed and tracked.
AI products inherit risk from providers, models, data pipelines, prompts, plugins, dependencies, and evaluation artifacts. Supply chain visibility is product risk visibility.
Evaluation
Evaluation, Testing, and Red Teaming
Whether AI systems are tested against realistic security, abuse, safety, privacy, and reliability failure modes before and after launch.
Security Evaluation Suites - The product has repeatable tests for prompt injection, data leakage, tool abuse, insecure output handling, overreliance, harmful actions, and abuse scenarios.
Demo prompts do not prove security. AI security needs repeatable evals, adversarial tests, regression gates, and red-team findings that feed back into engineering.
Logging
Logging, Telemetry, and Forensics
Whether the organization can reconstruct what an AI system saw, retrieved, decided, called, emitted, changed, blocked, escalated, and exposed.
AI Interaction Traces - Prompts, context, retrieval, model responses, tool calls, approvals, outputs, and policy decisions are logged with privacy-aware controls.
The AI security question customers and executives eventually ask is simple: what happened, how do you know, and how fast can you prove it?
Incident Response
AI Incident Response and Abuse Operations
Whether the organization can detect, triage, contain, investigate, communicate, and learn from AI-specific incidents and abuse patterns.
AI Incident Runbooks - The organization has incident runbooks for prompt injection, data exposure, malicious tool action, retrieval poisoning, provider outage, model abuse, and harmful automation.
AI incidents include prompt injection, data exposure, unauthorized tool actions, provider failures, model abuse, retrieval poisoning, and harmful automation. Traditional IR needs AI-specific playbooks.
Governance
Governance, Policy, and Risk Acceptance
Whether AI governance is connected to product decisions, control ownership, launch gates, exceptions, risk acceptance, and measurable evidence.
Policy and Operating Model - AI policy defines owners, decision rights, prohibited uses, required controls, risk tiers, evidence expectations, and exception paths.
AI governance that cannot change a backlog, block a launch, approve an exception, or produce evidence is theater.
SDLC
Secure SDLC and Developer Enablement
Whether teams have paved paths, templates, design patterns, middleware, tests, checklists, and secure defaults for building AI features.
Secure Templates and Patterns - Developers have approved implementation patterns for prompts, retrieval, tool calls, logging, model providers, permissions, and safety controls.
The scalable version of AI security is not more meetings. It is secure-by-default paths that developers can actually use.
Customer Trust
Customer Trust, Evidence, and Sales Enablement
Whether AI security work produces customer-ready evidence, assurance artifacts, security answers, executive narratives, and trust materials.
Customer Evidence Pack - The organization can provide clear AI security evidence to customers without scrambling across engineering, legal, product, and security.
Enterprise customers do not only ask whether AI is secure. They ask what evidence exists, who owns it, how fresh it is, and whether it survives scrutiny.
Highest question volume
AI Inventory and System Boundaries
Whether the organization knows where AI exists, what systems use models, what data they touch, what tools they can call, and who owns them.
Lead capability: AI System Registry
Highest question volume
AI Product Threat Modeling
Whether AI-specific threat models exist before launch and evolve as models, prompts, retrieval, tools, agents, users, and abuse paths change.
Lead capability: AI Abuse Case Modeling
Highest question volume
Prompt Injection and Context Manipulation
Whether prompts, retrieved context, user content, browser pages, documents, emails, and tool instructions are treated as potentially hostile input.
Lead capability: Untrusted Input Boundaries
Highest question volume
Agentic Permissions and Tool Safety
Whether agents are scoped, permissioned, approved, monitored, interrupted, and prevented from turning delegated tasks into excessive agency.
Lead capability: Agent Permission Model
Highest question volume
RAG, Data Access, and Authorization
Whether retrieval respects the same authorization, tenancy, provenance, deletion, retention, and confidentiality boundaries as source systems.
Lead capability: Retrieval Authorization
Highest question volume
Model, Dataset, and AI Supply Chain Security
Whether model providers, model versions, datasets, fine-tunes, eval sets, dependencies, prompts, and AI infrastructure have provenance, review, and rollback.
Lead capability: Provider and Model Governance
Roadmap
The scorecard should point to work, owners, and evidence
Actions are ordered by timeframe, severity, impact, and domain risk.
Timeframe
First 7 days
Product Security with Engineering and Product
Create the AI system inventory
Create a living inventory of AI systems, features, internal tools, agents, RAG systems, providers, models, deployment states, owners, data touched, and risk tiers.
- - All known production AI systems are listed
- - Each system has business, engineering, and security owner fields
- - Each system identifies provider/model, data touched, deployment state, and customer exposure
Evidence produced: AI system inventory, Model/provider list, Owner map
AI Governance Lead with Product Security
Assign owners and risk tiers to material AI systems
Assign business, engineering, security, privacy, and operational owners to material AI systems and classify each by business criticality and risk tier.
- - All material AI systems have named owners
- - Risk tier rubric is defined
- - Critical systems are flagged for deeper review
Evidence produced: RACI, Risk tiering rubric, Criticality matrix
Timeframe
First 30 days
Incident Response Lead with Product Security
Create AI-specific incident runbooks
Create runbooks for prompt injection, data exposure, malicious tool action, retrieval poisoning, provider outage, model abuse, and harmful automation.
- - Top AI incident types have runbooks
- - Runbooks include evidence queries
- - Runbooks include containment steps
Evidence produced: AI incident runbook, Severity matrix, Escalation tree
AI Platform Engineering with Product Security
Create the agent permission model
Map agents to users, tenants, tools, scopes, approvals, limits, prohibited actions, identity context, and revocation paths.
- - Agent permission matrix exists
- - Tools are classified by risk
- - Permissions are scoped by user, tenant, role, and purpose
Evidence produced: Agent permission matrix, Tool scopes, Identity propagation design
Security Leadership with AI Governance
Create the AI security operating model
Define owners, decision rights, prohibited uses, required controls, risk tiers, launch gates, exception paths, and evidence expectations for AI products.
- - AI product security standard exists
- - Risk tiers and required controls are defined
- - Exception path is defined
Evidence produced: AI policy, Operating model, Governance charter
Security Engineering with Platform Engineering
Define AI interaction trace schema
Define privacy-aware logs for prompts, retrieved context, model responses, tool calls, approvals, policy decisions, outputs, errors, and downstream effects.
- - Trace schema is defined
- - Sensitive fields have redaction or minimization strategy
- - Trace includes correlation ids
Evidence produced: Interaction trace schema, Sample logs, Privacy redaction design
AI Platform Engineering with Product Security
Define prompt and context trust boundaries
Define trusted instruction layers, untrusted context zones, retrieved content handling, output handling, and reusable safe prompt patterns.
- - System, developer, user, retrieved, and tool instructions are distinguished
- - Untrusted context handling pattern is documented
- - Highest-risk AI flows are updated
Evidence produced: Prompt architecture, Context trust labeling, Injection test cases
Security Architecture with Product Engineering
Map AI data flows and trust boundaries
Document prompts, user data, retrieved context, model calls, tool calls, outputs, logs, and downstream systems for the highest-risk AI products.
- - Top AI systems have current architecture and data-flow diagrams
- - Trust boundaries are labeled
- - Data classes are identified
Evidence produced: AI architecture diagram, Data flow diagram, Trust boundary diagram
Product Security
Run AI-specific threat models for the highest-risk systems
Threat model prompt injection, malicious context, RAG abuse, tool abuse, identity confusion, tenant leakage, business misuse, provider risk, and incident evidence paths.
- - Highest-risk AI system has an AI-specific threat model
- - Threat model includes adversarial context, tools, RAG, identity, and data exposure
- - Findings are converted into backlog, evals, telemetry, or risk acceptance
Evidence produced: AI threat model, Abuse case library, Security backlog
Product Security with Search/RAG Engineering
Test RAG authorization and tenant isolation
Verify that retrieval enforces user, role, tenant, object-level, contractual, and source-system authorization rules.
- - Negative tests exist for unauthorized retrieval
- - Tenant isolation tests exist
- - ACL propagation is documented
Evidence produced: Retrieval authorization tests, Tenant isolation tests, ACL propagation design
Product Security with Product Management
Connect AI threat model findings to backlog and risk acceptance
Require AI threat model findings to produce mitigation tickets, eval cases, telemetry requirements, launch decisions, or explicit risk acceptances.
- - Every high finding has a ticket, control, or risk acceptance
- - Risk acceptance includes owner, expiry, compensating control, and decision rationale
- - Threat model output is visible in roadmap review
Evidence produced: Security backlog, Risk acceptance records, Launch decision log
Security Architecture with Procurement and Legal
Create model and provider governance registry
Track AI providers, models, versions, data usage terms, deployment modes, logging behavior, availability obligations, and rollback paths.
- - Approved model/provider list exists
- - Provider data use and retention terms are documented
- - Model versions and usage locations are tracked
Evidence produced: Model registry, Provider review, Approved model list
Methodology
Public-safe by design
The scorecard carries the caveats and operating principles forward so the public surface stays useful without overstating what the score can prove.
Scope
Not in scope
Reporting principles
Sample Executive Summary
The assessed product is operating at Managed scorecard level overall, but the capped score indicates that foundational evidence gaps remain. The strongest domains are secure SDLC enablement, provider governance, and evaluation coverage. The highest-risk domains are RAG authorization, logging and telemetry, and AI incident response. The next 90 days should focus on proving retrieval authorization, creating incident-ready AI traces, testing containment paths, and turning governance decisions into owner-backed evidence.
Alignment
OWASP AI Maturity Assessment
Lifecycle maturity framing across strategy, design, implementation, operations, and governance.
CSA AI Security Scorecard
Enterprise AI security program maturity, operating domains, and security governance framing.
NIST AI Risk Management Framework
Govern, Map, Measure, Manage executive crosswalk and risk management vocabulary.
ISO/IEC 42001
AI management system evidence, governance, operating rhythm, improvement cycle, and audit-readiness vocabulary.
OWASP AI Security and Privacy Guide
Security and privacy control references across AI systems.
OWASP Top 10 for LLM Applications
LLM application risk categories including prompt injection, insecure output handling, supply chain, sensitive disclosure, excessive agency, and overreliance.
MITRE ATLAS
Adversary technique vocabulary for AI-enabled systems and ML attack surfaces.
Fast signal.
Default mode.
Workshop depth.
Six-band scale.