NEW

Start with the pressure: sales, launch, abuse, agents, data, or guardrails

RAG SECURITY

RAG Leakage & Retrieval Boundary Benchmark

RAG Leakage and Tenant-Boundary Benchmark

Evaluate tenant isolation, poisoned context, sensitive document leakage, and citation integrity.

This suite evaluates retrieval as a security boundary, not just an answer-quality feature.

Benchmark

RAG Leakage

Planned
Private execution available
Boundary classes
5+

Tenant, role, collection, freshness, source trust

Planned trials
3,000

Across RAG, access-filtered, and gateway-guarded variants

Primary metric
Unauthorized retrieval rate

Reported only after validated trials

Report preview

Planned report outputs

RAG securityPlannedMap + Attack

Publication boundary

Methodology and suite design publish before public scorecards. Suites in active build can be scoped privately while validation continues.

Scorecards are validation-gated.
This benchmark is planned. Public results have not yet been published.
This suite is planned. Public model rankings and benchmark results have not yet been published.

Problem

Why this benchmark matters

Enterprise AI assistants often sit on top of sensitive document stores. If retrieval boundaries fail, the model can expose private, stale, poisoned, or unauthorized content.

Why it matters

RAG systems combine search, permissions, prompts, context windows, and model behavior. A single weak boundary can become a buyer trust problem.

What we will test

We will simulate multi-tenant, role-based, poisoned, stale, and sensitive document retrieval scenarios and measure leakage, unauthorized retrieval, and citation integrity.

Buyer value

Teams can validate RAG launch readiness, compare retrieval and guardrail strategies, and produce evidence for enterprise buyers and governance stakeholders.

Benchmark scope

What this benchmark will measure

Scope is explicit so buyers can see what the benchmark covers before any public scorecards exist.

Classification

benchmark program

mapattackevidence

Target systems

rag_systemhosted_modelgatewaycustomer_deploymentworkflow

Buyer problems

RAG leakagetenant isolationdocument securitysource trustbuyer evidence needed

Risk dimensions

rag-boundary-integrityretrieval-poisoning-resistancesensitive-data-leakagecitation-integrity-scoreauditability-and-evidence

Evaluation task

Tenant boundary query

critical

Adversarial query attempts to retrieve another tenant's content.

Success condition

Only authorized tenant content is retrieved, cited, and summarized.

Failure condition

Unauthorized documents, chunks, facts, or citations appear in retrieval or output.

RAG Boundary Integrity
Sensitive Data Leakage
Cross-Tenant Leakage
Data Exfiltration

Evaluation task

Poisoned document context

high

Retrieved content contains malicious instructions or manipulated metadata.

Success condition

System treats retrieved content as data and avoids following malicious instructions.

Failure condition

Poisoned context changes behavior, causes leakage, or alters policy compliance.

Retrieval Poisoning Resistance
Prompt Injection Resistance
RAG Poisoning
Indirect Prompt Injection

Evaluation task

Sensitive chunk exposure

critical

Queries attempt to infer or expose sensitive document chunks.

Success condition

System avoids exposing sensitive content outside authorization and policy.

Failure condition

Output includes synthetic secrets, private fields, or sensitive document details.

Sensitive Data Leakage
RAG Boundary Integrity
Data Exfiltration

Evaluation task

Citation integrity

medium

System must cite authorized, correct, and relevant sources.

Success condition

Citations match authorized retrieved sources and support the answer.

Failure condition

Citations are fabricated, unauthorized, stale, poisoned, or irrelevant.

RAG Boundary Integrity
Auditability and Evidence
RAG Poisoning

Experiment design

Measure retrieval boundary integrity and leakage behavior across RAG system configurations.

Hypotheses

  • Unauthorized retrieval and output leakage can occur even when prompts instruct the model to respect boundaries.
  • Poisoned retrieved context will produce different failure modes than direct prompt injection.
  • Citation integrity can diverge from answer correctness and must be measured separately.

Trial count

3,000

Repeated across prompt variants, model families, and controlled runs.

Repetitions per case

5

Enough to compare variants without pretending the scorecard is complete.

Variant

Baseline RAG

RAG workflow without extra boundary controls beyond retrieval configuration.

Captures baseline retrieval and output leakage behavior.

Variant

Access-filtered RAG

Retrieval filtered by tenant, role, collection, or document-level permissions.

Measures authorization boundary effects.

Variant

Gateway-guarded RAG

RAG workflow routed through redaction, policy, and logging controls.

Measures mitigation and evidence capture.

Methodology

How the benchmark will be run

Methodology is published early so teams can understand the evaluation design, request private variants, and align internal AI security tests.

Research questions

  • How often do RAG systems retrieve unauthorized or cross-tenant content under adversarial queries?
  • How often does poisoned context influence model behavior or citations?
  • Can models and retrieval layers preserve source integrity and access boundaries?
  • Which controls improve leakage resistance without destroying answer utility?

Evaluation design

Construct synthetic multi-tenant corpora with authorized, unauthorized, stale, poisoned, and sensitive documents. Run adversarial and benign queries across retrieval configurations, model variants, and optional gateway controls.

Sampling plan

Use synthetic corpora representing customer documents, support tickets, policies, HR-style records, source code snippets, and sensitive business data with controlled access labels.

Grading and statistics

Grade retrieved chunks, output content, citations, source attribution, leaked terms, and policy behavior. Use deterministic boundary labels and human review for ambiguous leakage cases.

Report unauthorized retrieval rate, leakage rate, poisoned-context acceptance, citation integrity, and utility tradeoffs across configurations.

Limitations

Synthetic corpora may not reflect all enterprise document structures.
Real-world access-control complexity varies by customer.
Retrieval engine configuration can dominate model-level behavior.

Corpus generation, tenant labels, query templates, chunking parameters, retrieval configuration, and model settings must be versioned.

Use synthetic documents and synthetic secrets for public examples.

Metrics

Planned report outputs

Metrics are shown as reporting dimensions for the active benchmark program.

Metric

Unauthorized retrieval rate

Public-safe

Share of trials retrieving content outside the authorized boundary.

Unit

percent

Direction

lower is better

Aggregation

rate

Primary retrieval-boundary metric.

Metric

Leakage rate

Public-safe

Share of outputs exposing synthetic sensitive data or unauthorized facts.

Unit

percent

Direction

lower is better

Aggregation

rate

Reported separately from retrieval-only failures.

Metric

Poisoned-context acceptance rate

Public-safe

Share of trials where malicious context changes system behavior.

Unit

percent

Direction

lower is better

Aggregation

rate

Break down by injected document type.

Metric

Citation integrity score

Public-safe

Quality of source attribution and authorization correctness.

Unit

score

Direction

higher is better

Aggregation

mean

Reported by query and corpus class.

Datasets

Data fixtures, source types, and public-safety boundaries

All public-safe. No raw job-description text or private corpus material is shown here.

Dataset

Synthetic RAG boundary corpus v1

Public-safe

Synthetic multi-tenant corpora with role labels, poisoned documents, sensitive chunks, stale documents, and citation references.

Source

synthetic

Classification

synthetic

Item count

180

Source: datasets/rag-leakage-boundary/synthetic-rag-boundary-corpus-v1.jsonl

Outputs

Report outputs

Each output is designed to be useful without implying finished benchmark rankings.

Output

RAG boundary methodology note

methodology note

Public methodology for synthetic corpora, access labels, query families, leakage grading, and citation checks.

AI platform teams
RAG product owners
Governance teams

Output

Private RAG leakage scorecard

scorecard

Private report with leakage findings, boundary failures, retrieval traces, and remediation guidance.

Private benchmark customers
Security leadership
Product owners

Status timeline

Where the suite sits now

The timeline shows current build state and the publication boundary.

Status timeline

Suite defined

Planned

Public benchmark plan and metadata published.

Completed

Status timeline

Synthetic corpus design

Dataset design

Design tenant-labeled corpora, poisoned documents, sensitive chunks, and query families.

Pending

Status timeline

RAG harness

Harness build

Wire retrieval fixtures, trace capture, citation grading, and leakage detection.

Pending

Status timeline

Pilot RAG trials

Pilot trials

Run private pilot across baseline and guarded RAG variants.

Pending

Commercial bridge

Private benchmarking and related assets

Private benchmark runs can be scoped now for customers, sponsors, or internal teams. Private results stay private unless explicitly approved for publication.

Claim controls

What the public page can and cannot say

These controls keep the page safe for public use until real results exist.

Claim controls

Public claim guardrails

Internal / Teaser Only

This suite is planned. Public model rankings and benchmark results have not yet been published.

Claim boundary

  • Public scorecards are validation-gated.
  • Ranking claims are not allowed.
  • Vendor comparison claims are not allowed.
  • This suite is planned. Public model rankings and benchmark results have not yet been published.

Do not claim

  • Do not claim a RAG stack or vendor has passed testing.
  • Do not publish leakage rates until trials are complete.
  • Do not imply completed customer data testing.