DRIFTBENCH · AI DECISION AUDIT

DriftBench is the AI TÜV for decision systems.
Can your system defend its last fraud decision?

Models deployed via OpenAI/Azure, Anthropic, Google, Bedrock — and via enterprise integrators (e.g., QuantumBlack, Palantir) — already make fraud and risk decisions. When an auditor asks "Which policy governed this decision?", many systems cannot answer.

Example: A model approves a €4,800 transfer to a high-risk country — but cannot reference the governing policy.

This decision cannot be audited.
Run a decision audit · View benchmark results

DriftBench Fraud Detection Track v0.1 · 5 sealed tasks · policy binding · witness tuples · fail-closed · scope gates

Decision Evidence

These decisions cannot be audited.

Each failure below is a sealed evaluation task. The system made a decision but could not cite the governing policy.

Case Study

Nightbank: When systems remain compliant but lose the mandate

A recurring failure mode in automated risk systems: the system stays inside its own rule set, while silently drifting away from the purpose it was approved for.

NIGHTBANK · MANDATE DRIFT INCIDENT (ANONYMIZED)
The fraud system continued to satisfy its internal checks and KPIs, yet the board-level mandate (reduce high-risk exposure) was no longer respected in edge cases under business pressure. The system produced “valid” decisions that could not be defended against the original risk purpose.
Observed state
Rule compliance (Ck) = 1
Mandate integrity (Ik) = 0
Audit implication
The system can “pass controls” while producing decisions that cannot be defended under the original mandate.
DriftBench is designed to detect exactly this: decision drift without visible control failure.
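The Nightbank failure signature can be sketched as a one-line invariant check. This is an illustration only: Ck and Ik are taken from the incident summary above; the function name and signature are assumptions, not DriftBench code.

```python
def drift_signal(rule_compliance: int, mandate_integrity: int) -> bool:
    """Mandate drift: the system passes its own controls (Ck = 1)
    while the board-level mandate is violated (Ik = 0)."""
    return rule_compliance == 1 and mandate_integrity == 0

# Nightbank observed state: dashboards green, mandate broken.
assert drift_signal(rule_compliance=1, mandate_integrity=0)      # drift
assert not drift_signal(rule_compliance=1, mandate_integrity=1)  # healthy
```

The point of the invariant is that neither input alone raises an alarm; only the conjunction (controls pass, mandate fails) is the drift signature.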
Live Decision Audit

Test one decision from your own system.

Paste an anonymized decision record. The audit runs entirely in your browser — no data is uploaded.

Decision record (JSON)
Local-only.
Audit Result
Paste a decision and click Run.
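To make the input concrete, the €4,800 example from above could be written as a record like the one below. The field names are illustrative assumptions, not the official DriftBench schema; the missing witness is what makes the record non-auditable.

```python
import json

# Hypothetical anonymized decision record (illustrative field names).
decision = {
    "decision_id": "dec-0001",
    "action": "approve",
    "amount_eur": 4800,
    "destination_risk": "high",
    "witness": None,  # no (policy_id, section) tuple -> not auditable
}

print(json.dumps(decision, indent=2))
```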
Audit Method

DASR: How a decision audit actually runs

DriftBench does not inspect model internals. It audits decisions as traceable governance objects: input, policy, witness, scope, fail-closed, and audit trace.

1 · COLLECT
Collect recent automated decisions (anonymized) + the policy artifacts they claim to follow.
2 · BIND
Bind each decision to its governing policy via witness tuples (policy_id + section). Missing witness = not auditable.
3 · SCOPE
Verify authorization: the system must not decide outside its mandated scope (e.g., business exposure > allowed threshold).
4 · FAIL-CLOSED
Under uncertainty, conflict, or missing metadata: stop and escalate. Approving under uncertainty is a governance failure.
5 · STRESS
Re-run sealed variants (schema shift, adversarial injection, conflicting goals) to expose drift before production incidents.
6 · REPORT
Produce an audit record: violations, invariant failures, witness gaps, scope breaches, and recommended fail-closed gates.
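The BIND, SCOPE, and FAIL-CLOSED steps above can be sketched in a few lines. This is a minimal illustration under assumed field names and an assumed exposure threshold, not the DriftBench implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Decision:
    decision_id: str
    action: str                          # e.g. "approve" / "reject"
    exposure_eur: float
    witness: Optional[Tuple[str, str]]   # (policy_id, section) or None

MAX_EXPOSURE_EUR = 10_000  # assumed mandate scope threshold

def audit(d: Decision) -> str:
    # BIND: a decision without a witness tuple is not auditable.
    if d.witness is None:
        return "FAIL: missing witness tuple (not auditable)"
    # SCOPE: the system must not decide outside its mandated scope.
    if d.exposure_eur > MAX_EXPOSURE_EUR:
        return "FAIL: scope breach (exposure above mandate)"
    # FAIL-CLOSED by construction: only a fully bound, in-scope
    # decision passes; everything else stops and escalates.
    return "PASS"

print(audit(Decision("dec-0001", "approve", 4800, None)))
print(audit(Decision("dec-0002", "approve", 4800, ("AML-policy-7", "3.2"))))
```

Note the ordering: binding is checked before scope, because a decision with no witness cannot even be evaluated against its mandate.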
What auditors care about
Not "Is the model smart?" but: can the institution defend a decision six months later, with policy, witness, scope, and escalation history?
Technical Benchmark Results

Full evaluation detail

Sealed-track evaluation. Tasks withheld from model providers prior to testing.

Model · Explainable · Witness Rate · Fail-Closed · Drift (BDI) · Boundary Risk
Why this matters

Black-box decisions have systemic consequences

Governance failure is not hypothetical. When decisions cannot be traced to policy, institutions lose auditability first — and control later.

2008 · Financial Crisis
Model outputs were treated as decisions without audit-ready rationale. When challenged, institutions could not reconstruct individual decision grounds. Systemic risk stayed invisible until it was too late.
2012 · Knight Capital
An automated trading system operated outside its authorized scope for ~45 minutes. Dashboards stayed green. $440M loss before intervention. Missing scope gates and missing fail-closed.
2024 · EU AI Act
Traceability, human oversight, and risk management are no longer "nice to have". Automated risk decisions about individuals, such as creditworthiness assessments, are classified as high-risk: auditability and escalation become mandatory.
DriftBench is built to expose audit-breaks early: missing witness tuples, scope breaches, and fail-closed violations before a regulator finds them.
Evaluate your own system

Run DriftBench on your decision system.

DriftBench is designed for bank-grade governance audits: policy binding, witness tuples, scope gates, and fail-closed behavior. You can run it internally (no data leaves your environment) or request an independent audit report.

Option A
Run internally
Download the benchmark and run it against your own model endpoint. Results stay with you.
driftbench run --model your_endpoint
Option B
Governance audit
We run DriftBench against your system and deliver an audit report: violations, drift signatures, and recommended fail-closed gates.
Typical deliverable: Decision Integrity Report (PDF) + machine-readable JSON findings.
Option C
Pilot program
Integrate GCCL monitoring into your model governance workflow: continuous decision auditing, scope controls, and escalation policies.
Target: audit-ready decisions under EU AI Act / internal model risk governance.
Contact
audit@snapos.org
Send a short note with: decision domain (fraud / credit / AML), deployment type (API / self-hosted), and whether you want an internal run or an independent audit report.
No sales process. We respond within 24 hours.
GCCL v0.1 Spec: doi:10.5281/zenodo.18362037
Data handling: Local-only auditor · No uploads · No storage