Verifiable ML Without Determinism: Tolerance-Aware Optimistic Verification
Posted 2 months ago · arxiv.org · Research story
Key topics: Machine Learning, Verification, Determinism
Researchers propose a new method for verifiable machine learning without requiring determinism; the story drew interest on HN, though discussion was limited.
Snapshot generated from the HN discussion
Key moments
- Story posted: Oct 21, 2025 at 9:27 AM EDT (2 months ago)
- First comment: Oct 21, 2025 at 9:27 AM EDT (0s after posting)
- Peak activity: 1 comment in the opening window
- Latest activity: Oct 21, 2025 at 9:27 AM EDT
ID: 45655524 · Type: story · Last synced: 11/17/2025, 9:09:06 AM
We built a system that makes ML results verifiable without requiring determinism. Instead of demanding exact equality, we verify outputs up to principled, per-operator error bounds and resolve disagreements with an optimistic, Merkle-anchored dispute game.
What's inside (precise + concise):
Semantics: "Tolerance-aware correctness" for tensor programs and operator-specific acceptance regions induced by IEEE-754 rounding.
Two error models:
Theoretical IEEE-754 bounds (sound, per-operator, element-wise; conservative but cheap to verify).
Empirical percentile thresholds calibrated across GPUs (tight, model-specific).
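As a rough illustration of how the two error models translate into acceptance checks, here is a minimal sketch. The function names, the `n_ops` rounding-count parameter, and the calibrated `p99_abs_err` value are hypothetical, not the paper's API:

```python
U = 2.0 ** -53  # unit roundoff for IEEE-754 double precision

def within_theoretical_bound(claimed, reference, n_ops=1):
    # Sound, elementwise acceptance region: each of the n_ops roundings
    # perturbs the exact value by a relative factor of at most U, so two
    # valid floating-point evaluations can differ by roughly 2*n_ops*U.
    tol = 2 * n_ops * U * abs(reference)
    return abs(claimed - reference) <= tol

def within_empirical_threshold(claimed, reference, p99_abs_err):
    # Tight, model-specific check: p99_abs_err would be calibrated
    # offline across GPUs for this operator (value supplied by caller).
    return abs(claimed - reference) <= p99_abs_err
```

The theoretical bound never falsely rejects a correct result, at the cost of being loose; the empirical threshold is tight but only as trustworthy as its calibration.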
Dispute protocol: If a result is challenged, we recursively partition the traced graph (with Merkle proofs over subgraph commitments) until the dispute narrows to a single operator. At the leaf, we either (i) certify the output with the theoretical bound or (ii) run a small committee vote against the empirical threshold. Provers caught outside the bounds are slashed.
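The recursive partitioning can be sketched as a textbook bisection over a linearized trace, assuming each per-operator hash chains in all prior state so any divergence persists to the end of the trace. Names and structure here are illustrative, not the actual protocol:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def chain(per_op_outputs):
    # Chained commitments: each hash covers all preceding operators,
    # so once two traces diverge they stay divergent.
    acc, out = b"", []
    for v in per_op_outputs:
        acc = h(acc + v)
        out.append(acc)
    return out

def first_divergence(prover, challenger):
    # Invariant: both parties agree on every hash before `lo` and
    # disagree somewhere in [lo, hi). O(log n) comparisons narrow the
    # dispute to the single operator that must be re-checked at the leaf.
    lo, hi = 0, len(prover)
    while lo < hi:
        mid = (lo + hi) // 2
        if prover[mid] == challenger[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

In the real system the partition is over a graph rather than a list, but the logarithmic narrowing idea is the same.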
Runtime + coordinator: PyTorch-compatible runtime that traces graphs, computes bounds on the fly, records subgraph I/O, and emits/validates commitments; a lightweight coordination layer (we prototyped on Ethereum for authenticated logs/bonds, but the protocol doesn't depend on a blockchain). Overhead ~0.3% latency on Qwen3-8B; no extra memory beyond native subgraph execution.
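A minimal sketch of the kind of per-subgraph commitment the runtime might emit, binding an operator identity to its recorded I/O. The field names and schema are assumptions for illustration, not the actual commitment format:

```python
import hashlib
import json

def commit_subgraph(op_name, input_hashes, output_hashes):
    # Deterministic serialization (sorted keys) so prover and verifier
    # derive identical commitments from identical recorded I/O.
    payload = json.dumps(
        {
            "op": op_name,
            "in": [x.hex() for x in input_hashes],
            "out": [y.hex() for y in output_hashes],
        },
        sort_keys=True,
    ).encode()
    return hashlib.sha256(payload).hexdigest()
```

Commitments like these are what the coordination layer logs, so a blockchain is only one possible authenticated log, as the post notes.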
Scope: Designed for the open-model setting (public weights/graphs enable subgraph extraction and thresholding). Closed-model APIs can still expose commitments to permissioned verifiers. No TEEs required; no deterministic kernels needed.
We’d love feedback on blind spots in the threat model, better partition policies for the dispute game, and where empirical thresholds might be gamed or leak information.