Beaver: an Efficient Deterministic LLM Verifier

Posted 17 days ago

tshanmu

1 point

1 comment

arxiv.org | Research | story

informative, neutral

Debate

20/100

Key topics

MoE Models

Deterministic Algorithms

AI Research

Key moments

01 Story posted: Dec 16, 2025 at 10:33 AM EST (17 days ago)
02 First comment: Dec 16, 2025 at 10:33 AM EST (0s after posting)
03 Peak activity: 1 comment in Start (hottest window of the conversation)
04 Latest activity: Dec 16, 2025 at 10:33 AM EST (17 days ago)

Discussion (1 comment)

tshanmu (Author)

17 days ago

As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide intuition about model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification, and secure code generation tasks across multiple state-of-the-art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high-risk instances than baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
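To make the idea of frontier-based bounds concrete, here is a minimal Python sketch. It is not BEAVER's implementation, which the abstract does not spell out. It only illustrates why a prefix-closed constraint lets you keep a sound lower and upper bound at every step. The toy model `toy_next_token_probs`, the constraint `satisfies`, and the function `bound_satisfaction` are all hypothetical names invented for this example, and prefixes are stored as plain tuples rather than in a token trie.

```python
import heapq

EOS = "<eos>"

def toy_next_token_probs(prefix):
    """Hypothetical stand-in for an LLM's next-token distribution."""
    if len(prefix) >= 3:
        return {EOS: 1.0}  # force termination after three tokens
    return {"a": 0.5, "b": 0.3, EOS: 0.2}

def satisfies(prefix):
    """Toy prefix-closed constraint: once 'b' appears, every extension fails."""
    return "b" not in prefix

def bound_satisfaction(next_probs, constraint, max_expansions):
    """Return (lower, upper) bounds on P(output satisfies constraint).

    lower: mass of finished sequences known to satisfy the constraint.
    upper: lower plus the mass of unresolved prefixes on the frontier.
    Assumes the empty prefix satisfies the constraint.
    """
    lower = 0.0
    # Max-heap on probability mass (stored negated for heapq's min-heap).
    frontier = [(-1.0, ())]
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_mass, prefix = heapq.heappop(frontier)
        mass = -neg_mass
        for token, p in next_probs(prefix).items():
            child_mass = mass * p
            if token == EOS:
                # The prefix already passed the constraint, so this
                # finished sequence counts toward the lower bound.
                lower += child_mass
                continue
            child = prefix + (token,)
            if constraint(child):
                heapq.heappush(frontier, (-child_mass, child))
            # Otherwise the whole subtree is pruned: prefix-closure means
            # no extension of a violating prefix can satisfy the constraint.
    upper = lower + sum(-m for m, _ in frontier)
    return lower, upper

if __name__ == "__main__":
    for budget in (0, 1, 2, 4):
        print(budget, bound_satisfaction(toy_next_token_probs, satisfies, budget))
```

With a budget of one expansion the toy bounds are about [0.2, 0.7]. After four expansions the frontier is empty and both bounds meet at about 0.475, the exact satisfaction probability for this toy model. Expanding the highest-mass prefix first is one plausible heuristic for tightening the gap quickly. Whether BEAVER uses that ordering is not stated in the abstract.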

ID: 46289799 | Type: story | Last synced: 12/16/2025, 3:35:20 PM
