A Tale of Four Fuzzers
Key topics
Regulars are buzzing about a blog post titled "A Tale of Four Fuzzers," with commenters diving into the nuances of random sampling versus exhaustive enumeration. Some, like pfdietz and matklad, riff on the idea that a function generating random objects can, in theory, enumerate all objects, with pfdietz arguing that random sampling simplifies this process. However, others, such as AlotOfReading, counter that exhaustive exploration is more efficient and doesn't incur logarithmic overhead. Meanwhile, a minor sidebar discussion erupts over whether the blog's CSS is broken, with captainhorst pointing out that the site's use of CSS nesting requires a relatively modern browser.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
- First comment: 52m after posting
- Peak period: 4 comments in 3-4h
- Avg / period: 1.8
Based on 16 loaded comments
Key moments
- 01 Story posted: Nov 28, 2025 at 7:11 AM EST (about 1 month ago)
- 02 First comment: Nov 28, 2025 at 8:03 AM EST (52m after posting)
- 03 Peak activity: 4 comments in 3-4h (hottest window of the conversation)
- 04 Latest activity: Nov 28, 2025 at 3:42 PM EST (about 1 month ago)
More specifically: if you uniformly sample from a space of size N, then in O(N log N) tries you can expect to sample every point in the space. There's a logarithmic cost to this random sampling, but that's not too bad.
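To make that bound concrete, here is a minimal Python simulation (our illustration, not code from the thread): for a space of size N, the expected number of uniform draws needed to see every point is N times the Nth harmonic number, roughly N ln N.

```python
import random

def draws_to_cover(n: int, trials: int = 100) -> float:
    """Average number of uniform draws needed to see all n points."""
    total = 0
    for _ in range(trials):
        seen = set()
        draws = 0
        while len(seen) < n:
            seen.add(random.randrange(n))  # one uniform sample from the space
            draws += 1
        total += draws
    return total / trials

# Expectation is n * H_n ~ n ln n; for n = 1000 that's about 7485 draws.
print(draws_to_cover(1000))
```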
You could _implement_ non-determinism via probabilistic sampling, but you could also implement the same interface as exhaustive search.
An example is something like "pairwise testing" of arguments to a function. Just randomly generating values will hit all possible pairs of argument values, again with a logarithmic penalty.
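As a hedged sketch of that pairwise idea (the argument names and value sets below are invented for illustration), random draws cover every value pair with only a logarithmic-factor penalty over the 9 calls an explicit enumeration would need:

```python
import itertools
import random

# Hypothetical value sets for two arguments of a function under test.
modes = ["read", "write", "append"]
sizes = [0, 1, 4096]

uncovered = set(itertools.product(modes, sizes))
draws = 0
while uncovered:
    pair = (random.choice(modes), random.choice(sizes))  # random argument pair
    uncovered.discard(pair)
    draws += 1

# Explicit enumeration needs exactly 9 calls; random sampling needs
# about 9 * H_9 ~ 25 draws in expectation.
print(f"covered all {len(modes) * len(sizes)} pairs in {draws} draws")
```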
What work is being offloaded from computers to people? It's exactly the same thing with more determinism and no logarithmic overhead.
Suppose that space of N points is partitioned into M relevant subsets, assumed for now to be of equal size. Then random sampling hits each of those subsets in O(M log M) tries, even if we don't know what they are.
This sort of partitioning has long been discussed in the testing literature, with the idea that you should do it manually.
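A quick way to check that claim (our sketch, using residue classes as a stand-in for the unknown partition): the sampler below never learns what the M subsets are, yet covers all of them in roughly M ln M draws, independent of N.

```python
import random

# A space of N points split into M equal subsets; here the "relevant"
# partition is residue mod M, but the sampler doesn't know that.
N, M = 1_000_000, 50
hit = set()
draws = 0
while len(hit) < M:
    x = random.randrange(N)   # uniform sample from the whole space
    hit.add(x % M)            # which (hidden) subset it landed in
    draws += 1

# Coupon collector on M subsets: about M * H_M ~ 224 draws for M = 50.
print(f"hit all {M} subsets in {draws} draws")
```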
> what work is being offloaded
The need to write the program that explicitly enumerates the space.
If the code looks like this
https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...
then you get a random permutation. If it instead looks like this
https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...
you enumerate all permutations.
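Since the links are truncated, here is a hedged reconstruction of the idea in Python rather than the linked Zig: write the shuffle against an abstract "pick an index" source, then back it with either a PRNG (one random permutation per run) or an enumerator that replays every possible choice sequence (all n! permutations). This is a sketch of the technique, not the TigerBeetle code itself.

```python
import random

# Sketch only: Fisher-Yates written against an abstract choice source
# `pick(n)` that returns an integer in [0, n).
def shuffle(items, pick):
    items = list(items)
    for i in range(len(items) - 1, 0, -1):
        j = pick(i + 1)                      # choose swap target in [0, i]
        items[i], items[j] = items[j], items[i]
    return items

# Random backend: each call yields one random permutation.
print(shuffle("abc", lambda n: random.randrange(n)))

# Exhaustive backend: enumerate every possible sequence of choices and replay
# each through the same shuffle; this yields all 3! = 6 permutations exactly once.
def choice_sequences(sizes):
    if not sizes:
        yield []
        return
    for head in range(sizes[0]):
        for tail in choice_sequences(sizes[1:]):
            yield [head] + tail

for seq in choice_sequences([3, 2]):         # pick sizes for len("abc") == 3
    it = iter(seq)
    print(shuffle("abc", lambda _: next(it)))
```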
The keyword to look up for more details is "coupon collector's problem".
And if you don't have time, just go to the bullet point list at the end; that's all of the best practices, and they are fantastic.
Something often forgotten here: if your PRNG only takes e.g. a 32-bit seed, you can generate at most 2^32 unique objects. Which you might chew through in seconds of fuzzing.
Edit: this is addressed later in the article/in a reference where they talk about using an exhaustive implementation of a PRNG interface. Neat!
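A hedged illustration of that seed ceiling (a hypothetical generator, not from the article): if the only input is a 32-bit seed, seeds that differ by 2^32 collide, so at most 2^32 distinct objects are reachable no matter how large the output type is.

```python
import random

def gen_object(seed: int) -> bytes:
    """Hypothetical fuzzer input generator keyed by a 32-bit seed."""
    rng = random.Random(seed & 0xFFFFFFFF)   # seed truncated to 32 bits
    return rng.randbytes(16)                 # 2**128 possible byte strings...

# ...but at most 2**32 are reachable: seeds 2**32 apart produce identical output.
assert gen_object(7) == gen_object(7 + 2**32)
```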