Zlib Visualizer
Posted 3 months ago · Active 3 months ago
lynn.github.io · Tech · story
Tone: calm, positive · Debate: 40/100
Key topics
Compression Algorithms
Data Visualization
Zlib
A visualizer for zlib compression is shared, sparking discussion of its features and limitations, with users offering feedback and suggestions for improvements.
Snapshot generated from the HN discussion
Discussion Activity
- First comment: 3 days after posting
- Peak period: 17 comments in the 84-96h window
- Average per period: 8 comments
- Comment distribution: 32 data points (32 loaded comments)
Key moments
- 01 Story posted: Sep 25, 2025 at 11:19 AM EDT (3 months ago)
- 02 First comment: Sep 28, 2025 at 9:52 PM EDT (3 days after posting)
- 03 Peak activity: 17 comments in the 84-96h window (hottest window of the conversation)
- 04 Latest activity: Sep 30, 2025 at 10:33 AM EDT (3 months ago)
ID: 45373784 · Type: story · Last synced: 11/20/2025, 3:41:08 PM
Want the full context? Read the primary article or dive into the live Hacker News thread.
https://github.com/lynn/flateview/blob/2668beaa5cc8cae387b6f...
I wonder if this can be blamed on the HN title auto-shortener or not...
1) The handling of dynamic blocks leaves something to be desired. The parameters are left mostly undecoded. It'd be really neat if the Huffman symbols were listed somewhere, rather than just being left implicit.
2) The visualization falls apart pretty badly for texts consisting of more than one block (which tends to happen around 32 KB) - symbols are still decoded, but references all show up blank.
Large inputs make the page hang for a bit, but that's probably pretty hard to avoid.
And as an enhancement: it'd be really cool if clicking on backreferences would jump to the text being referenced.
It would be cool if we could supply our own Huffman table and see how that affects the stream itself. We might want to put our text right there! https://github.com/nevesnunes/deflate-frolicking?tab=readme-...
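You can't hand zlib an arbitrary Huffman table, but its strategy parameter is a rough stand-in for experimenting with the table choice: Z_FIXED forces the fixed tables from the spec instead of per-block dynamic ones. A minimal Python sketch (Z_FIXED is exposed by the zlib module in Python 3.7+; the sample text is arbitrary):

```python
import zlib

text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness. ") * 20

def deflate(strategy: int) -> bytes:
    # Raw deflate (wbits=-15) so only the block contents differ, not the wrapper.
    co = zlib.compressobj(9, zlib.DEFLATED, -15, 8, strategy)
    return co.compress(text.encode()) + co.flush()

dynamic = deflate(zlib.Z_DEFAULT_STRATEGY)  # per-block (dynamic) Huffman tables
fixed = deflate(zlib.Z_FIXED)               # the predefined fixed tables from RFC 1951

print(len(dynamic), len(fixed))  # the fixed tables usually cost more on text like this
```

Dropping both outputs into the visualizer would also show that the fixed-table stream has no grey table section at all, since both sides already know those codes.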
You need someone to spell out exactly what each of the sections is and what it's doing.
That's compounded by the lack of a legend. What do the different shades of blue and purple tell me? What is orange?
For example, on a given text an orange block shows something like x4<-135. The x4 seems to indicate that the first 4 binary values for the block are important, but I can't figure out what the 135 is referencing (I assume it's some pointer to a value?)
“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair.”
The red bit at the beginning is the zlib header information and parameters. This basically tells the decoder the format of the data coming up, how large a decoding window it needs, etc.
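Those first two header bytes are easy to poke at by hand. A minimal Python sketch of what the red section encodes, following the field layout in RFC 1950 (the input text is arbitrary):

```python
import zlib

stream = zlib.compress(b"It was the best of times, it was the worst of times")
cmf, flg = stream[0], stream[1]

method = cmf & 0x0F              # 8 = deflate
window = 1 << ((cmf >> 4) + 8)   # LZ77 window size the decoder must allocate
has_preset_dict = bool(flg & 0x20)
level_hint = flg >> 6            # 0-3: rough hint of the compression level used

assert (cmf * 256 + flg) % 31 == 0  # the header's own integrity check
print(hex(cmf), hex(flg), method, window, has_preset_dict, level_hint)
```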
The following grey section is the Huffman coding tables - more common characters in the input are encoded in fewer bits. This is what later tells the decoder that 000 means 'e' and 1110110 means 'I'.
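A side note on how those tables get rebuilt: deflate doesn't ship the bit patterns themselves, only a code length per symbol, and both encoder and decoder derive canonical codes from the lengths. A small Python sketch with made-up lengths (the actual lengths and codes for this text will differ):

```python
def canonical_codes(bit_lengths: dict[str, int]) -> dict[str, str]:
    """Turn per-symbol code lengths into canonical Huffman bit patterns,
    the same scheme deflate uses (RFC 1951, section 3.2.2)."""
    max_len = max(bit_lengths.values())
    # How many codes exist at each length.
    counts = [0] * (max_len + 1)
    for length in bit_lengths.values():
        counts[length] += 1
    # Smallest code value at each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for length in range(1, max_len + 1):
        code = (code + counts[length - 1]) << 1
        next_code[length] = code
    # Assign codes in symbol order; symbols sharing a length get consecutive codes.
    codes = {}
    for symbol in sorted(bit_lengths):
        length = bit_lengths[symbol]
        codes[symbol] = f"{next_code[length]:0{length}b}"
        next_code[length] += 1
    return codes

# Made-up lengths just to show the shape of the output.
print(canonical_codes({"e": 3, "t": 3, " ": 3, "a": 4, "o": 4, "I": 5, "x": 5}))
```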
Getting into the content now - this is where the decoder can start emitting the uncompressed text. The first 3 purple characters are the UTF-8 bytes for the fancy opening quote - because they're rare in this text, they're each encoded as 6 or 7 bits. Because they take a lot of bits, this website shows them in purple, as well as physically wider. The nearby 't' is encoded in 4 bits, 0110, and is shown in a bluer color.
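A quick check of that byte count in Python:

```python
# U+201C (the fancy opening quote) really is three bytes in UTF-8,
# which is why three separate symbols show up for it.
print("\u201c".encode("utf-8"))  # b'\xe2\x80\x9c'
```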
The orange bits you've mentioned are back references - "x10 <- 26" here means "go back 26 characters in what you've decoded, and then copy 10 characters again." In this way, we can represent "t was the " in only 12 bits, because we've seen it previously.
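Mechanically, applying one of those orange back-references is just a copy from earlier output. A tiny sketch (the buffer contents are illustrative):

```python
def apply_backref(out: bytearray, length: int, distance: int) -> None:
    # Copy `length` bytes starting `distance` bytes back in what we've already
    # decoded. Byte-by-byte, so overlapping copies (distance < length) behave
    # the way LZ77/deflate expects.
    start = len(out) - distance
    for i in range(length):
        out.append(out[start + i])

buf = bytearray(b"It was the best of times, i")
apply_backref(buf, 10, 26)  # "x10 <- 26": re-emits "t was the "
print(buf.decode())         # It was the best of times, it was the
```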
The grey at the end is a special "end of stream" marker, followed by a red checksum which allows decoders to make sure there wasn't any corruption in the input.
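You can verify that trailing red checksum yourself: per the zlib format it's the Adler-32 of the uncompressed data, stored big-endian as the last four bytes of the stream. A short Python check:

```python
import zlib

raw = b"It was the best of times, it was the worst of times"
stream = zlib.compress(raw)

stored = int.from_bytes(stream[-4:], "big")  # last 4 bytes of the zlib stream
assert stored == zlib.adler32(raw)           # matches the Adler-32 of the input
```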
I think that's everything. Further reading: https://en.wikipedia.org/wiki/Zlib https://en.wikipedia.org/wiki/Deflate https://en.wikipedia.org/wiki/Huffman_coding
Newer compression algorithms like zstd, brotli and lz4 basically just use these same methods in different ways. (There are also slightly newer alternatives to Huffman coding, like Asymmetric Numeral Systems and Arithmetic Coding, but fundamentally they're the same concept.)
When compressing with a level higher than 0, the bits also don't appear to add up to a whole number of bytes, so I'm thinking the visualization is missing some padding?
The LEN and NLEN items were not getting visualized.
Also for zopfli vs level 9 compression with this tool as-is.
This is very much a work in progress, but for folks looking for a deeper explanation of how dynamic blocks are encoded, this is my attempt to visualize them.
(This all happens locally with way too much wasm, so attempting to upload a large gzip file will likely crash the tab.)
tl;dr for btype 2 blocks (a small Python sketch of reading this header follows the list):
3-bit block header.
Three values telling you how many extra (above the minimum number) symbols are in each tree: HLIT, HDIST, and HCLEN.
First, we read (HCLEN + 4) * 3 bits.
These are the bit counts for symbols 0-18 in the code length tree, which gives you the bit patterns for a little mini-language used to compactly encode the literal/length and distance trees. 0-15 are literal bit lengths (0 meaning it's omitted). 16 repeats the previous symbol 3-6 times. 17 and 18 encode short (3-10) and long (11-138) runs of zeroes, which is useful for encoding blocks with sparse alphabets.
These bit counts are in a seemingly strange order that tries to push less-likely bit counts towards the end of the list so it can be truncated.
Knowing all the bit lengths for values in this alphabet allows you to reconstruct a Huffman tree (thanks to canonical Huffman codes) and decode the bit patterns for these code length codes.
That's followed by a bitstream that you decode to get the bit counts for the literal/length and distance trees. HLIT and HDIST (from earlier) tell you how many of these to expect.
Again, you can reconstruct these trees using just the bit lengths thanks to canonical Huffman codes, which gives you the bit patterns for the data bitstream.
Then you just decode the rest of the bitstream (using LZSS) until you hit 256, the end of block (EOB).
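Here's that sketch: it reads the block header, HLIT/HDIST/HCLEN, and the (HCLEN + 4) * 3 bits of code-length-code lengths. The LSB-first bit order and the permuted symbol order follow RFC 1951; the input text is just something likely to produce a btype 2 block.

```python
import zlib

# Order in which the code-length-code lengths are stored (RFC 1951, 3.2.7).
CLC_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]

class BitReader:
    """Reads bits LSB-first from a byte string, the way deflate packs them."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
    def read(self, n: int) -> int:
        value = 0
        for i in range(n):
            value |= ((self.data[self.pos // 8] >> (self.pos % 8)) & 1) << i
            self.pos += 1
        return value

# Raw deflate (wbits=-15, no zlib wrapper); repetitive text at level 9 usually
# gets a dynamic-Huffman (btype 2) block.
co = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = co.compress(b"it was the age of wisdom, it was the age of foolishness, " * 40)
raw += co.flush()

bits = BitReader(raw)
bfinal = bits.read(1)           # 1 if this is the last block
btype = bits.read(2)            # 0 = stored, 1 = fixed Huffman, 2 = dynamic
if btype == 2:
    hlit = bits.read(5) + 257   # literal/length codes in the first tree
    hdist = bits.read(5) + 1    # distance codes in the second tree
    hclen = bits.read(4) + 4    # code-length codes to read next
    clc_lengths = [0] * 19
    for i in range(hclen):      # the (HCLEN + 4) * 3 bits, in the permuted order
        clc_lengths[CLC_ORDER[i]] = bits.read(3)
    print(bfinal, hlit, hdist, hclen, clc_lengths)
```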
If you're not already familiar with deflate, don't be discouraged if none of that made any sense. Bill Bird has an excellent (long) lecture that I recommend to everyone: https://www.youtube.com/watch?v=SJPvNi4HrWQ