Zlib Visualizer
Posted 3 months ago · Active 3 months ago
lynn.github.io · Tech · story
Tone: calm, positive · Debate: 40/100
Key topics
Compression Algorithms
Data Visualization
Zlib
A visualizer for zlib compression is shared, sparking discussion of its features and limitations, with users offering feedback and suggestions for improvements.
Snapshot generated from the HN discussion
Discussion Activity
- First comment: 3 days after posting
- Peak period: 17 comments in the 84-96h window
- Average per period: 8 comments
- Comment distribution: 32 data points (32 loaded comments)
Key moments
- 01 Story posted: Sep 25, 2025 at 11:19 AM EDT (3 months ago)
- 02 First comment: Sep 28, 2025 at 9:52 PM EDT (3 days after posting)
- 03 Peak activity: 17 comments in the 84-96h window (hottest window of the conversation)
- 04 Latest activity: Sep 30, 2025 at 10:33 AM EDT (3 months ago)
ID: 45373784 · Type: story · Last synced: 11/20/2025, 3:41:08 PM
Want the full context? Read the primary article or dive into the live Hacker News thread.
https://github.com/lynn/flateview/blob/2668beaa5cc8cae387b6f...
I wonder if this can be blamed on the HN title auto-shortener or not...
1) The handling of dynamic blocks leaves something to be desired. The parameters are left mostly undecoded. It'd be really neat if the Huffman symbols were listed somewhere, rather than just being left implicit.
2) The visualization falls apart pretty badly for texts consisting of more than one block (which tends to happen around 32 KB) - symbols are still decoded, but references all show up blank.
Large inputs make the page hang for a bit, but that's probably pretty hard to avoid.
And as an enhancement: it'd be really cool if clicking on backreferences would jump to the text being referenced.
It would be cool if we could supply our own Huffman table and see how that affects the stream itself. We might want to put our text right there! https://github.com/nevesnunes/deflate-frolicking?tab=readme-...
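You can't hand zlib an arbitrary Huffman table, but its strategy parameter is a rough stand-in for experimenting with the table choice: Z_FIXED forces the fixed tables from the spec instead of per-block dynamic ones. A minimal Python sketch (Z_FIXED is exposed by the zlib module in Python 3.7+; the sample text is arbitrary):

```python
import zlib

text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness. ") * 20

def deflate(strategy: int) -> bytes:
    # Raw deflate (wbits=-15) so only the block contents differ, not the wrapper.
    co = zlib.compressobj(9, zlib.DEFLATED, -15, 8, strategy)
    return co.compress(text.encode()) + co.flush()

dynamic = deflate(zlib.Z_DEFAULT_STRATEGY)  # per-block (dynamic) Huffman tables
fixed = deflate(zlib.Z_FIXED)               # the predefined fixed tables from RFC 1951

print(len(dynamic), len(fixed))  # the fixed tables usually cost more on text like this
```

Dropping both outputs into the visualizer would also show that the fixed-table stream has no grey table section at all, since both sides already know those codes.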
You need someone to spell out exactly what each of the sections is and what it's doing.
That's compounded by the lack of a legend. What do the different shades of blue and purple tell me? What is orange?
For example, on a given text an orange block shows something like x4<-135. The x4 seems to indicate that the first 4 binary values for the block are important, but I can't figure out what the 135 is referencing (I assume it's some pointer to a value?)
“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair.”
The red bit at the beginning is the zlib header information and parameters. This basically tells the decoder the format of the data coming up, how large a decoding window it needs, etc.
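Those first two header bytes are easy to poke at by hand. A minimal Python sketch of what the red section encodes, following the field layout in RFC 1950 (the input text is arbitrary):

```python
import zlib

stream = zlib.compress(b"It was the best of times, it was the worst of times")
cmf, flg = stream[0], stream[1]

method = cmf & 0x0F              # 8 = deflate
window = 1 << ((cmf >> 4) + 8)   # LZ77 window size the decoder must allocate
has_preset_dict = bool(flg & 0x20)
level_hint = flg >> 6            # 0-3: rough hint of the compression level used

assert (cmf * 256 + flg) % 31 == 0  # the header's own integrity check
print(hex(cmf), hex(flg), method, window, has_preset_dict, level_hint)
```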
The following grey section is the Huffman coding tables - more common characters in the input are encoded in fewer bits. This is what later tells the decoder that 000 means 'e' and 1110110 means 'I'.
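A side note on how those tables get rebuilt: deflate doesn't ship the bit patterns themselves, only a code length per symbol, and both encoder and decoder derive canonical codes from the lengths. A small Python sketch with made-up lengths (the actual lengths and codes for this text will differ):

```python
def canonical_codes(bit_lengths: dict[str, int]) -> dict[str, str]:
    """Turn per-symbol code lengths into canonical Huffman bit patterns,
    the same scheme deflate uses (RFC 1951, section 3.2.2)."""
    max_len = max(bit_lengths.values())
    # How many codes exist at each length.
    counts = [0] * (max_len + 1)
    for length in bit_lengths.values():
        counts[length] += 1
    # Smallest code value at each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for length in range(1, max_len + 1):
        code = (code + counts[length - 1]) << 1
        next_code[length] = code
    # Assign codes in symbol order; symbols sharing a length get consecutive codes.
    codes = {}
    for symbol in sorted(bit_lengths):
        length = bit_lengths[symbol]
        codes[symbol] = f"{next_code[length]:0{length}b}"
        next_code[length] += 1
    return codes

# Made-up lengths just to show the shape of the output.
print(canonical_codes({"e": 3, "t": 3, " ": 3, "a": 4, "o": 4, "I": 5, "x": 5}))
```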
Getting into the content now - this is where the decoder can start emitting the uncompressed text. The first 3 purple characters are the UTF-8 bytes for the fancy opening quote - because they're rare in this text, they're each encoded as 6 or 7 bits. Because they take a lot of bits, this website shows them in purple, as well as physically wider. The nearby 't' is encoded in 4 bits, 0110, and is shown in a bluer color.
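A quick check of that byte count in Python:

```python
# U+201C (the fancy opening quote) really is three bytes in UTF-8,
# which is why three separate symbols show up for it.
print("\u201c".encode("utf-8"))  # b'\xe2\x80\x9c'
```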
The orange bits you've mentioned are back references - "x10 <- 26" here means "go back 26 characters in what you've decoded, and then copy 10 characters again." In this way, we can represent "t was the " in only 12 bits, because we've seen it previously.
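Mechanically, applying one of those orange back-references is just a copy from earlier output. A tiny sketch (the buffer contents are illustrative):

```python
def apply_backref(out: bytearray, length: int, distance: int) -> None:
    # Copy `length` bytes starting `distance` bytes back in what we've already
    # decoded. Byte-by-byte, so overlapping copies (distance < length) behave
    # the way LZ77/deflate expects.
    start = len(out) - distance
    for i in range(length):
        out.append(out[start + i])

buf = bytearray(b"It was the best of times, i")
apply_backref(buf, 10, 26)  # "x10 <- 26": re-emits "t was the "
print(buf.decode())         # It was the best of times, it was the
```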
The grey at the end is a special "end of stream" marker, followed by a red checksum which allows decoders to make sure there wasn't any corruption in the input.
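You can verify that trailing red checksum yourself: per the zlib format it's the Adler-32 of the uncompressed data, stored big-endian as the last four bytes of the stream. A short Python check:

```python
import zlib

raw = b"It was the best of times, it was the worst of times"
stream = zlib.compress(raw)

stored = int.from_bytes(stream[-4:], "big")  # last 4 bytes of the zlib stream
assert stored == zlib.adler32(raw)           # matches the Adler-32 of the input
```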
I think that's everything. Further reading: https://en.wikipedia.org/wiki/Zlib https://en.wikipedia.org/wiki/Deflate https://en.wikipedia.org/wiki/Huffman_coding
Newer compression algorithms like zstd, brotli and lz4 basically just use these same methods in different ways. (There are also slightly newer alternatives to Huffman coding, like Asymmetric Numeral Systems and Arithmetic Coding, but fundamentally they're the same concept.)
When compressing with a level higher than 0, the bits also don't appear to add up to a whole number of bytes, so I'm thinking the visualization is missing some padding?
The LEN and NLEN items were not getting visualized.
Also for zopfli vs level 9 compression with this tool as-is.
This is very much a work in progress, but for folks looking for a deeper explanation of how dynamic blocks are encoded, this is my attempt to visualize them.
(This all happens locally with way too much wasm, so attempting to upload a large gzip file will likely crash the tab.)
tl;dr for btype 2 blocks (a small Python sketch of reading this header follows the list):
3-bit block header.
Three values telling you how many extra (above the minimum number) symbols are in each tree: HLIT, HDIST, and HCLEN.
First, we read (HCLEN + 4) * 3 bits.
These are the bit counts for symbols 0-18 in the code length tree, which gives you the bit patterns for a little mini-language used to compactly encode the literal/length and distance trees. 0-15 are literal bit lengths (0 meaning it's omitted). 16 repeats the previous symbol 3-6 times. 17 and 18 encode short (3-10) and long (11-138) runs of zeroes, which is useful for encoding blocks with sparse alphabets.
These bit counts are in a seemingly strange order that tries to push less-likely bit counts towards the end of the list so it can be truncated.
Knowing all the bit lengths for values in this alphabet allows you to reconstruct a Huffman tree (thanks to canonical Huffman codes) and decode the bit patterns for these code length codes.
That's followed by a bitstream that you decode to get the bit counts for the literal/length and distance trees. HLIT and HDIST (from earlier) tell you how many of these to expect.
Again, you can reconstruct these trees using just the bit lengths thanks to canonical Huffman codes, which gives you the bit patterns for the data bitstream.
Then you just decode the rest of the bitstream (using LZSS) until you hit 256, the end of block (EOB).
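Here's that sketch: it reads the block header, HLIT/HDIST/HCLEN, and the (HCLEN + 4) * 3 bits of code-length-code lengths. The LSB-first bit order and the permuted symbol order follow RFC 1951; the input text is just something likely to produce a btype 2 block.

```python
import zlib

# Order in which the code-length-code lengths are stored (RFC 1951, 3.2.7).
CLC_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]

class BitReader:
    """Reads bits LSB-first from a byte string, the way deflate packs them."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
    def read(self, n: int) -> int:
        value = 0
        for i in range(n):
            value |= ((self.data[self.pos // 8] >> (self.pos % 8)) & 1) << i
            self.pos += 1
        return value

# Raw deflate (wbits=-15, no zlib wrapper); repetitive text at level 9 usually
# gets a dynamic-Huffman (btype 2) block.
co = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = co.compress(b"it was the age of wisdom, it was the age of foolishness, " * 40)
raw += co.flush()

bits = BitReader(raw)
bfinal = bits.read(1)           # 1 if this is the last block
btype = bits.read(2)            # 0 = stored, 1 = fixed Huffman, 2 = dynamic
if btype == 2:
    hlit = bits.read(5) + 257   # literal/length codes in the first tree
    hdist = bits.read(5) + 1    # distance codes in the second tree
    hclen = bits.read(4) + 4    # code-length codes to read next
    clc_lengths = [0] * 19
    for i in range(hclen):      # the (HCLEN + 4) * 3 bits, in the permuted order
        clc_lengths[CLC_ORDER[i]] = bits.read(3)
    print(bfinal, hlit, hdist, hclen, clc_lengths)
```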
If you're not already familiar with deflate, don't be discouraged if none of that made any sense. Bill Bird has an excellent (long) lecture that I recommend to everyone: https://www.youtube.com/watch?v=SJPvNi4HrWQ