Not Hacker News! (Beta)

AI companion for Hacker News

Oct 14, 2025 at 10:51 AM EDT

Kaitai Struct: declarative binary format parsing language

djoldman
144 points
53 comments

Mood: supportive
Sentiment: positive
Category: tech
Key topics: Binary Parsing, Kaitai Struct, Data Formats
Debate intensity: 20/100

Kaitai Struct is a declarative binary format parsing language that has received positive feedback from the community for its ease of use and versatility, with discussions highlighting its applications and comparisons to other tools.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion
First comment: 9d after posting
Peak period: 44 comments (Day 10)
Avg / period: 26.5
Comment distribution: 53 data points (based on 53 loaded comments)

Key moments

  1. Story posted: Oct 14, 2025 at 10:51 AM EDT (about 1 month ago)
  2. First comment: Oct 23, 2025 at 3:43 PM EDT (9d after posting)
  3. Peak activity: 44 comments in Day 10 (hottest window of the conversation)
  4. Latest activity: Oct 25, 2025 at 2:41 AM EDT (about 1 month ago)


Discussion (53 comments)
theLiminator
about 1 month ago
3 replies
Is the main difference from https://github.com/google/wuffs being that Kaitai is declarative?
setheron
about 1 month ago
1 reply
Looking at that repo... I have no clue how to get started.
nigeltao
about 1 month ago
The top-level README has a link called "Getting Started".
nigeltao
about 1 month ago
1 reply
See https://github.com/google/wuffs/blob/main/doc/related-work.m...

> Kaitai Struct is in a similar space, generating safe parsers for multiple target programming languages from one declarative specification. Again, Wuffs differs in that it is a complete (and performant) end to end implementation, not just for the structured parts of a file format. Repeating a point in the previous paragraph, the difficulty in decoding the GIF format isn't in the regularly-expressible part of the format, it's in the LZW compression. Kaitai's GIF parser returns the compressed LZW data as an opaque blob.

Taking PNG as an example, Kaitai will tell you the image's metadata (including width and height) and that the compressed pixels are in the such-and-such part of the file. But unlike Wuffs, Kaitai doesn't actually decode the compressed pixels.

---

Wuffs' generated C code also doesn't need any capabilities, including the ability to malloc or free. Its example/mzcat program (equivalent to /bin/bzcat or /bin/zcat, for decoding BZIP2 or GZIP) self-imposes a SECCOMP_MODE_STRICT sandbox, which is so restrictive (and secure!) that it prohibits any syscalls other than read, write, _exit and sigreturn.

(I am the Wuffs author.)

corysama
about 1 month ago
1 reply
Wuffs looks pretty awesome. Thanks for making it.

Wuffs is intended for files. But, would it be a bad idea to use it to parse network data from untrusted endpoints?

nigeltao
about 1 month ago
It's a great idea. Chromium uses Wuffs to parse GIF data from the untrusted network.

There's also a "wget some JSON and pipe that to what Wuffs calls example/jsonptr" example at https://nigeltao.github.io/blog/2020/jsonptr.html#sandboxing

Sesse__
about 1 month ago
They overlap, but neither does strictly more than the other.

Kaitai is for describing, encoding and decoding file formats. Wuffs is for decoding images (which includes decoding certain file formats). Kaitai is multi-language, Wuffs compiles to C only. If you wrote a parser for PNGs, your Kaitai implementation could tell you what the resolution was, where the palette information was (if any), what the comments look like and on what byte the compressed pixel chunk started. Your Wuffs implementation would give you back the decoded pixels (OK, and the resolution).

Think of Kaitai as an IDL generator for file formats, perhaps. It lets you parse the file into some sort of language-native struct (say, a series of nested objects) but doesn't try to process it beyond the parse.
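To make the "IDL for file formats" idea concrete, here is a sketch of the parse-to-native-struct style of code a Kaitai spec generates, hand-written in Python with the stdlib `struct` module. The format itself (a 2-byte magic, a u2le count, then that many u4le values) and all names are invented for illustration, not taken from any real spec:

```python
import struct

def parse_blob(data: bytes) -> dict:
    """Parse a hypothetical format: 2-byte magic, u2le count, then count u4le values."""
    magic = data[0:2]
    if magic != b"KT":
        raise ValueError("bad magic")
    (count,) = struct.unpack_from("<H", data, 2)
    values = list(struct.unpack_from("<%dI" % count, data, 4))
    # Like a Kaitai-generated class, this returns a language-native structure
    # and does not try to interpret the payload beyond the parse.
    return {"magic": magic, "count": count, "values": values}

blob = b"KT" + struct.pack("<H", 3) + struct.pack("<3I", 1, 2, 3)
print(parse_blob(blob))
```

A Kaitai `.ksy` file would declare the same fields declaratively and the compiler would emit the equivalent of `parse_blob` in each target language.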

mturk
about 1 month ago
1 reply
Kaitai is absolutely one of my favorite projects. I use it for work (parsing scientific formats, prototyping and exploring those formats, etc) as well as for fun (reverse engineering games, formats for DOSbox core dumps, etc).

I gave a guest lecture in a friend's class last week where we used Kaitai to back out the file format used in "Where in Time is Carmen Sandiego" and it was a total blast. (For me. Not sure that the class agreed? Maybe.) The Web IDE made this super easy -- https://ide.kaitai.io/ .

(On my youtube page I've got recordings of streams where I work with Kaitai to do projects like these, but somehow I am not able to work up the courage to link them here.)

heromal
about 1 month ago
1 reply
I'm curious, how do you use it for Game RE?
ACCount37
about 1 month ago
1 reply
Not the author, but also in RE.

RE, especially of older and more specialized software, involves dealing with odd undocumented binary formats. Which you may have to dissect carefully with a hex editor and a decompiler, so that you can get at the data inside.

Kaitai lets you prototype a parser for formats like that on the go, quick and easy.

pvitz
about 1 month ago
1 reply
A shot in the dark, but maybe you could give me a hint. Recently, I was interested in extracting sprites from an old game. I was able to reverse the file format of the data archive, which contained the game assets as files. However, I got stuck because the image files were obviously compressed. By chance, I found an open source reimplementation of the game and realised it was LZ77+Huffman compressed, but how would one detect the type of compression and parameters with only the file? That seems a pretty hard problem or are there good heuristics to detect that?
ACCount37
about 1 month ago
1 reply
Some simpler cases like various RLE-type encodings can be figured out with that pattern recognizing brain - by staring at them really really hard.

For harder cases? You take the binaries that read or write your compressed files, load them in your tool (typically Ghidra nowadays), and track down the code that does it.

Then you either recognize what that code does (by staring at it really really hard), or try to re-implement it by hand while reading up on various popular compression algos in hope that doing this enlightens you.

Third option now: feed the decompiled or reimplemented code to the best LLM you have access to, and ask it. Those things are downright superhuman at pattern matching known algorithms, so use them, no such thing as "cheating" in RE.

The "hard mode" is compression implemented in hardware, with neither a software encoder nor a software decoder available. In which case you better be ready for a lot of "feed data to the magic registers, see results, pray they give you a useful hint" type of blind hardware debugging. Sucks ass.

The "impossible" is when you have just the compressed binaries, with no encoder or decoder or plaintext data available to you at all. Better hope it's something common or simple enough or it's fucking hopeless. Solving that kind of thing is cryptoanalysis level of mind fuck and I am neither qualified enough nor insane enough to advise on that.

Another thing. For practical RE? ALWAYS CHECK PRIOR WORK FIRST. You finding an open source reimplementation? Good job, that's what you SHOULD be doing, no irony, that's what you should be doing ALWAYS. Always check whether someone has been there and done that! Always! Check whether someone has worked on this thing, or the older version of it, or another game in the same engine - anything at all. Can save you literal months of banging your head against the wall.
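On the heuristics question upthread: one rough, commonly used signal is byte entropy. Well-compressed (or encrypted) data sits near 8 bits per byte, while repetitive RLE-friendly data is much lower. This sketch only says "probably compressed or encrypted", not which algorithm; the sample inputs are invented:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; values near 8.0 suggest compressed or encrypted data."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

low = b"AAAABBBBAAAABBBB" * 64   # repetitive, RLE-friendly: low entropy
high = bytes(range(256)) * 4     # uniform byte distribution: maximal entropy
print(byte_entropy(low), byte_entropy(high))
```

Identifying the specific algorithm still requires the kind of decompiler work described above; entropy only tells you whether there is anything left to find.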

pvitz
about 1 month ago
1 reply
Thanks for your reply and advice! I guess what you describe as "impossible" is the case I am mostly interested in, though more for non-executable binary data. If I am not mistaken, this goes under the term "file fragment classification", but I have been wondering if practitioners might have figured out some better ways than what one can find in scholarly articles.

And yes, searching for the reimplementation beforehand would have saved me some hours :D

ACCount37
about 1 month ago
1 reply
It's not about the data being executable. It's about having access to whatever reads or writes this data.

Whatever reads or writes this data has to be able to compress or decompress it. And with any luck, you'll be able to take the compression magic sauce from there.

pvitz
about 1 month ago
1 reply
I understood "binaries" in "compressed binaries" as "executables", e.g. like a packed executable, but I see that you mean indeed a binary file (and not e.g. a text file).
ACCount37
about 1 month ago
Reread that just now, sorry for not making it clearer. I kind of just used "binaries" in both senses? Hope the context clears it up.
okanat
about 1 month ago
1 reply
Even if you don't want to use it since it is not as efficient as a hand-written specialized parser, Kaitai Struct gives a perfect way of documenting file formats. I love the idea and every bit of the project!
jonstewart
about 1 month ago
I like using it for parsing structs but then intersperse procedural code in it for loops/containers, so not everything gets read into RAM all at once.
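The streaming approach described above, parsing fixed-layout structs while iterating containers procedurally, can be sketched with a Python generator so the whole file never sits in RAM at once. The record layout here is invented for illustration:

```python
import io
import struct

def iter_records(stream, record_fmt="<HI"):
    """Yield (tag, value) records one at a time instead of materializing a list."""
    size = struct.calcsize(record_fmt)
    while True:
        chunk = stream.read(size)
        if len(chunk) < size:
            return  # EOF (or a trailing partial record)
        yield struct.unpack(record_fmt, chunk)

data = b"".join(struct.pack("<HI", i, i * 10) for i in range(4))
records = list(iter_records(io.BytesIO(data)))
print(records)
```

The generator keeps exactly one record in memory; a Kaitai-parsed header could supply the format string and count while the loop itself stays procedural.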
sitkack
about 1 month ago
2 replies
What was the Python based binary parsing library from around 2010? Hachoir?

https://hachoir.readthedocs.io/en/latest/index.html

ctoth
about 1 month ago
Construct?
jonstewart
about 1 month ago
Hachoir was rad, just not very fast.
ginko
about 1 month ago
2 replies
No pure C backend?
vendiddy
about 1 month ago
1 reply
It's not C, but we have sponsored a Zig target for Kaitai. If anyone reading this knows Zig well, please comment, because I would love to get a code review of the generated code!
vitalnodo
about 1 month ago
1 reply
Can you share the link? I also wonder whether it uses comptime features.
vendiddy
about 1 month ago
It is not yet ready but the master branch has an initial draft.

https://github.com/kaitai-io/kaitai_struct_compiler/commits/...

It would be premature to review now because there are some missing features and stuff that has to be cleaned up.

But I am interested in finding someone experienced in Zig to help the maintainer with a sanity check to make sure best practices are being followed. (Would be willing to pay for their time.)

If comptime is used, it would be minimal. This is because code-generation is being done anyway so that can be an explicit alternative to comptime. But we have considered using it in a few places to simplify the code-generation.

dhsysusbsjsi
about 1 month ago
This would be great for most projects, as the Swift target, for example, is abandoned, with 6+ years since its last commit.
dgan
about 1 month ago
1 reply
Wow, this is good. My only complaint is the annoyingly verbose YAML. What if I wanted to use Kaitai instead of protobufs? My .proto file is already a thousand lines; splitting each of those lines into 3-4 indented YAML lines hurts readability.
indrora
about 1 month ago
1 reply
You're using it for the wrong thing, then.

KS isn't for general data mangling, it's for "I have this format and I need a de novo parser for it that works under explicit rules" and you're willing to do the work of fully implementing it from the bytes up.

dgan
about 1 month ago
Ok. What if I want a general, polyglot data-mangling tool which doesn't produce metric tons of bloat like protobuf does?
Everdred2dx
about 1 month ago
1 reply
I had a ton of fun using Kaitai to write an unpacking script for a video game's proprietary pack file format. Super cool project.

I did NOT have fun trying to use Kaitai to pack the files back together. Not sure if this has improved at all but a year or so ago you had to build dependencies yourself and the process was so cumbersome it ended up being easier to just write imperative code to do it myself.

kodachi
about 1 month ago
It hasn't improved that much: you need to know the final size and fill in all attributes, and there are no defaults, at least in Python.
carom
about 1 month ago
1 reply
My dream for a parsing library / language is that it would be able to read, manipulate, and then re-serialize the data. I'm sure there are a ton of edge cases there, but the round trip would be so useful for fuzzing and program analysis.
jaxefayo
about 1 month ago
From what I’ve read, kaitai does that now. For the longest time it could only parse, but I believe now it can generate/serialize.
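The read/manipulate/reserialize round trip wished for above can be sketched by hand with the stdlib `struct` module. This is a toy header format invented for illustration, not Kaitai's actual serialization API:

```python
import struct

FMT = "<4sHH"  # magic, width, height (little-endian)

def parse(data: bytes) -> dict:
    magic, w, h = struct.unpack_from(FMT, data, 0)
    return {"magic": magic, "width": w, "height": h}

def serialize(hdr: dict) -> bytes:
    return struct.pack(FMT, hdr["magic"], hdr["width"], hdr["height"])

original = struct.pack(FMT, b"IMG\x00", 640, 480)
hdr = parse(original)
hdr["width"] = 1024  # manipulate one field
patched = serialize(hdr)
print(patched.hex())
```

For fuzzing, the useful property is that `serialize(parse(x)) == x` for untouched input, so any divergence after a targeted mutation is attributable to that mutation alone.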
whitten
about 1 month ago
1 reply
To quote from the page:

  id: flags
  type: u1

This seems to say flags is a sort of unsigned integer.

Is there a way to break the flags into big-endian bits, where the first two bits are either 01 or 10 but not 00 or 11 (01 meaning DATA and 10 meaning POINTER), the next five bits are a counter of segments, and the next bit is 1 if the default is BLACK and 0 if the default is WHITE?

CGamesPlay
about 1 month ago
Appears so: https://doc.kaitai.io/user_guide.html#_bit_sized_integers
pabs3
about 1 month ago
1 reply
Kaitai is one of many different tools that do this; there is a list of them here:

https://github.com/dloss/binary-parsing

Personally I like GNU Poke.

dhx
about 1 month ago
Is anyone aware of a project that provides simplified declaration of constraint checking?

For example:

  structures:
    struct a { strz b, strz c, int d, str[d] e }
  
  constraints:
    len(b) > len(c)
    d > 6
    d <= 10
    e ~ /^ABC\d+/
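I am not aware of a tool with exactly that syntax, but as a sketch, post-parse constraints like these can be expressed as named predicates run against the parsed struct. The field names are taken from the example above; everything else is invented:

```python
import re

def check(parsed: dict) -> list:
    """Return the names of violated constraints for the example struct."""
    constraints = {
        "len(b) > len(c)": len(parsed["b"]) > len(parsed["c"]),
        "d > 6": parsed["d"] > 6,
        "d <= 10": parsed["d"] <= 10,
        "e ~ /^ABC\\d+/": re.match(r"^ABC\d+", parsed["e"]) is not None,
    }
    return [name for name, ok in constraints.items() if not ok]

good = {"b": "hello", "c": "hi", "d": 8, "e": "ABC123"}
bad = {"b": "x", "c": "yy", "d": 3, "e": "XYZ"}
print(check(good))
print(check(bad))
```

Kaitai's own `valid` keys cover simple per-field checks; cross-field constraints like `len(b) > len(c)` are the part that usually needs a layer like this on top.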
depierre
about 1 month ago
One of my personal favorites. I've used it for parsing SAP's RPC network protocol, reverse-engineering Garmin apps [0], and more recently in a CTF challenge that involved an unknown file format, among others. It's surprisingly quick to pick up once you get the hang of the syntax.

The serialization branch for Python [1] (I haven't tried the Java one) has generally done the job for me, though I've had to patch a few edge cases.

One feature I've often wished for is access to physical offsets within the file being parsed (e.g. being able to tell that a field foo that you just parsed starts at offset 0x100 from the beginning of the file). As far as I know, you only get relative offsets to the parent structure.

0: https://github.com/anvilsecure/garmin-ciq-app-research/blob/...

1: https://doc.kaitai.io/serialization.html

zzlk
about 1 month ago
I wanted to use this a long time ago but the rust support wasn't there. I can see now that it's on the front page with apparently first class support so looks like I can give it a go again.
karhuton
about 1 month ago
Wow. Looking at the schema for GIF, it's so readable that I can't help but wonder why something like this hasn't become the standard way to work with binary formats over the decades already!

Seems like everything has to be JSON and text-based these days, because binary is a more difficult DX.

When reading articles discussing binary formats, I usually see them using box diagrams of packets, description tables or hexdumps.

This neatly describes nested structure, names and ”types” - just enough.

I wonder if there's a hexdump-like viewer in IDEs that can present binary files like this? I can also imagine a simple UI that makes the files editable using this.

lzcdhr
about 1 month ago
Does it support incremental parsing? For example, when I am parsing a network protocol, can it still consume some data from the head of the buffer even if the data is incomplete? This would not only avoid multiple attempts to restart parsing from the beginning but also prevent the buffer from growing excessively.
setheron
about 1 month ago
Great timing! I just published https://github.com/fzakaria/nix-nar-kaitai-spec and contributed kaitai C++ STL runtime to nixpkgs https://github.com/NixOS/nixpkgs/pull/454243
Rucadi
about 1 month ago
The most success I've had so far on a project involving binary data parsing was with Deku in Rust; I would give this a try if I get the opportunity.
bburky
about 1 month ago
Kaitai is pretty nice. Hex editors with structure parsing support used to be more rare than they are now, so I've used https://ide.kaitai.io/ instead a few times.

Also, the newest Kaitai release added (long awaited) serialization support! I haven't had a chance to try it out.

https://kaitai.io/news/2025/09/07/kaitai-struct-v0.11-releas...

layoric
about 1 month ago
I discovered this project recently and used it for Himawari Standard Data format and it made it so much easier. Definitely recommend using this if you need to create binary readers for uncommon formats.
metaPushkin
about 1 month ago
Enjoyable tool. When I developed my text RPG game, I prepared a Kaitai specification for the save file data format so that it would be easy to create third-party software for viewing and modifying it =)
imtringued
about 1 month ago
https://en.wikipedia.org/wiki/Data_Format_Description_Langua...

DFDL is heavily encroaching on Kaitai Struct's territory.

kodachi
about 1 month ago
The recent release of 0.11 marks the inclusion of the long awaited serialization feature. Python and Java only for now. I've been using it for a while for Python and although it has some rough edges, it works pretty well and I'm super excited for the project.
somethingsome
about 1 month ago
I didn't check exactly what Kaitai does, but MPEG uses a custom SDL for its binary syntax: https://mpeggroup.github.io/mpeg-sdl-editor/ Just sharing, in case someone is interested :)
Locutus_
about 1 month ago
How is the write support nowadays? Is it production quality now?

I used Kaitai in an IoT project for building data-ingress parsers and it was great. But not having write support was a bummer.

jdp
about 1 month ago
I also like Protodata [1]. It's complementary as an exploration and transformation tool when working with binary data formats.

[1]: https://github.com/evincarofautumn/protodata

woodruffw
about 1 month ago
Kaitai Struct is really great. I've used it several times over the years to quickly pull in a parser that I'd otherwise have to hand-roll (and almost certainly get subtly wrong).

Their reference parsers for Mach-O and DER work quite nicely in abi3audit[1].

[1]: https://github.com/pypa/abi3audit/tree/main/abi3audit/_vendo...

casey2
about 1 month ago
https://www.erlang.org/doc/system/bit_syntax.html

highly recommended if you like functional languages

ID: 45580795 · Type: story · Last synced: 11/20/2025, 9:01:20 PM


© 2025 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.