Binary Formats Gallery
Source: formats.kaitai.io
Topics: Binary Formats, Kaitai, Reverse Engineering
The Binary Formats Gallery showcases Kaitai, a declarative language for describing binary file formats, sparking discussion on its capabilities and limitations.
Snapshot generated from the HN discussion (ID: 45469285, 25 loaded comments).
Story posted Oct 3, 2025 at 8:19 PM EDT; first comment Oct 3, 2025 at 8:57 PM EDT (39m after posting); latest activity Oct 5, 2025 at 6:04 AM EDT.
Various hex editors have their own formats. 010 Editor has C-style binary templates, and ImHex has its own binary pattern language. Okteta has Okteta Structure Definitions, which can be declared in XML or in JS.
Kaitai Struct is the most complete system that has code generation for multiple programming languages and isn't tied to a hex editor or anything else, for that matter. That said, I think there's still a ton of room for improvement and innovation. Kaitai has a lot of useful tooling, but as it stands today it falls a bit short: code generation is not at the same level of support for every language (most targets are fairly limited), and I think serialization is still mostly experimental. Beyond that, there is probably a lot that could still be done to make it more expressive and powerful.
I mainly started a third-party Kaitai implementation to experiment a bit with supporting new features in Go, and also just to have a native Go implementation for convenience, since I'm still not very good at Scala. However, once an approach is developed for how exactly to handle emitting to Wireshark it should be purely mechanical to graft on a Wireshark emitter to the upstream Kaitai Struct compiler, too.
https://github.com/jchv/zanbato
Protobuf and its ilk (ASN.1, Cap’n Proto, etc.) have you describe a tree structure, then map that to bytes according to their own sensibilities. Kaitai and its ilk (Wireshark might be a more familiar member of the group) have you describe a bunch of data structures as well as somebody else’s pretty much arbitrary ideas as to how they are to map to bytes, then deal with the results.
You can’t use a Protobuf implementation to get EXIF data out of JPEGs, but then you can’t get format evolution guarantees out of Kaitai either.
(I hear ASN.1 can somewhat cross the gap using ECN, but as far as I can tell literally nobody uses that in public.)
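To make the EXIF point concrete, here is a minimal sketch of what "somebody else's ideas about bytes" looks like in practice: walking JPEG segments by hand to locate the APP1 segment that carries Exif data. The function name `find_exif_payload` is made up for illustration, and the walker is deliberately simplified (it assumes every segment before SOS has a length field and ignores fill bytes), so treat it as a sketch rather than a robust parser.

```python
import struct
from typing import Optional

SOI = b"\xff\xd8"            # start-of-image marker
APP1 = 0xFFE1                # segment that carries Exif data
SOS = 0xFFDA                 # start of scan: entropy-coded data follows
EXIF_HEADER = b"Exif\x00\x00"


def find_exif_payload(data: bytes) -> Optional[bytes]:
    """Return the TIFF-formatted Exif payload of a JPEG, or None if absent.

    Simplified sketch: assumes every segment before SOS has a length field.
    """
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        # Each segment is: 2-byte marker, 2-byte big-endian length.
        # The length counts itself but not the marker.
        marker, length = struct.unpack_from(">HH", data, pos)
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        if marker == SOS:
            break  # no metadata segments past this point
        payload = data[pos + 4 : pos + 2 + length]
        if marker == APP1 and payload.startswith(EXIF_HEADER):
            return payload[len(EXIF_HEADER):]
        pos += 2 + length
    return None
```

None of the layout decisions here (marker values, the length that includes itself, the `Exif\0\0` prefix) are ours to change, which is exactly why a schema-first tool like Protobuf cannot express them.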
### kaitai - https://github.com/kaitai-io/kaitai_struct - https://github.com/kaitai-io/awesome-kaitai - http://formats.kaitai.io/dos_datetime/index.html
### Hexinator / Synalyze It! - Universal Parsing Engine - Hexinator is freemium version of Synalyze It! - https://github.com/synalysis/Grammars/blob/master/bitmap.gra...
### quickbms - http://aluigi.altervista.org/quickbms.htm
### multiex - http://multiex.xentax.com/
### Game Extractor by WATTO - http://www.watto.org/game_extractor.html
### 010 editor templates - https://www.sweetscape.com/010editor/repository/templates/
### hex fiend templates - https://github.com/HexFiend/HexFiend/tree/master/templates
### malcat - has some form of binary templates - https://malcat.fr/
### Andys Binary Folding Editor - http://www.nyangau.org/be/be.htm
### winhex templates - https://www.x-ways.net/winhex/templates/index.html
### TrID - file identifier - TrID is a utility designed to identify file types from their binary signatures. - https://mark0.net/soft-trid-e.html
### GNU file - https://github.com/file/file
### Noesis - Noesis is a tool for previewing and converting between hundreds of model, image, and animation formats. - http://richwhitehouse.com/index.php?content=inc_projects.php... - https://github.com/RoadTrain/noesis-plugins - https://github.com/RoadTrain/noesis-plugins-official
### Ninja ripper - extract individual models from DirectX 3D games, while they are running - https://ninjaripper.com/
### Unpakke - http://www.nullsecurity.org/unpakke
### Camoto online-only universal game modding tool - https://moddingwiki.shikadi.net/wiki/Camoto - https://camoto.shikadi.net/
https://construct.readthedocs.io/en/latest/intro.html#exampl...
https://doc.kaitai.io/user_guide.html#process https://github.com/kaitai-io/kaitai_compress/blob/master/pyt...
* Things may be non-byte-aligned bitstreams.
* Arrays of structures that go "read until id is 5, but if id is 5, nothing else of the structure is emitted."
* Fields that may be optional if some parent of the current record has some weird value.
* Files may be composed of records at arbitrary, random offsets that essentially require seeking to make any sense of it.
* The metadata of your structure may depend on some early parameter (for example, is this field big-endian or little-endian?)
and so on.
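As a rough illustration of the "read until id is 5" case from the list above, here is a hand-written sketch in Python. The record layout (a one-byte id, then a two-byte little-endian length and a body) is invented for the example; only the terminator value 5 and the rule that the terminator carries no further fields come from the list.

```python
import struct
from typing import List, Tuple

TERMINATOR_ID = 5  # the terminator value used in the example above


def read_records(data: bytes) -> Tuple[List[Tuple[int, bytes]], int]:
    """Read (id, body) records until the terminator id is seen.

    Hypothetical layout: u1 id, then (only if id != 5) u2le length + body.
    Returns the records and the offset just past the terminator byte.
    """
    records: List[Tuple[int, bytes]] = []
    pos = 0
    while True:
        if pos >= len(data):
            raise ValueError("ran out of data before the terminator record")
        rec_id = data[pos]
        pos += 1
        if rec_id == TERMINATOR_ID:
            # The terminator is an id byte only: no length, no body.
            return records, pos
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        body = data[pos : pos + length]
        if len(body) != length:
            raise ValueError("truncated record body")
        pos += length
        records.append((rec_id, body))
```

The awkward part for a declarative description is that the loop condition and the presence of the remaining fields both depend on the same value, so the terminator cannot be modelled as just another full record.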
File formats like ELF (which covers ELF32 and ELF64, in both little-endian and big-endian variants, all in a single format definition) or Java class files (where long and double entries in the constant pool take up two slots, not one) are a better guideline for how powerful the description format is at handling weirder idiosyncrasies.
(Checksums are another example.)
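The ELF case can be sketched in a few lines of Python: two bytes early in the header decide both the byte order and the field widths of everything after them. The function name `parse_elf_prefix` and the returned dictionary keys are illustrative choices; the field offsets follow the standard ELF header layout, and only the first few fields are read.

```python
import struct


def parse_elf_prefix(data: bytes) -> dict:
    """Parse the start of an ELF header, honouring its class and byte order."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported EI_CLASS or EI_DATA value")

    # EI_DATA selects the byte order for every later multi-byte field.
    endian = "<" if ei_data == 1 else ">"
    # EI_CLASS selects the width of address-sized fields (4 or 8 bytes).
    addr = "I" if ei_class == 1 else "Q"

    # e_type, e_machine, e_version, e_entry start right after the
    # 16-byte e_ident array.
    e_type, e_machine, e_version, e_entry = struct.unpack_from(
        endian + "HHI" + addr, data, 16
    )
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "e_type": e_type,
        "e_machine": e_machine,
        "e_version": e_version,
        "e_entry": e_entry,
    }
```

A single description has to carry these two parameters through every nested structure, which is why ELF is a useful stress test for any format language.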
> Things may be non-byte-aligned bitstreams.
* https://doc.kaitai.io/user_guide.html#_bit_sized_integers
> Arrays of structures that go "read until id is 5, but if id is 5, nothing else of the structure is emitted."
* https://doc.kaitai.io/user_guide.html#_repetitions
> Fields that may be optional if some parent of the current record has some weird value.
* https://doc.kaitai.io/user_guide.html#do-nothing
> Files may be composed of records at arbitrary, random offsets that essentially require seeking to make any sense of it.
* https://doc.kaitai.io/user_guide.html#_relative_positioning
> The metadata of your structure may depend on some early parameter (for example, is this field big-endian or little-endian?)
* https://doc.kaitai.io/user_guide.html#param-types
* https://doc.kaitai.io/user_guide.html#switch-advanced