Super-Flat Asts
Key topics
The debate around "super-flat ASTs" - a design pattern that represents abstract syntax trees as flat arrays of integers - is heating up, with commenters weighing in on its trade-offs. While some praise the approach for its performance benefits, others point out potential downsides, such as making it harder for humans to inspect and traverse the AST. The original author [uaksom] clarifies that the representation can be hostile to humans, as inspecting the AST in a debugger can be unhelpful, but counters suggest that writing custom helpers can mitigate this issue. As commenters discuss the Zig compiler and Scala's type system, a consensus emerges that super-flat ASTs are a nuanced design choice that depends on specific use cases.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussionFirst comment
6d
Peak period
21
144-156h
Avg / period
6.5
Based on 26 loaded comments
Key moments
- 01Story posted
Dec 4, 2025 at 1:01 PM EST
29 days ago
Step 01 - 02First comment
Dec 10, 2025 at 1:03 PM EST
6d after posting
Step 02 - 03Peak activity
21 comments in 144-156h
Hottest window of the conversation
Step 03 - 04Latest activity
Dec 12, 2025 at 6:06 AM EST
22 days ago
Step 04
Generating AI Summary...
Analyzing up to 500 comments to identify key contributors and discussion patterns
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
More generally, I believe its fair to call this a form of handle-based designs: https://en.wikipedia.org/wiki/Handle_(computing) Which are EXTREMELY useful for a variety of reasons and imo woefully underused above the lowest system level.
My current research is about how to make handles just as convenient to use as pointers are, via a form of context: like a souped-up version of context in Odin or Jai if one is familiar with those, or like a souped-up version of coeffects if one has a more academic background.
As far as I know no language allows expressing that kind of thing.
But from what I understand (being a nonexpert on Scala), this scheme actually causes a lot of problems. I think I've even heard that it adds more undecidability to the type system? So I'm exploring ways of managing context that don't depend on inferring backward from the type.
The interesting question for me is where the crossover is now that IDEs and incremental compilation dominate the workload. If your front-end is effectively a long-running service, it might be worth keeping a friendlier AST around and only using a super-flat representation for hot paths like analysis passes or bulk refactors. Otherwise you risk saving a few hundred MB while spending engineer-months making every new pass fight the layout.
That's not nothing. But a parser is rarely the most time-intensive part of a production compiler. And the parser does get iterated on a lot in languages that are evolving and adding new syntax.
Given that, I'd be inclined to take the performance hit and stick with a simpler AST representation if that yields a more hackable, maintainable compiler front end.
I didn't go into this at all, but the main benefit of this design is how well it interacts with CPU cache. This has almost no effect on the parser, because you're typically just writing the AST, not reading it. I believe that subsequent stages benefit much more from faster traversal.
(By the way, I am a huge fan of your work. Crafting interpreters was my introduction to programming languages!)
I mean if you do an off by one error on indices, essentially you are readin the pointer of another node.
Because it is a correct one. std::vector can do this in debug mode, Zig and a bunch of other languages do as well. But on the other hand, if you reference 2 or more syntax nodes, you have to be very careful how you structure your code, because you are borrowing the same array multiple times.
If you initially wrote it with nodes that own their memory, and moved to this scheme, you'd have very different lifetimes.
> you can reference wrong nodes, invalid nodes, uninitalized nodex.
No, you cannot access uninitialized nodes in safe Rust.
When investigating performance issues its often very helpful to run with profiling instrumentation enabled and start by looking at some top-down "cumulative sum" profiler output to get a big picture view of which functions/phases are consuming most of the running time, to see where it may be worth spending some effort.
Getting familiar with linux's perf tool is also helpful, both in terms of interpreting its summary statistics but also being able to use it to annotate source line by line with time spent.
I'm not familiar with rust, but e.g. the rustc compiler dev guide has a tutorial on how to profile rustc using perf https://rustc-dev-guide.rust-lang.org/profiling/with_perf.ht...