Static Allocation for Compilers
matklad.github.io · Hacker News discussion
Static memory allocation requires hardcoding an upper limit on the size of everything. For example, if you limit each string to at most 256 bytes, then a string that only needs 10 bytes wastes 246 bytes of memory.
If you limit string length to 32 bytes instead, less memory is wasted, but a string longer than 32 bytes can no longer be handled at all.
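A minimal sketch (not from the article) of the kind of fixed-capacity string being described; the struct name and the 256-byte cap are just illustrative:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-capacity string: every instance reserves the full
   256-byte buffer regardless of how many bytes are actually used. */
#define STR_CAP 256

typedef struct {
    unsigned len;           /* bytes actually used */
    char     data[STR_CAP];
} FixedStr;

int main(void) {
    FixedStr s = { .len = 10 };
    memcpy(s.data, "hello text", 10);
    /* 10 bytes used; the remaining 246 bytes of data[] sit idle. */
    printf("used %u of %d bytes (%u wasted)\n",
           s.len, STR_CAP, STR_CAP - s.len);
    return 0;
}
```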
No? Unless you limit each string to be exactly 256 bytes, but that's silly.
> If you limit string length to 32 bytes instead, less memory is wasted, but a string longer than 32 bytes can no longer be handled at all.
Not necessarily. Early compilers and linkers routinely did the "only the first 6/8 characters of an identifier are significant" schtick: the rest was simply discarded.
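A toy illustration of that scheme, with hypothetical names; every symbol-table entry stays a fixed 8 bytes because anything longer is simply cut off:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol-table entry in the spirit of early assemblers/linkers:
   only the first 8 characters of an identifier are stored; the rest is
   silently discarded, so no entry ever needs more than 8 bytes. */
#define SIG_CHARS 8

typedef struct {
    char name[SIG_CHARS];   /* not NUL-terminated if truncated */
} Symbol;

static void intern(Symbol *sym, const char *ident) {
    size_t n = strlen(ident);
    if (n > SIG_CHARS) n = SIG_CHARS;      /* truncate to the significant prefix */
    memset(sym->name, 0, SIG_CHARS);
    memcpy(sym->name, ident, n);
}

int main(void) {
    Symbol a, b;
    intern(&a, "extremely_long_identifier_one");
    intern(&b, "extremely_long_identifier_two");
    /* Both collapse to "extremel" and collide -- exactly the kind of
       surprise those old limits produced. */
    printf("collide: %d\n", memcmp(a.name, b.name, SIG_CHARS) == 0);
    return 0;
}
```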
Even if you needed to hardcode upper size limits, which your compiler already does to some extent (the C/C++ standards anticipate this by setting minimum translation limits for things like string literal length), you wouldn't actually pay the full price on most systems because of overcommit. There are other downsides depending on implementation details, such as how you reclaim memory and spawn compiler processes, so I'm not suggesting it as a good idea; it's just possible.
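A rough sketch of the overcommit point, assuming Linux-style demand paging; the arena size and variable names here are made up for illustration:

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/mman.h>

/* mmap reserves virtual address space; physical pages are only committed
   when first written (overcommit / demand paging), so a huge "statically
   sized" arena need not cost its full footprint. 16 GiB is arbitrary. */
int main(void) {
    size_t cap = (size_t)16 << 30;               /* 16 GiB of address space */
    char *arena = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch only the first megabyte: roughly that much resident memory
       is actually paid, not 16 GiB. */
    for (size_t i = 0; i < (size_t)1 << 20; i += 4096)
        arena[i] = 1;

    munmap(arena, cap);
    return 0;
}
```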
I don't follow. He has just said that although the size of the arena is finite, the input and output are unbounded, and the compiler does its work by processing "a sequence of chunks" (i.e. those things that will fit into the finitely sized arena). That's not "O(1) intermediate processing artifacts". It's still O(n).
> [...] can clarify compiler’s architecture, and I won’t be too surprised if O(1) processing in compilers would lead to simpler code
This doesn't seem like an intuitive conclusion at all. There's more bookkeeping to do and more machinery to implement, which should be expected to make for something that is neither simpler nor easier.
We haven't even gotten around to addressing how "statically allocating" a fixed-size arena that your program necessarily subdivides into pieces (before moving on to the next chunk and doing the same) is just "dynamic allocation with extra steps". (If you want, or just think it would be neat, to write and control your own allocator, then fine, but say that.)
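A minimal bump-allocator sketch over a static buffer (names and sizes assumed), showing how carving a "statically allocated" arena into pieces per chunk still amounts to allocation decisions made at run time:

```c
#include <stddef.h>
#include <string.h>

/* The arena itself is a static array, but handing out pieces of it and
   resetting it between input chunks is still dynamic behavior. */
enum { ARENA_CAP = 1 << 20 };          /* 1 MiB arena */

static unsigned char arena[ARENA_CAP];
static size_t arena_used;

static void *arena_alloc(size_t size) {
    size = (size + 15) & ~(size_t)15;  /* round up to 16-byte alignment */
    if (arena_used + size > ARENA_CAP)
        return NULL;                   /* chunk is full: caller must flush */
    void *p = arena + arena_used;
    arena_used += size;
    return p;
}

static void arena_reset(void) {        /* called between input chunks */
    arena_used = 0;
}

int main(void) {
    char *s = arena_alloc(10);
    if (s) memcpy(s, "chunk one", 10);
    arena_reset();                     /* the next chunk reuses the same bytes */
    return 0;
}
```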
C/C++ compilers can get huge compile-time speedups by compiling translation units as unity builds. For the AAA game engine I work on, the compiler can use 8 GB+ per unit.
Just splitting the code up might allow the compiler to use less memory, but compile time will increase hugely.
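For anyone unfamiliar with the term, a hypothetical unity translation unit might look like the sketch below (the file names are made up); it is compiled with a single `cc -c unity.c` instead of three separate compiler invocations, so the compiler sees, and holds in memory, everything at once.

```c
/* unity.c -- hypothetical "unity" translation unit: instead of compiling
   lexer.c, parser.c and codegen.c separately, one file #includes them all. */
#include "lexer.c"
#include "parser.c"
#include "codegen.c"
```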
IIRC, Unix's original as works that way: during assembly, the text and data sections are written into separate temporary files, and then they are merged together into the a.out. And yes, it's slow.
However, breaking up huge functions (or skipping optimizations on them) will lead to missed opportunities. And LTO-style optimizations, where the entire program is taken into account, can be very important as well (as a concrete example, we see huge benefits from doing that in wasm-opt for Wasm GC).
Still, it's a nice idea, and maybe it can make 80% of compiler passes a lot faster!