Default Musl Allocator Considered Harmful to Performance
Posted 4 months ago · Active 4 months ago
Source: nickb.dev
Key topics: Musl Allocator, Performance Optimization, Linux Distributions
The default musl allocator is criticized for its performance issues, particularly in multi-threaded applications, sparking a discussion about the trade-offs between size optimization and performance in Linux distributions.
Snapshot generated from the HN discussion
Key moments
- Story posted: Sep 5, 2025 at 4:42 PM EDT
- First comment: Sep 8, 2025 at 12:16 AM EDT (2 days after posting)
- Peak activity: 81 comments on Day 3
- Latest activity: Sep 15, 2025 at 9:58 AM EDT
ID: 45143347 · Type: story · Last synced: 11/20/2025, 5:39:21 PM
"Harmful" should be reserved for things that affect security or privacy, e.g. features that accidentally encourage bugs, like goto does.
C devs are among the few I've met who seem to actually care.
This is an example of not caring about the software per se, but only about the outcome.
> [C is] in fact famously bug-friendly
Yes, but as a user I like that. I have a game that, from the user experience, seems to have tons of use-after-free bugs. You see it as a user: strings shown in the UI suddenly turn to garbage and then change very fast. Even with such fatal bugs, the program continues to work, which I like as a user, since I just want to play the game; I don't care if the program is correct. When I want to get rid of the garbage text, I simply close the in-game window, reopen it, and everything is fine.
On the other hand, there are games written in Pascal or Java, which might not have as many bugs, but every single null pointer exception is fatal. This led to me not playing those games anymore, because doing well and then having the program crash is so frustrating. I'd rather have it run a bit longer with silent corruption.
C won't help with any of that. Unless the cost of development using it will scare away management which requests those dumb features. Fair enough then :)
Your example is not one of 'dumb' design; it is a deliberate 'dark pattern': pushing you to use OneDrive as much as possible so as to earn more money.
It makes sense to use a tech stack that lowers the cost on the developer side, in the same way that it makes sense to make junk food. Why produce good, tasty food when there is more money to be made by just selling cheap stuff? It does the most important thing: give people calories without poisoning them (short term).
Get rid of those dumb decisions and it could have been pure JS and be 100% fine. C has no value here. The slow performance of JS is not harmful here. Discord is fast enough although it's Electron. VS Code is also fast enough.
But I'd also like to respond to the food analogy, since it's funny.
Let's say that going full untyped scripting language would be the fast food. You get things fast, it does the job, but is unhealthy. You can write only so much bash before throwing up.
Developing in C is like cooking for those equally dumb, expensive, unsustainable restaurants that give you "an experience" instead of a full healthy meal. Sure, the result uses the best ingredients and is incredibly tasty, but there's way too little food for too much cost. It's bad for the economy (the money should've been spent elsewhere), bad for the customer (same thing about money, plus he's going to be hungry!) and bad for the cook (in a different job he'd contribute to society in better ways!) :D
Just go for something in the middle. Eat some C# or something.
Essentially you’re telling me that the software being made is not useful to many people, because a handful of developers will spend more time writing the software than their userbase will spend running it.
Otherwise you’re inflicting something on humanity.
Dumping toxic waste in a river is much cheaper than properly disposing of it too; yet we understand that we are causing harm to the environment and litigate people who do that.
Slow software is fine in low volumes (think: shitting in the woods), but dumping it on huge numbers of users by default is honestly ridiculous (Teams, I’m looking at you, with your expectation to run always and on everyone's machine!)
ROTFL. Is there any security audit? /s
it does the job - mostly.
Linux lucked out: when you're doing tricky wait-free concurrent algorithms, that intrusive linked list you hand-designed was a good choice. But over in userland you'll find another hand-rolled list in somebody's single-threaded file parser, and oh, a growable array would be fifty times faster; shame the C programmer doesn't have one in their toolbox.
https://en.wikipedia.org/wiki/Considered_harmful
C's goto is a housecat to the unrestricted jump's tiger. No doubt an angry housecat is a nuisance, but the tiger is much more dangerous.
C goto won't let you jump straight into the middle of unrelated code, for example, but the jump instruction has no such limit and neither did the feature Dijkstra was discussing.
In 2025 an allocator not cratering multi-threaded programs is the opposite of specialisation.
Too high an access frequency to a shared resource is not a "general case" but simply poorly designed multithreaded code. (Besides, a high allocation frequency through the system allocator is also poor design for single-threaded code; application code simply should not assume any specific performance behaviour from the system allocator.)
> application code simply should not assume any specific performance behaviour from the system allocator
Technically, yes. Practically, no; that's why e.g. the C++ standard mandates the time complexity of its containers. If you can't assume any specific performance from your system, you have to be prepared for every system-provided function to be arbitrarily slow, and obviously you can't do that.
Take, for instance, the JSON parser in GTA V [0]: apparently, sscanf(buffer, "%d", &n) calls strlen(buffer) internally, so using it to parse numbers in a hot loop on 2 MiB-long JSON craters your performance. On one hand, sure, one can argue that glibc/musl developers are within their right to implement sscanf however inefficiently they want, and the application developers should not expect any performance targets from it, and therefore, probably should not use it. On the other hand, what is even the point of the standard library if you're not supposed to use it for anything practical? Or, for that matter, why waste your time writing an implementation that no-one should use for anything practical anyhow, due to its abysmal performance?
[0] https://news.ycombinator.com/item?id=26296339
My question is: why is Rust performance contingent on a C malloc?
Because Rust switched to “system” allocators way back: for compatibility with, well, the system, for introspection and perf tooling, to lower the size of basic programs, and to lower maintenance.
It used to use jemalloc, but that took a lot of space even in the most basic binary, and because jemalloc is not available everywhere, it still had to deal with system allocators anyway.
...it only matters if the threads allocate/free so frequently that they run into contention; the C stdlib allocator is a shared resource, and user code really shouldn't assume that the allocator fixes their poor design decisions for multithreaded code.
If other allocators are able to handle a situation perfectly well, even a general-purpose allocator like the one in glibc, that suggests that musl's is deficient.
A smaller code base also means a smaller attack surface and fewer potential bugs.
The question remains: why does the Rust ecosystem depend so much on a system component they ultimately have no control over?
The new one was drafted here: https://github.com/richfelker/mallocng-draft
Blames it all on app code like Wayland
> “the new ng allocator in MUSL doesn’t make a dime of a difference”
Optimizing for size & stdlib code simplicity is probably not the best fit for your application server! Container size has always struck me as such a Goodhart's Law issue (and worse, it was already a bad measure, since it captures only a very brief part of the software lifecycle). Goodhart's Law:
> When a measure becomes a target, it ceases to be a good measure
This particular musl/Alpine footgun can be worked around. It's not particularly hard to install and use another allocator on Alpine, or anywhere really. Ruby folks in particular seem to have a lot of lore around jemalloc, with various version preferences and MALLOC_CONF settings on top of that. But in general I continue to feel like Alpine base images bring in quite an X factor, even if you knowingly adjust the allocator: the prevalence of Alpine in container images feels unfortunate & eccentric.
Going distroless is always an option, though usually a little too radical for my tastes. I think of musl+busybox+apk as the distinguishing aspects of Alpine, so on that basis I'm excited to see the recent huge strides by uutils, the Rust rewrite of GNU coreutils focused on compatibility, while offering BusyBox-like all-in-one-binary convenience. It should make a nice compact coreutils for containers! The recent 0.2 release has competitive performance, which is awesome to see. https://www.phoronix.com/news/Rust-Coreutils-0.2
Once the container OS forks and runs your binary, I'm curious why it matters. Is it because people run interpreted code (like Python or Node) with runtimes that link musl libc? If you deploy JVM or Go apps this will probably not be a factor.
Go is a rare counterexample: it ignores the system allocator and bundles its own.
It's not so long ago that GNU libc had a very similar allocator too, and that's why you'd pop Hoard into your LD_PRELOAD or whatever.
Not every program is multi-threaded, and so not every program would experience thread contention.
The third exception is programs that should be multithreaded but aren't because they are written in languages where adding more threads is disproportionately hard (C, C++) or impossible (Python, Ruby, etc.).
The difficulty totally lies in the design... actually using parallelism where it matters. Tons of multi-threaded programs are just single-threaded with a lot of 'scheduler' spliced into that one thread -_-
Unless I'm writing Java, I avoid multithreading whenever possible. I hear it's also nice in Go.
Rust is very much best in class here.
In terms of effort or expense, making any C or C++ program multithreaded is at least an order of magnitude harder/more expensive, even when designed for it from the beginning, so lots of programs aren't multithreaded that could be.
If you care about efficiency of a multi-threaded app you should use jemalloc (sadly no longer maintained but still works well), mi-malloc or tcmalloc.
In contrast, mimalloc, a similarly minimalistic allocator, has per-thread heaps, with each thread owning the memory it allocates; cross-thread frees are handled in a deferred manner.
This works very well with Rust's ownership system, where objects rarely move between threads.
Internally, both allocators use size-class based allocation, into predefined chunks, with the key difference being that musl uses bitmaps and mimalloc uses free lists to keep track of memory.
Musl could be fixed if it switched from a single shared heap to per-thread heaps as well.
mimalloc is about 10 kloc, while (assuming I'm looking in the right place) the new musl allocator is 891 lines and the old one 518. I wouldn't call an order-of-magnitude difference in line count 'similar'.
But I think you can tweak musl to perform well, and musl is closer to the spec than glibc, so I would rather use it, even if it's slower in the default case for multithreaded programs.
You cannot: its allocator does thread safety via a big lock, and that’s that.
> musl is closer to the spec than glibc
Is it?
> even if it's slower in the default case for multithreaded programs.
That’s far from the only situation where it’s slower though.
Swapping the system allocator out for jemalloc will net you huge performance wins if you link against musl, but you’ll still have issues with multithreading performance due to the slower implementations of necessary helpers.
Performance in edge-cases by far isn't the only metric that matters for allocators.
This has been my bane at various open source projects, because at some point somebody will say that all currently supported Linux distributions should be supported by a project. This works as a rule of thumb, except for RHEL, which has some truly ancient GCC versions provided in the "extended support" OS versions.
* The oldest supported version in "production" is RHEL 8; in "extended support" it is RHEL 7.
* RHEL 8 (released 2019) provides gcc 8 (released May 2018). RHEL 7 (released 2014) provides gcc 4.8 (released March 2013).
* gcc 8 supports C++17 but not C++20. gcc 4.8 supports most of C++11 (some C++ stdlib implementations weren't added until later) but doesn't support C++14.
So the well-meaning cutoff of "support the compiler provided by supported major OS versions" becomes a royal pain, since it would mean avoiding useful functionality in C++17 until mid-2024 (when RHEL 7 went from "production" to "extended support") or mid-2028 (when RHEL 7 "extended support" will end). It's not as bad at the moment, since C++20 and C++23 were relatively minor changes, but C++26 is shaping up to be a pretty useful change, and that wouldn't be usable until around 2035 when RHEL 10 leaves "production".
I wouldn't mind it as much if RHEL named the support something sensible. By the end of a "production" window, the OS is still absolutely suitable as a deployment platform for existing software. Unlike other "production" OS versions, though, it is no longer reasonable as a target for new development at that point.
Ask for payment for extended support as well.
> The mallocng allocator was designed to favor very low memory overhead, low worst-case fragmentation cost, and strong hardening over performance. This is because it's much easier and safer to opt in to using a performance-oriented allocator for the few applications that are doing ridiculous things with malloc to make it a performance bottleneck than to opt out of trading safety for performance in every basic system utility that doesn't hammer malloc.
[1] https://www.openwall.com/lists/musl/2025/09/05/3
[1] https://github.com/VictoriaMetrics/VictoriaLogs/issues/517
EDIT: Ah, they were mentioned, of course.
On some platforms, Telescope (a gopher/gemini client) used to be a bit crashy until I switched it to jemalloc via LD_PRELOAD.
Also, performance when rendering pages with tons of links improved a lot.