Docker Was Too Slow, So We Replaced It: Nix in Production [video]
Posted 3 months ago · Active 3 months ago
youtube.com · Tech · Story · High profile
Skeptical / mixed
Debate
80/100
Key topics: Nix, Docker, Containerization, DevOps
A company replaced Docker with Nix for production, sparking discussion on the trade-offs and alternatives to containerization technologies.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 2h after posting
Peak period: 41 comments (0-12h)
Avg / period: 9.3
Comment distribution: 56 data points
Based on 56 loaded comments
Key moments
- 01 Story posted: Sep 27, 2025 at 2:56 PM EDT (3 months ago)
- 02 First comment: Sep 27, 2025 at 5:18 PM EDT (2h after posting)
- 03 Peak activity: 41 comments in 0-12h (hottest window of the conversation)
- 04 Latest activity: Oct 4, 2025 at 7:01 AM EDT (3 months ago)
ID: 45398468 · Type: story · Last synced: 11/20/2025, 2:46:44 PM
Docker was made because Linux sucks at running computer programs. Which is a very silly thing to be bad at. But here we are.
What has happened in more recent years is that CMake sucks ass, so people have been abusing Docker and now Nix as a build system. Blech.
The speaker does get it right at the end. A Bazel/Buck2 type solution is correct. An actual build system. They're abusing Nix and adding more layers to provide better caching. Sure, I guess.
If you step back and look at the final output of what needs to be produced and deployed it's not all that complicated. Especially if you get rid of the super complicated package dependency graph resolution step and simply vendor the versions of the packages you want. Which everyone should do, and a company like Anthropic should definitely do.
> Depend on glibc, copypaste all your other shared lib dependencies and plop them in RPATH. Pretend `/lib` is locked at initial install. Remove `/usr/lib` from the path and include everything.
You are not describing relocatable builds at all. You are describing... well, it kinda sounds like how Nix handles RPATH.
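For concreteness, here is a minimal Python sketch of the quoted "copy your shared lib dependencies and plop them in RPATH" approach, assuming `ldd` and `patchelf` are available on the build machine. The glibc filter and paths are illustrative; this is not a complete relocatable-build tool.

```python
#!/usr/bin/env python3
"""Sketch: vendor a binary's shared libs next to it and point RPATH at them.
Assumes `ldd` and `patchelf` are installed; illustrative only."""
import shutil
import subprocess
import sys
from pathlib import Path

# glibc pieces we deliberately leave to the host, per the quoted approach
GLIBC_LIBS = {"libc.so", "libm.so", "libpthread.so", "libdl.so", "librt.so", "ld-linux"}

def is_glibc(name: str) -> bool:
    return any(name.startswith(p) for p in GLIBC_LIBS)

def bundle(binary: str) -> None:
    bindir = Path(binary).resolve().parent
    libdir = bindir / "lib"
    libdir.mkdir(exist_ok=True)

    # ldd lines look like: "libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)"
    out = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    for line in out.stdout.splitlines():
        if "=>" not in line:
            continue
        name, _, rest = line.strip().partition(" => ")
        path = rest.split()[0] if rest and rest[0] == "/" else None
        if path and not is_glibc(name):
            shutil.copy2(path, libdir / Path(path).name)

    # Point the binary's RPATH at the vendored copies, relative to the binary
    subprocess.run(["patchelf", "--set-rpath", "$ORIGIN/lib", binary], check=True)

if __name__ == "__main__":
    bundle(sys.argv[1])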
There are certainly things within Nix that I like. But on the whole I think it's approximately two orders of magnitude more complicated than is necessary to efficiently build and deploy software.
> But on the whole I think it's approximately two orders of magnitude more complicated than is necessary to efficiently build and deploy software.
This might be true, if you dramatically constrain the problem space to something like "only build and deploy static Go binaries". If you have that luxury, by all means, simplify your life!
But in the general case, it is an inherently complex problem space, and tools that attempt to rise to the challenge, like Bazel or Nix, will be a lot more complex than a Dockerfile.
I use Buck2 in my day job. For almost all projects it's an order of magnitude simpler than CMake. It's got a ton of cruft to support a decade's worth of targets that were badly made. But the overall shape is actually pretty darn good. I think Starlark (and NixLang) are huge mistakes. Thou shalt not mix data and logic. Thou shalt not ever ever ever force users to use a language that doesn't have a great debugger.
Build systems aren't actually that complicated. It's usually super duper trivial for me to write down the EXACT command I want to execute. It's "just" a matter of coaxing a wobbly Rube Goldberg machine that can't be debugged into invoking the goddamn command I know I want. Grumble grumble.
I'm not sure that's why Docker was made.
I'm pretty sure Linux is not-suck at running programs; it does run quite a lot of them. Might even be most of them? All those servers and phones and stuff.
What Docker has done is kinda right. Programs should include all of their dependencies! But it sucks that you have to fight really hard and go out of your way to work around Linux's original choices.
Now you may be thinking wait a second Linux was right! It's better to have a single copy of shared libraries so they can be securely updated just once instead of forcing every program to release a new update! Theoretically sure. But since everyone uses Docker you have to (slowly and painfully) rebuild the image and so that supposed benefit is moot. Womp womp.
Additional reading if you are so inclined: https://jangafx.com/insights/linux-binary-compatibility
One thing I see is that folks making software for Linux target a specific distribution rather than generic Linux. It's because of the shared-object (.so) "problem".
It's a benefit of the tight coupling of Windows or Mac (which I don't use).
Fully agree that security updates in the Docker world are like the problem you get with static builds.
Disclosure: I ship some stuff on Linux; for these problems we do static builds and Docker. The demand side also seems to favour Docker. I also prefer the Docker method, for the compatibility reasons.
I think I disagree on Windows being tightly coupled. Windows simply has a barren global library path. Programs merely included their extra DLLs adjacent to the exe. Very simple and reliable.
Linux has added complexity in that glibc sucks and is horribly architected. GCC/Clang are bad and depend on the local environment waaaay too much. The Linux ecosystem is very much not built to support sharing cross-compiled binaries. It's so painful. :(
Nowadays SSDs with decent random IO are quite cheap to the point where even low-end hosting has them, and getting spinning disks is a choice reserved for large media files. On the consumer side, we are below a hundred dollars for one TB, so the storage savings are not very relevant.
But if you go back to when Linux was designed, storage was painfully expensive, and potentially saving even a few hundred megabytes was pretty good.
But I do agree that shared libraries are generally a bad idea and something that will most likely create problems at some point. Self-contained software makes a lot more sense generally speaking. And I definitely think that Docker is a dumb "solution" for software distribution but the problem really starts with devs using way too many dependencies.
If you really, truly, hate that then just statically link stuff. That's always been allowed.
Statically linked Linux binaries will run for decades. The idea that Linux binaries rot while Windows has some magic that makes them live is just made up.
"But what about glibc???" Yeah, that's made up. Glibc doesn't break ABI the way people think it does. If you look, the glibc ABI is incredibly stable across decades. SOME highly specific GNU-only extensions have been broken. You're not using those; it's a made-up problem.
That's really not how frontier research works.
The packages they want are nightlies with lots of necessary fresh fixes that their team probably even contributed to, and waiting for Red Hat to include it in the next distro is completely out of the question.
There is a wealth of options between latest_master -> nightly -> ..... -> RedHat update.
And there's only a very small number of specific libraries that you'd even want to consider for nightly. The majority of the repo should absolutely be pinned and infrequently updated.
There have been so many vendor supply chain exploits on HN in 2025 that I'd consider it borderline malpractice to use packages less than a month old in almost all cases. Certainly by default.
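As a sketch of how such a cooling-off policy could be enforced for Python dependencies (the package pins are placeholders; it queries PyPI's public JSON API and flags anything whose files were uploaded less than 30 days ago):

```python
#!/usr/bin/env python3
"""Sketch: flag pinned packages uploaded to PyPI less than 30 days ago.
Uses PyPI's public JSON API; package names/versions are placeholders."""
from datetime import datetime, timedelta, timezone
import json
import urllib.request

MIN_AGE = timedelta(days=30)
PINS = {"requests": "2.32.3"}   # placeholder: read these from your lockfile

def upload_time(name: str, version: str) -> datetime:
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # Each release file carries an upload timestamp; take the earliest
    times = [datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
             for f in data["urls"]]
    return min(times)

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for name, version in PINS.items():
        age = now - upload_time(name, version)
        status = "OK" if age >= MIN_AGE else "TOO NEW"
        print(f"{name}=={version}: {age.days} days old -> {status}")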
Docker containers may not be portable anyway when the CUDA version used in the container has to match the kernel driver and GPU firmware, etc.
Edit: see also the horrors that exist when you mix nvidia software versions: https://developer.nvidia.com/blog/cuda-c-compiler-updates-im...
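A container entrypoint can at least fail fast on that kind of mismatch. A rough sketch follows; the `nvidia-smi` query is real, but the minimum-driver table is approximate and illustrative, so check NVIDIA's compatibility docs for the authoritative mapping.

```python
#!/usr/bin/env python3
"""Sketch of a startup sanity check: does the host driver support the CUDA
runtime shipped inside the container?  The minimum-driver table below is
approximate and illustrative only."""
import subprocess

# Illustrative: rough minimum Linux driver for a few CUDA versions
MIN_DRIVER = {"11.8": (450, 80), "12.0": (525, 60), "12.4": (550, 54)}

def host_driver_version() -> tuple[int, int]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True)
    major, minor, *_ = out.stdout.strip().splitlines()[0].split(".")
    return int(major), int(minor)

def check(container_cuda: str) -> None:
    drv = host_driver_version()
    need = MIN_DRIVER.get(container_cuda)
    if need and drv < need:
        raise SystemExit(
            f"driver {drv} too old for CUDA {container_cuda} (needs >= {need})")
    print(f"driver {drv} OK for CUDA {container_cuda}")

if __name__ == "__main__":
    check("12.4")  # placeholder: the CUDA version your container ships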
Go look at the llama 3 whitepaper and look at how frequently their training jobs died and needed to be restarted. Quoting:
> During a 54-day snapshot period of pre-training, we experienced a total of 466 job interruptions. Of these, 47 were planned interruptions due to automated maintenance operations such as firmware upgrades or operator-initiated operations like configuration or dataset updates. The remaining 419 were unexpected interruptions, which are classified in Table 5. Approximately 78% of the unexpected interruptions are attributed to confirmed hardware issues, such as GPU or host component failures, or suspected hardware-related issues like silent data corruption and unplanned individual host maintenance events.
[edit: to be clear, this is not meant to be a dig on the meta training team. They probably know what they’re doing. Rather, it’s meant to give an idea of how immature the nvidia ecosystem was when they trained llama 3 in early 2024. This is the kind of failure rates you can expect if you opt into using the same outdated software they were forced to use at the time.]
The firmware and driver quality is not what people think it is. There’s also a lot of low-level software like NCCL and the toolkit that exacerbates issues in specific drivers and firmware versions. Grep for “workaround” in the NCCL source code and you’ll see some of them. It’s impossible to validate and test all permutations. It’s also worth mentioning that the drivers interact with a lot of other kernel subsystems. I’d point to HMM, the heterogeneous memory manager, which is hugely important for nvidia-uvm, which was only introduced in v6.1 and sees a lot of activity.
Or go look at the number of patches being applied to mlx5. Not all of those patches get backported into stable trees. God help you if your network stack uses an out-of-tree driver.
[1]: https://patchwork.ozlabs.org/project/ubuntu-kernel/patch/202...
To the point: I'd argue ineptitude is both more damning and accurate than stupidity in this particular case.
I'm sorry to come across as arrogant, but it's really just frustration, because being surrounded by this kind of cargo-culting "special sauce" talk, even from so-called principal engineers, is what drove me to burnout and out of the industry into the northwoods. Furthermore, you're completely wrong. There is no special sauce, you just didn't look at the list of ingredients. There never has been any special sauce.
NVIDIA builds their NGC base containers from open source scripts available on their gitlab instance: https://gitlab.com/nvidia/container-images/cuda
The build scripts for the base container are incredibly straightforward: they add the apt/yum repos and then install packages from that repo.
The pytorch containers are constructed atop these base containers. The specific pytorch commit they use in their NGC pytorch containers are directly linked in their release notes for the container: https://docs.nvidia.com/deeplearning/frameworks/pytorch-rele...
That is:
25.08: https://github.com/pytorch/pytorch/commit/5228986c395dc79f90...
25.06: https://github.com/pytorch/pytorch/commit/5228986c395dc79f90...
25.05: https://github.com/pytorch/pytorch/commit/5228986c395dc79f90...
25.04: https://github.com/pytorch/pytorch/commit/79aa17489c3fc5ed6d...
25.03: https://github.com/pytorch/pytorch/commit/7c8ec84dab7dc10d4e...
25.02: https://github.com/pytorch/pytorch/commit/6c54963f75e9dfdae3...
25.01: https://github.com/pytorch/pytorch/commit/ecf3bae40a6f2f0f3b...
24.12: https://github.com/pytorch/pytorch/commit/df5bbc09d191fff3bd...
24.11: https://github.com/pytorch/pytorch/commit/df5bbc09d191fff3bd...
24.10: https://github.com/pytorch/pytorch/commit/e000cf0ad980e5d140...
24.09: https://github.com/pytorch/pytorch/commit/b465a5843b92f33fe3...
24.08: https://github.com/pytorch/pytorch/commit/872d972e41596a9ac9...
24.07: https://github.com/pytorch/pytorch/commit/3bcc3cddb580bf0f0f...
24.06: https://github.com/pytorch/pytorch/commit/f70bd71a4883c4d624...
24.05: https://github.com/pytorch/pytorch/commit/07cecf4168503a5b3d...
24.04: https://github.com/pytorch/pytorch/commit/6ddf5cf85e3c27c596...
24.03: https://github.com/pytorch/pytorch/commit/40ec155e58ee1a1921...
24.02: https://github.com/pytorch/pytorch/commit/ebedce24ab578036dd...
24.01: https://github.com/pytorch/pytorch/commit/81ea7a489a85d6f6de...
23.12: https://github.com/pytorch/pytorch/commit/81ea7a489a85d6f6de...
23.11: https://github.com/pytorch/pytorch/commit/6a974bec5d779ec10f...
23.10: https://github.com/pytorch/pytorch/commit/32f93b1c689954aa55...
23.09: https://github.com/pytorch/pytorch/commit/32f93b1c689954aa55...
23.08: https://github.com/pytorch/pytorch/commit/29c30b1db8129b5716...
Do I need to keep going? Every single one of these commits is on pytorch/pytorch@main. So when you say:
> For a while the only pytorch code that worked on newly released hopper GPUs we had was the Nvidia ngc container, not Pytorch nightly.
That's provably false. Unless you're suggesting that upstream pytorch continually rebased (e.g. force-pushed, breaking the worktree of every pytorch developer) atop unmerged code from nvidia, the commit-ishes would not match. Meaning all of these commits were merged into pytorch/pytorch@main, and were available in pytorch nightlies, prior to the release of those NGC pytorch containers. No secret sauce, no man behind the curtain, just pure cargo culting and superstition.
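Anyone can verify this with a local pytorch clone: `git merge-base --is-ancestor` exits 0 exactly when a commit is already on the given branch. A small sketch (the repo path and commit list are placeholders to be filled in from the links above):

```python
#!/usr/bin/env python3
"""Sketch: check that commits pinned in NGC release notes are ordinary
commits on pytorch/pytorch main.  Assumes a local clone with origin/main
fetched; REPO and NGC_COMMITS are placeholders."""
import subprocess

REPO = "/path/to/pytorch"                                 # placeholder
NGC_COMMITS = ["<commit hash from an NGC release note>"]  # placeholders

def on_main(commit: str) -> bool:
    # Exit code 0 iff `commit` is an ancestor of origin/main
    r = subprocess.run(["git", "-C", REPO, "merge-base", "--is-ancestor",
                        commit, "origin/main"])
    return r.returncode == 0

if __name__ == "__main__":
    for c in NGC_COMMITS:
        print(c, "on main" if on_main(c) else "NOT on main")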
Address the behavior, not the people.
Having said that, my weekend project was "upgrading" my RSS reader to run HA on Kubernetes.
Once you have invested in understanding the Clifford algebra, your whole classical electrodynamics turns from 30 equations into two.
Once you have invested in writing a Fortran compiler, you can write numerical computations much easier and shorter than in assembly.
Once you have invested in learning Nix, your whole elaborate build infra, plus chef / puppet / saltstack suddenly shrinks to a few declarations.
Etc.
Your analogy breaks down with Nix, since learning and using it is a hostile experience, unlike (I assume) your other examples.
I have been using NixOS for about 5 years now on several machines, and I still don't know what I'm doing. Troubleshooting errors and implementing features is like climbing a mountain in the dark.
The language syntax is alien. Most functionality is unintuitive. The errors are cryptic. The documentation ranges from poor to nonexistent. It tries to replace every package manager in existence. The community is unwelcoming.
Guix addresses some of these issues, and at least it uses a sane language, but the momentum is, unfortunately, largely with Nix.
Nix pioneered many important concepts that most operating systems and package managers should have. But the implementation and execution of those concepts leaves a lot to be desired.
So, sure, if you and your team have patience to deal with all of its shortcomings, I can see how it can be useful. But personally, I would never propose using it in a professional setting, and would rather use established and more "complex" tooling most engineers are familiar with.
The only anti-intellectualism is not accepting that every technology has tradeoffs.
Scales wonderfully; fine-grained permissions and configuration are exactly what you'd hope for coming from systemd services. I appreciate that it leverages various Linux-isms like btrfs snapshots for faster read-only or ephemeral containers.
People still, by and large, have this weird assumption that you can only do OS containers with nspawn; I'm never too sure where that idea came from.
Or he's using NixOS as the image OS and using nixos-containers (which use systemd-nspawn)
Their docker images were 11-35GB. Using the nix dockerTools approach would have resulted in 100-300MB layers. These also may not even cache well between tags, though that's my intuition rather than knowledge. Especially if that's true, it wouldn't have improved the overall pull-time issues they were having, which amounted to 70-210s of image pull time on many new builds.
In their case they added a sidecar container that was actually an init container, which runs before the primary container of the pod. It used root privileges to bind-mount Nix store paths into the running container, which made the software provided in /nix/store available through those bind mounts. This also meant neither the Kubernetes hosts nor the containers needed the Nix daemon: the nix-sidecar running within the pod orchestrated pulling derivations, bind-mounting them, and running garbage collection at low priority in the background so the host SSDs don't run out of storage, while still letting derivations referenced in the cluster persist. That improves sync time when the SSD already contains all the derivations a new pod needs at startup.
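A very rough sketch of what an init container in that style might do, assuming the new-style `nix` CLI, root privileges, and a volume shared with the app container. The cache URL, store paths, and mount target are placeholders, not the setup described in the talk:

```python
#!/usr/bin/env python3
"""Sketch: init-container logic that realises a closure from a binary cache
and bind-mounts the Nix store into a volume shared with the app container.
All names below are placeholders; garbage collection is assumed to run
separately at low priority."""
import subprocess

BINARY_CACHE = "https://cache.example.com"        # placeholder
STORE_PATHS = ["/nix/store/<hash>-my-service"]    # placeholder closure roots
SHARED_NIX = "/shared/nix"                        # volume mounted by both containers

def realise(paths: list[str]) -> None:
    # Pull the closure from the binary cache into the node's /nix/store
    subprocess.run(["nix", "copy", "--from", BINARY_CACHE, *paths], check=True)

def bind_store_into_shared_volume() -> None:
    # The app container mounts SHARED_NIX at /nix, so it sees the store paths
    subprocess.run(["mount", "--bind", "/nix", SHARED_NIX], check=True)

if __name__ == "__main__":
    realise(STORE_PATHS)
    bind_store_into_shared_volume()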
EESSI (https://eessi.io) has taken this model further by using CVMFS, Gentoo Prefix (https://prefix.gentoo.org), and EasyBuild to create full HPC environments for various architectures.
CVMFS also has a docker driver to allow only used parts of a container image to be fetched on demand, which is very good for cases in which only a small part of a fat image is used in a job. EESSI has some documentation about it here: https://www.eessi.io/docs/tutorial/containers/