Could the XZ Backdoor Have Been Detected with Better Git/deb Packaging Practices?
Key topics
The XZ backdoor incident raises questions about the security of open source software and the need for better Git and Debian packaging practices, with commenters debating the trustworthiness of open source software and the importance of transparency and accountability.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: N/A
- Peak period: 72 comments in 0-12h
- Avg / period: 15.4
- Based on 108 loaded comments
Key moments
- 01 Story posted: Oct 19, 2025 at 1:38 PM EDT (3 months ago)
- 02 First comment: Oct 19, 2025 at 1:38 PM EDT (0s after posting)
- 03 Peak activity: 72 comments in 0-12h (hottest window of the conversation)
- 04 Latest activity: Oct 27, 2025 at 6:50 AM EDT (2 months ago)
you can of course come up with ways it could have been caught, but the code doesn't stand out as abnormal in context. that's all that really matters, unless your build system is already rigid enough to prevent it, and has no exploitable flaws you don't know about.
finding a technical overview is annoyingly tricky, given all the non-technical blogspam after it, but e.g. https://securelist.com/xz-backdoor-story-part-1/112354/ looks pretty good from a skim.
There's no good reason to have opaque, non-generated data in the repository, and it should certainly be a red flag going forward.
should it never have been in any commit at all, which is basically what's necessary to prevent this case? almost definitely not. it's normal, and requiring all data to be generated just means extremely complicated generators for precise trigger conditions... where you can still hide malicious data, you just have to obfuscate it further. that does raise the difficulty, which is a good thing, but it does not make it impossible.
I completely agree that it's a good/best practice, but hard-requiring it everywhere has significant costs for all the (overwhelmingly more common) legitimate cases.
The xz exploit depended on the absence of that explanation, and on everyone accepting that the blob was necessary for unstated reasons.
Whereas it's entirely reasonable to have a test that says something like: "simulate an error where the header is corrupted with early nulls for the decoding logic" or something - i.e. an explanation, and then a generator which flips the targeted bits to their values.
Sure: you _could_ try inserting an exploit, but now changes to the code have to also surface plausible data changes inline with the thing they claim is being tested.
I wouldn't even regard that as a lot of work: why would a test like that exist, if not because someone has an explanation for the thing they want to test?
Instead of committing blobs, why not commit documented code which generates those blobs? For example, have a script compress a bunch of bytes of well-known data, then have it manually corrupt the bytes belonging to file size in the archive header.
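A minimal sketch of that kind of generator, assuming xz and standard coreutils are available (offsets and inputs here are illustrative, not taken from any real test suite):

    #!/bin/sh
    # Generate test fixtures at build time instead of committing opaque blobs.
    set -eu

    # 1. Compress well-known, reproducible input.
    seq 1 1000 | xz -9 > good.xz

    # 2. Derive the "corrupted header" variant by overwriting documented bytes:
    #    here, four bytes starting at offset 8 are zeroed out.
    cp good.xz corrupt-header.xz
    printf '\0\0\0\0' | dd of=corrupt-header.xz bs=1 seek=8 conv=notrunc 2>/dev/null

Anyone reviewing a change to a script like this sees exactly which bytes get corrupted and why, instead of staring at a new binary blob.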
1. Build environments may not be adequately sandboxed. Some distributions are better than others (Gentoo being an example of a better approach). The idea is that the package specification specifies the full list of files to be downloaded initially into a sandboxed build environment, and scripts in that build environment when executed are not able to then access any network interfaces, filesystem locations outside the build environment, etc. Even within a build of a particular software package, more advanced sandboxing may segregate test suite resources from code that is built so that a compromise of the test suite can't impact built executables, or compromised documentation resources can't be accessed during build or eventual execution of the software.
2. The open source community as a whole (though ultimately this lands on distribution package maintainers) is not alerted to, and does not apply caution toward, unverified high entropy in source repositories. The idea is similar in concept to nothing-up-my-sleeve numbers.[1] Typical examples of unverified high entropy where a supply chain attack can hide a payload: images, videos, archives, PDF documents, etc. in test suites or bundled with software as documentation and/or general resources (such as splash screens). It may also include IVs/example keys in code or code comments, s-boxes, or similar matrices or arrays of high-entropy data, where it may not be obvious to human reviewers that the effective entropy is actually low (such as a well-known AES s-box) rather than high and potentially indistinguishable from attacker shellcode. Ideally, when a package maintainer goes to commit a new package or a package update, they would be alerted to unexplained high-entropy information that ends up in the build environment sandbox and required to justify why it is OK (a rough sketch of such a check follows below).
[1] https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number
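A rough sketch of what such an alert could look like, as a repository scan in shell (the threshold and file selection are illustrative; a real check would also carry a whitelist of justified high-entropy files):

    #!/usr/bin/env bash
    # Flag committed files whose byte entropy looks like compressed/encrypted data.
    set -euo pipefail
    THRESHOLD=7.5   # bits per byte; random-looking data approaches 8.0

    entropy() {
      # Shannon entropy over byte frequencies, via od + awk.
      od -An -tu1 -v "$1" | awk '
        { for (i = 1; i <= NF; i++) { c[$i]++; n++ } }
        END {
          if (n == 0) { print "0.00"; exit }
          for (b in c) { p = c[b] / n; h -= p * log(p) / log(2) }
          printf "%.2f\n", h
        }'
    }

    git ls-files | while read -r f; do
      [ -f "$f" ] || continue
      e=$(entropy "$f")
      if awk -v e="$e" -v t="$THRESHOLD" 'BEGIN { exit !(e > t) }'; then
        echo "high entropy ($e bits/byte): $f -- needs a justification"
      fi
    done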
This bends my brain a little. I get that they were written before git, but not before the advent of version control.
When you diff the `bash-5.3` tag against the `bash-5.3-rc2` tag, the set of changes is reduced by a ton. It's the same story with previous release commits (at least for as far as I care to go back)... there's a "next version" branch that gets tagged with alpha, beta, and rc releases, and then there's a release commit that's made on master with the changes from the "next version" branch, plus some additional changes.
Why do they do things this way? I have no idea, but it clearly seems to work for them.
(Independently importing release tarballs into VCS also worked better in the era of a dozen competing VCSes, without reliable export-import pipelines.)
My observations and questions were about the GNU bash git repo and how (and why) the bash maintainers do release branching and tagging. They were not about how the Debian folks handle their packaging.
Here are the headlines for a couple of fix commits:
It looks like discussion of the patches happens on the mailing list, which is easy to access from the page that brought you to the repo browser.

If that quote's about keeping Debian packaging in source control, I don't really see much benefit for packages like coreutils and bash that generally Just Work(TM) because they're high-quality and well-tested. Sign what you package up so you can detect tampering, but I don't see you really needing anything else.
The packages uploaded in Debian are what matters and they are versioned.
The easiest way to verify that is by using a reproducible automated pipeline, as that moves the problem to "were the packaging files tampered with".
How do you verify the packaging files? By making them auditable by putting them in a git repository, and for example having the packager sign each commit. If a suspicious commit slips in, it'll be immediately obvious to anyone looking at the logs.
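For example (the branch name and policy below are illustrative; the point is only that every packaging commit carries a verifiable signature):

    # Manual audit of a packaging branch, with signatures shown inline:
    git log --show-signature debian/latest

    # Scriptable gate for CI, or for a sponsor reviewing an upload:
    git verify-commit HEAD || { echo "unsigned or badly signed commit"; exit 1; }

    # Optionally refuse to merge anything without a trusted signature:
    git config merge.verifySignatures true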
Conversely, this is also an attack surface. It can be easy to just hit "accept" on automated pipeline updates.
New source for bash? Seems legit ... and the source built ... "yeah, ok."
Distros do not need to update packages on each and every upstream commit.
>we can only trust open source software. There is no way to audit closed source software
The ability to audit software is not sufficient, nor necessary, for it to be trustworthy.
>systems of a closed source vendor was compromised, like Crowdstrike some weeks ago, we can’t audit anything
You can't audit open source vendors either.
Debian is the OS, and the OS vendor should decide on and modify the components it uses as a foundation to create the OS as it desires. That's what I am choosing Debian for and not some other OS.
> You can't audit open source vendors either.
What defines open source, is that you can request the sources for audit and modification, so I think this statement is just untrue.
>you can request the sources
Organizations that open source software can have closed source infrastructure that you can't request.
> Organizations that open source software can have closed source infrastructure that you can't request.
Which can't be a source for the program binaries, so you can still audit them, you just can't rely on e.g. their proprietary test suite.
You can audit a lot of Debian's infrastructure - their build systems are a lot more transparent than the overwhelming majority of software vendors (which is not to say there isn't still room for improvement). You can also skip their prebuilt packages and build everything on your own systems, which of course you then have the ability to audit.
IIRC, this dependency isn't in upstream OpenSSH.
However, OpenSSH is open source with a non-restrictive license and as such, distributors (including Linux distributions) can modify it and distribute modified copies. Additionally, OpenSSH has a project goal that "Since telnet and rlogin are insecure, all operating systems should ship with support for the SSH protocol included." which encourages OS projects to include their software, with whatever modifications are (or are deemed) necessary.
Debian frequently modifies software it packages, often for better overall integration; occasionally with negative security consequences. Adding something to OpenSSH to work better with systemd is in both categories, I guess.
It was a pure fluke that it got discovered _this early_.
They weren't the ones to find the cause first (that's the person who took a deeper look due to the slowness), but the red flags had been raised.
The error was related to the use of the frame pointer. Optimised code does not use RBP as the frame pointer, only using RSP for stack addresses. The XZ backdoor code assumed that the stack used this layout. The RedHat regression tests use debug builds that do use the frame pointer. The result was the backdoor code writing below the bottom of the stack.
I suspect also that Valgrind is unique in finding issues like this. Other tools do not check all memory accesses before main. Valgrind loads and runs the test binary from the very beginning and thus it detected errors in the ifunc code used by XZ that executed very early on during ld.so loading and symbol resolution.
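The build-flag difference described above looks roughly like this (illustrative commands, not xz's or Red Hat's actual build invocations):

    # Optimized build: RBP is freed up as a general-purpose register, so the
    # backdoor's assumption about the stack layout happened to hold.
    cc -O2 -fomit-frame-pointer -c example.c

    # Debug/regression-test build: frame pointers kept, different stack layout,
    # so the stray write below the stack surfaced as a Valgrind error.
    cc -O0 -g -fno-omit-frame-pointer -c example.c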
Possibly for any number of reasons. A sole maintainer with a bit too little capacity to keep up the development. A central role as a dependency for crucial packages in a couple of key distros.
What would be the connection between the backdoor (or indeed any supply chain security) and any design details of the xz file format? How would the backdoor have been avoided if the archive format were different?
Frankly, tarballs are an embarrassing relic, and it's not the turbonormies that insist they're still fit for purpose. They don't know any better, they'll do what people like you tell them to do.
But should we trust it? No!! That's why we're here!
I'm not satisfied with the author's double-standard-conclusion. Trust, but verify does not have some kind of hall pass for OSS "because open-source is clearly better."
Trust, but verify is independent of the license the coders choose.
[1]: https://opensource.org/osd
And certainly a condition of the "verify" step?
With closed-source software, you can (almost) _only_ trust.
The issue for xz was that the build system was not hermetic (and sufficiently audited).
Hermetic build environments that can’t fetch random assets are a pain to maintain in this era, but are pretty crucial in stopping an attack of this kind. The other way is reproducible binaries, which is also very difficult.
EDIT: Well either I responded to the wrong comment or this comment was entirely changed. I was replying to a comment that said. “The issue was that people used pre-built binaries” which is materially different to what the parent now says, though they rhyme.
(The ostensibly autotools-built files in the tarball did not correspond to the source repository, admittedly, but that’s another question, and I’m of two minds about that one. I know that’s not a popular take, but I believe Autotools has a point with its approach to source distributions.)
Furthermore, that’s not quite true[1]. The differences only concerned the exploit’s (very small) bootstrapper and were isolated to the generated configure script and one of the (non-XZ-specific) M4 scripts that participated in its generation, none of which are in the XZ Git repo to begin with—both are put there, and are supposed to be put there, by (one of the tools invoked by) autoreconf when building the release tarball. By contrast, the actual exploit binary that bootstrapper injected was inside the Git repo all along, disguised as a binary test input (as I’ve said above) and identical to the one in the tarball.
To catch this, the distro maintainers would have needed to spot the difference between the M4 file in the XZ release tarball and its supposed original in one of the Autotools repos. Even then, the attacker could instead have shipped an unmodified M4 script but a configure script built with the malicious one. Then the maintainers would have needed to run autoreconf and note that the resulting configure script differed from the one shipped in the tarball, which would have caused a ton of false positives, because avoiding those means using the exact same versions of the Autotools parts as the upstream maintainer. Unconditionally autoreconfing things would be better, but risks breakage, because the backwards-compatibility story in Autotools has historically not been good: the tools aren’t supposed to be used that way.
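A rough sketch of that "regenerate and compare" check (the version is a placeholder; as noted, it only avoids false positives if you use the exact Autotools versions the upstream maintainer used):

    # Unpack the release tarball and set the shipped configure script aside.
    tar xf xz-X.Y.Z.tar.xz && cd xz-X.Y.Z
    cp configure configure.shipped

    # Regenerate the build system from configure.ac and the bundled M4 files.
    autoreconf -fi

    # Any difference is either an Autotools version mismatch... or something worse.
    diff -u configure.shipped configure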
(Couldn’t you just check in the generated files and run autoreconf in a commit hook? You could. Glibc does that. I once tried to backport some patches—that included changes to configure.ac—to an old version of it. It sucked, because the actual generated configure file was the result of several merges and such and thus didn’t correspond to the output of autoreconf from any Autotools install in existence.)
It’s easy to dismiss this as autotools being horrible. I don’t believe it is; I believe Autotools have a point. By putting things in the release tarball that aren’t in the maintainer’s source code (meaning, nowadays, the project’s repo, but that wasn’t necessarily the case for a lot of their existence), they ensure that the source tarball can be built with the absolute bare minimum of tools: a POSIX shell with a minimal complement of utilities, the C compiler, and a POSIX make. The maintainer can introduce further dependencies, but that’s on them.
Compare this with for example CMake, which technically will generate a Makefile for you, but you can’t ship it to anybody unless they have the exact same CMake version as you, because that Makefile will turn around and invoke CMake some more. Similarly, you can’t build a Meson project without having the correct Python environment to run Meson and the build system’s Python code, just having make or ninja is not enough. And so on.
This is why I’m saying I’m of two minds about this (bootstrapper) part of the backdoor. We see the downsides of the Autotools approach in the XZ backdoor, but in the normal case I would much rather build a release of an Autotools-based project than a CMake- or Meson-based one. I can’t even say that the problem is the generated configure script being essentially an uninspectable binary, because the M4 file that generated it in XZ wasn’t, and the change was very subtle. The best I can imagine here is maintaining two branches of the source tree, a clean one and a release one, where each release commit is notionally a merge of the previous release commit and the current clean commit, and the release tarball is identical to the release commit’s tree (I think the uacme project does something like that?); but that still feels insufficient.
[1] https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
Even I’m guilty of focusing on the technical aspects, but the truth is that the social campaign was significantly more difficult to understand and unpick, and is so much more problematic.
We can have all the defences we want in the world, but all it takes is to oust a handful of individuals (or, in this case, just one), or to bribe or blackmail them, and then nobody is going to be reviewing, because everybody believes that it has been reviewed.
I mean, we all just accept whatever the project believes is normal right?
It’s not like we’re pushing our ideas of transparency on the projects… and even if we were, it’s not like we are reviewing them either; they will have their own reviewers, and the only people left are package maintainers, who are arguably more dangerous.
There is an existential nihilism that I’ve just been faced with when it comes to security.
Unless projects become easier to reproduce and we have multiple parties involved in auditing, I’m a bit concerned.
Not in this thread we don’t? The whole thing has been about the fact that it wasn’t easy for a distro maintainer to detect the suspicious code even if they looked. Whether anyone actually does look is a worthy question, but it’s not orthogonal to making the process of looking not suck.
Of course, if we trust the developer to put software on our machine with no intermediaries, the whole thing goes out the window. Don’t do that[1]. (Oh hi Flatpak, Snap. Please go away. Also hi NPM, Go, Cargo, PyPI; no, being a “modern programming language” is not an excuse.)
[1] https://drewdevault.com/2021/09/27/Let-distros-do-their-job....
Yes and no. "make -f Makefile.cvs" has been a supported workflow for decades. It's not what the "build from source" instructions will tell you to do, but those instructions are aimed primarily at end users building from source who may not have M4 etc. installed; developers are expected to use the Makefile.cvs workflow and I don't think it would be too unreasonable to expect dedicated distro packagers/build systems (as distinct from individual end users building for their own systems) to do the same.
However, for the sake of devil's advocacy, I do also want to point out that the first thing a lot of people used to do after downloading and extracting a source tarball was to run "./configure" without even looking at what it is they were executing - even people who (rightly) hate the "curl | bash" combo. You could be running anything.
Being able to verify what it is you're running is vitally important, but in the end it only makes a difference if people take the time to do so. (And running "./configure --help" doesn't count.)
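Concretely, the two paths contrasted above look something like this (Makefile.cvs being the convention mentioned earlier; details vary per project):

    # End-user path: run whatever configure script shipped in the tarball.
    ./configure && make

    # Packager/developer path: regenerate the build system from its real
    # sources first, then configure and build.
    make -f Makefile.cvs
    ./configure && make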
That's true unless I audit every single line, out of potentially millions, in the source of a program I intend to run. If I'm going to do that, then I could audit the ./configure script as well.
Do you "understand security"? There's a grain of truth to what you're saying, but not more than that. The crux of this problem is with running untrusted binaries (or unreviewed source code) vs. installing something from a trusted repository.
The majority of people either don't know or don't care to review the source code. They simply run the commands displayed on the website, and whether you ask them to "curl | bash" or "wget && apt install ./some.deb" won't make any difference to their security.
Even if you do a "proper trust chain" and digitally sign your packages, that key is served through the same channel as the installation instructions and thus requires trust on first use, just like "curl | bash".
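To make the "same channel" point concrete (URLs and paths are illustrative):

    # Option A: run whatever the vendor's web server hands you today.
    curl -fsSL https://vendor.example/install.sh | bash

    # Option B: the "proper" repo setup -- but the signing key arrives over the
    # same HTTPS channel, so first use still trusts whatever that server served.
    curl -fsSL https://vendor.example/key.gpg -o /usr/share/keyrings/vendor.gpg
    echo "deb [signed-by=/usr/share/keyrings/vendor.gpg] https://vendor.example/apt stable main" \
      > /etc/apt/sources.list.d/vendor.list
    apt update && apt install vendor-package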
Unfortunately publishing every piece of software through every operating system's default repository isn't very realistic. Someone, somewhere is going to have to install the binary manually and "curl | bash" is as good of a method for doing that as any.
This applies to "curl | bash", "download an exe and run it", and everything in between equally. If a malicious binary wants to cover up its tracks it can just delete itself and disappear just like "curl | bash" would.
Feel free to educate users about the importance of installing software from trusted repositories whenever possible but demonizing "curl | bash" like it's somehow uniquely terrible is just silly and misses the point completely.
--
Automatic downloading of dependencies can be done in a sane way, but not without significant effort. E.g.: building Debian packages can install other pre-packaged dependencies. In theory other packages are built the same way.
Where this becomes an issue specifically is where language-specific mechanisms reach-out and just install dependencies. To be fair, this has happened for a long time (I'm looking at you, CPAN) and does provide a lot of potential utility to any given developer.
What might be better than advocating for "not doing this at all" is "fixing the pattern." There are probably better ideas than this but I'll start here:
1) make repositories include different levels of software by default. core vs community at an absolute minimum. Maybe add some levels like, "vetted versions" or "gist" to convey information about quality or security.
2) make it easy to have locally vetted pools to choose from. eg: Artifactory makes it easy to locally cache upstream software repos which is a good start. Making a vetting process out of that would be ... useful but cumbersome.
At the end of the day, we are always running someone else's code.
Unless the dependencies are properly pinned and hashed.
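For example, pip can be told to refuse anything that isn't both pinned and hash-verified (the package name and digest below are placeholders; Cargo.lock, go.sum, and npm lockfiles embody the same idea):

    # requirements.txt -- every entry pins an exact version and its digest:
    #   somepackage==1.2.3 --hash=sha256:<digest of the exact artifact>

    # Installation fails if any dependency is unpinned or a hash doesn't match:
    pip install --require-hashes -r requirements.txt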
I think your phrasing is a bit overbroad. There's nothing fundamentally broken with the build system fetching resources; what's broken is not verifying what it's fetching. Audit the package beforehand and have your build system verify its integrity after downloading, and you're fine.
nobody verifies all packages that are automatically downloaded all the time, unless there is a problem. We got lucky, that time.
Setup a mirror of all the repositories you care about; then configure the network so your build system can reach the mirrors; but not the general Internet.
Of course, once you do this, you eventually create a cron job on mirrors to blindly update themselves...
This setup does at least prevent an old version of a dependency from silently changing, so projects that pin their dependencies can be confident in that. But even in those cases, you end up with a periodic "update all dependencies" ticket, that just blindly takes the new version.
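A minimal sketch of that layout on a Debian-ish build host (host names are illustrative; the firewall policy is what actually enforces "mirror only"):

    # Point apt exclusively at the internal mirror.
    echo 'deb https://mirror.internal.example/debian stable main' \
      > /etc/apt/sources.list

    # Default-deny egress; allow only loopback, established flows, and the mirror.
    # (The mirror's address is assumed to be resolvable at rule-insert time,
    # e.g. pinned in /etc/hosts or served by an allowed internal resolver.)
    iptables -A OUTPUT -o lo -j ACCEPT
    iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A OUTPUT -d mirror.internal.example -p tcp --dport 443 -j ACCEPT
    iptables -P OUTPUT DROP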
the exploit used the only solution for this problem: binary test payload. there's no other way to do it.
maybe including the source to those versions and all the build stuff to then create them programmatically... or maybe even a second repo that generates signed payloads etc... but it's all overkill and would still have slipped past human attention, as the attack proved to begin with.
Ideally a test env and a build env should be entirely isolated, in case the test code somehow modifies the source. Which, in this case, it did.
We have no clue who “Jia Tan” is, a name certain to be a pseudonym. Nobody has seen his face. He never provided ID to a HR department. He pays no taxes to a government that links these transactions to him. There is no way to hold his feet to the fire for misdeeds.
The open source ecosystem of tools and libraries is built by hundreds of thousands of contributors, most of whom are identified by nothing more than an email. Just a string of characters. For all we know, they’re hyper-intelligent aliens subtly corrupting our computer systems, preparing the planet for invasion! I mean… that’s facetious, but seriously… how would we know if it was or wasn’t the case!? We can’t!
We have a scenario where the only protection is peer review: but we’ve seen that fail over and over systematically. Glaring errors get published in science journals all of the time. Not just the XZ attack but also Heartbleed - an innocent error - occurred because of a lack of adequate peer review.
I could waffle on about the psychology of “ownership” and how it mixes badly with anonymity and outside input, but I don’t want this to turn into war and peace.
The point is that the fundamental issue from the “outside” looking in as a potential user is that things go wrong and then the perpetrators can’t be punished so there is virtually no disincentive to try again and again.
Jia Tan is almost certainly a state-sponsored attacker. A paid professional, whose job appears to be to infect open source with back doors. The XZ attack was very much a slow burn, a part-time effort. If he’s a full-time employee, how many more irons did he have in the fire? Dozens? Hundreds!?
What about his colleagues? Certainly he’s not the one and only such hacker! What about other countries doing the same with their own staff of hackers?
The popular thinking has been that “Microsoft bad, open source good”, but imagine Jia Tan trying to pull something like this off with the source of Windows Server! He’d have to get employed, work in a cubicle farm, and then if caught in the act, evade arrest!
That’s a scary difference.
You're making a distinction not between open source and proprietary software but rather between hobbyist and corporate software.
There are open source projects made by companies with no external contributions allowed (sqlite sorta, most of google and amazon's oss projects in practice etc)
There are proprietary software downloads with no name attached, like practically every keygen, game crack, many indie games posted for free download on forums or 4chan, etc etc.
> hobbyist and corporate software.
OpenSSL was maintained by like two guys in their spare time, and underpinned trillions of dollars worth of systems and secure transfers.
Would you categorise that as “hobbyist”?
The semantics matter, so I’m going to agree with you and clarify that my concern is with the risks associated with “effectively anonymous contributors allowed” software, where personal consequences for bad actors are near zero.
On the Venn diagram of software licenses and source accessibility, this “especially risky” category significantly overlaps FLOSS and has little overlap with most proprietary software products.
I personally had no bias or aversion to FLOSS software for either personal or professional use, but in all seriousness the XZ attack after the Heartbleed vulnerability made me reconsider my priors.
You pay for nginx plus? Oops, that uses openssl. F5 load balancers since you want to get even more proprietary and expensive? Some of those used OpenSSL too.
Microsoft IIS? Lemme tell you about the history of absolutely bafflingly bad vulnerabilities in that software, far worse than open source nginx ever had.
Effectively anonymous contributions are not what caused heartbleed, they're not what caused the vast majority of breaches and hacks into proprietary software companies nor the vast majority of vulnerabilities.
Bad code is what causes these bugs, and as far as I can tell, the easiest recipe to bad vulnerable code is to have a manager repeatedly tell an engineer "deliver this by friday or you're fired", which happens much less in free software projects.
I'm just trying to get a coherent idea of what you think the right thing to do here is.
How do I stay secure? What OS do I use that doesn't include a ton of open source components and reviews every line of code that goes into it? As far as I can tell, this has already excluded ChromeOS (based on open source packages, many imported without reading all the LoC), macOS (even worse, and an even greater history of vulnerabilities)... I guess windows is the best by this standard? But statistically it's also the most vulnerable, so it doesn't seem like this standard has gotten us to a logical conclusion, does it?
Versus… a random email offers to help, someone says “sure!”, and… that’s it. That’s the entire hurdle.
Google did discover a Chinese hacker working for them on the payroll. That kind of thing does occur, but it’s rare.
It’s massively harder and more risky.
There's no knowing how many backdoors were added by small network companies or contractors. But there's rarely accountability when it happens because the company would rather cover it up, or just not ask too many questions about that weird bug
The discovery of the hack is rare, sure. Once a decade kind of thing.
The implication is that Jia Tan is a professional, and XZ was one of many irons in the fire.
Don’t be like Trump!
Don’t confuse positive tests with cases!
Jia Tan surely had many other attacks going.
Surely he’s not the only one.
Famously, there are two kinds of large organisations: those that have been hacked, and those that don’t yet know they’ve been hacked.
The open source community was the latter.
Now they’re the former.
Some of you all are still playing catch up.
It's safe to assume pretty much all the firmware you're running is vulnerable. It doesn't matter though, because you cannot find out.
The attackers can. You can't. And that's why we still have botnets.
https://randomascii.wordpress.com/category/uiforetw-2/
Also, Windows is just suspicious in general. It's slow, and everything makes network requests. Finding malware in Windows is a needle in a haystack. For some perspective: it's all malware.
A random person or group nobody has ever seen or knows submitted a backdoor.
2. Some people may want to remain pseudonymous for legitimate reasons.
The developers (at least important ones) could register with Debian project, just like they would with a company: submit identity and government documents, proof of physical address, bank account, credit card information, IdP account, .. It would operate like an organization.
The lead developers could meet and know each other through regular meetings. Kind of web of trust with in person verification. There are already online meetings in some projects.
The XZ backdoor was possible because people stick generated code (autoconf's output), which is totally impractical to audit, into the source tarballs.
In nixpkgs, all you have to do is add `autoreconfHook` to the `nativeBuildInputs` and all that stuff gets regenerated at build time. Sadly this is not the default behavior yet.
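What that hook boils down to, expressed as the shell steps it effectively runs before configure (simplified; the real hook handles more tools and flags):

    # Throw away the tarball's pregenerated build system and rebuild it from
    # configure.ac / Makefile.am, so only human-authored sources are trusted.
    autoreconf --install --force --verbose
    ./configure --prefix="$out"   # $out is the Nix output path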