Could the XZ Backdoor Have Been Detected with Better Git/deb Packaging Practices?
Key topics
The XZ backdoor incident raises questions about the security of open source software and the need for better Git and Debian packaging practices, with commenters debating the trustworthiness of open source software and the importance of transparency and accountability.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: N/A
- Peak period: 72 comments in 0-12h
- Avg / period: 15.4
- Based on 108 loaded comments
Key moments
- 01 Story posted: Oct 19, 2025 at 1:38 PM EDT (3 months ago)
- 02 First comment: Oct 19, 2025 at 1:38 PM EDT (0s after posting)
- 03 Peak activity: 72 comments in 0-12h (hottest window of the conversation)
- 04 Latest activity: Oct 27, 2025 at 6:50 AM EDT (2 months ago)
you can of course come up with ways it could have been caught, but the code doesn't stand out as abnormal in context. that's all that really matters, unless your build system is already rigid enough to prevent it, and has no exploitable flaws you don't know about.
finding a technical overview is annoyingly tricky, given all the non-technical blogspam after it, but e.g. https://securelist.com/xz-backdoor-story-part-1/112354/ looks pretty good from a skim.
There's no good reason to have opaque, non-generated data in the repository, and it should certainly be a red flag going forward.
should it never have been in any commit at all, which is basically what's necessary to prevent this case? almost definitely not. it's normal, and requiring all data to be generated just means extremely complicated generators for precise trigger conditions... where you can still hide malicious data, you just have to obfuscate it further. that does raise the difficulty, which is a good thing, but it does not make it impossible.
I completely agree that it's a good/best practice, but hard-requiring it everywhere has significant costs for all the (overwhelmingly more common) legitimate cases.
The xz exploit depended on the absence of that explanation, and on everyone accepting that the blob was necessary for unstated reasons.
Whereas it's entirely reasonable to have a test that says something like: "simulate an error where the header is corrupted with early nulls for the decoding logic" or something - i.e. an explanation, and then a generator which flips the targeted bits to their values.
Sure: you _could_ try inserting an exploit, but now changes to the code have to also surface plausible data changes inline with the thing they claim is being tested.
I wouldn't even regard that as a lot of work: why would a test like that exist, if not because someone has an explanation for the thing they want to test?
Instead of committing blobs, why not commit documented code which generates those blobs? For example, have a script compress a bunch of bytes of well-known data, then have it manually corrupt the bytes belonging to file size in the archive header.
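A minimal sketch of that kind of generator, assuming xz and standard coreutils are available (offsets and inputs here are illustrative, not taken from any real test suite):

    #!/bin/sh
    # Generate test fixtures at build time instead of committing opaque blobs.
    set -eu

    # 1. Compress well-known, reproducible input.
    seq 1 1000 | xz -9 > good.xz

    # 2. Derive the "corrupted header" variant by overwriting documented bytes:
    #    here, four bytes starting at offset 8 are zeroed out.
    cp good.xz corrupt-header.xz
    printf '\0\0\0\0' | dd of=corrupt-header.xz bs=1 seek=8 conv=notrunc 2>/dev/null

Anyone reviewing a change to a script like this sees exactly which bytes get corrupted and why, instead of staring at a new binary blob.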
1. Build environments may not be adequately sandboxed. Some distributions are better than others (Gentoo being an example of a better approach). The idea is that the package specification specifies the full list of files to be downloaded initially into a sandboxed build environment, and scripts in that build environment when executed are not able to then access any network interfaces, filesystem locations outside the build environment, etc. Even within a build of a particular software package, more advanced sandboxing may segregate test suite resources from code that is built so that a compromise of the test suite can't impact built executables, or compromised documentation resources can't be accessed during build or eventual execution of the software.
2. The open source community as a whole (though ultimately this lands on distribution package maintainers) is not alerted to, and does not apply caution toward, unverified high entropy in source repositories. The idea is similar in concept to nothing-up-my-sleeve numbers.[1] Typical examples of unverified high entropy where a supply chain attack can hide a payload: images, videos, archives, PDF documents, etc. in test suites or bundled with software as documentation and/or general resources (such as splash screens). It may also include IVs/example keys in code or code comments, s-boxes, or similar matrices or arrays of high-entropy data, where it may not be obvious to human reviewers that the effective entropy is actually low (such as a well-known AES s-box) rather than high and potentially indistinguishable from attacker shellcode. Ideally, when a package maintainer goes to commit a new package or a package update, they would be alerted to unexplained high-entropy information that ends up in the build environment sandbox and required to justify why it is OK (a rough sketch of such a check follows below).
[1] https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number
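A rough sketch of what such an alert could look like, as a repository scan in shell (the threshold and file selection are illustrative; a real check would also carry a whitelist of justified high-entropy files):

    #!/usr/bin/env bash
    # Flag committed files whose byte entropy looks like compressed/encrypted data.
    set -euo pipefail
    THRESHOLD=7.5   # bits per byte; random-looking data approaches 8.0

    entropy() {
      # Shannon entropy over byte frequencies, via od + awk.
      od -An -tu1 -v "$1" | awk '
        { for (i = 1; i <= NF; i++) { c[$i]++; n++ } }
        END {
          if (n == 0) { print "0.00"; exit }
          for (b in c) { p = c[b] / n; h -= p * log(p) / log(2) }
          printf "%.2f\n", h
        }'
    }

    git ls-files | while read -r f; do
      [ -f "$f" ] || continue
      e=$(entropy "$f")
      if awk -v e="$e" -v t="$THRESHOLD" 'BEGIN { exit !(e > t) }'; then
        echo "high entropy ($e bits/byte): $f -- needs a justification"
      fi
    done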
This bends my brain a little. I get that they were written before git, but not before the advent of version control.
When you diff the `bash-5.3` tag against the `bash-5.3-rc2` tag, the set of changes is reduced by a ton. It's the same story with previous release commits (at least for as far as I care to go back)... there's a "next version" branch that gets tagged with alpha, beta, and rc releases, and then there's a release commit that's made on master with the changes from the "next version" branch, plus some additional changes.
Why do they do things this way? I have no idea, but it clearly seems to work for them.
(Independently importing release tarballs into VCS also worked better in the era of a dozen competing VCSes, without reliable export-import pipelines.)
My observations and questions were about the GNU bash git repo and how (and why) the bash maintainers do release branching and tagging. They were not about how the Debian folks handle their packaging.
Here are the headlines for a couple of fix commits:
It looks like discussion of the patches happens on the mailing list, which is easy to access from the page that brought you to the repo browser.

If that quote's about keeping Debian packaging in source control, I don't really see much benefit for packages like coreutils and bash that generally Just Work(TM) because they're high-quality and well-tested. Sign what you package up so you can detect tampering, but I don't see you really needing anything else.
The packages uploaded in Debian are what matters and they are versioned.
The easiest way to verify that is by using a reproducible automated pipeline, as that moves the problem to "were the packaging files tampered with".
How do you verify the packaging files? By making them auditable by putting them in a git repository, and for example having the packager sign each commit. If a suspicious commit slips in, it'll be immediately obvious to anyone looking at the logs.
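For example (the branch name and policy below are illustrative; the point is only that every packaging commit carries a verifiable signature):

    # Manual audit of a packaging branch, with signatures shown inline:
    git log --show-signature debian/latest

    # Scriptable gate for CI, or for a sponsor reviewing an upload:
    git verify-commit HEAD || { echo "unsigned or badly signed commit"; exit 1; }

    # Optionally refuse to merge anything without a trusted signature:
    git config merge.verifySignatures true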
Conversely, this is also an attack surface. It can be easy to just hit "accept" on automated pipeline updates.
New source for bash? Seems legit ... and the source built ... "yeah, ok."
Distros do not need to update packages on each and every upstream commit.
>we can only trust open source software. There is no way to audit closed source software
The ability to audit software is not sufficient, nor necessary, for it to be trustworthy.
>systems of a closed source vendor was compromised, like Crowdstrike some weeks ago, we can’t audit anything
You can't audit open source vendors either.
Debian is the OS, and the OS vendor should decide on and modify the components it uses as a foundation to create the OS as it desires. That's what I am choosing Debian for and not some other OS.
> You can't audit open source vendors either.
What defines open source, is that you can request the sources for audit and modification, so I think this statement is just untrue.
>you can request the sources
Organizations that open source software can have closed source infrastructure that you can't request.
> Organizations that open source software can have closed source infrastructure that you can't request.
Which can't be a source for the program binaries, so you can still audit them, you just can't rely on e.g. their proprietary test suite.
You can audit a lot of Debian's infrastructure - their build systems are a lot more transparent than the overwhelming majority of software vendors (which is not to say there isn't still room for improvement). You can also skip their prebuilt packages and build everything on your own systems, which of course you then have the ability to audit.
IIRC, this dependency isn't in upstream OpenSSH.
However, OpenSSH is open source with a non-restrictive license and as such, distributors (including Linux distributions) can modify it and distribute modified copies. Additionally, OpenSSH has a project goal that "Since telnet and rlogin are insecure, all operating systems should ship with support for the SSH protocol included." which encourages OS projects to include their software, with whatever modifications are (or are deemed) necessary.
Debian frequently modifies software it packages, often for better overall integration; occasionally with negative security consequences. Adding something to OpenSSH to work better with systemd is in both categories, I guess.
It was a pure fluke that it got discovered _this early_.
They weren't the ones to find the cause first (that's the person who took a deeper look due to the slowness), but the red flags had been raised.
The error was related to the use of the frame pointer. Optimised code does not use RBP as the frame pointer, only using RSP for stack addresses. The XZ backdoor code assumed that the stack used this layout. The RedHat regression tests use debug builds that do use the frame pointer. The result was the backdoor code writing below the bottom of the stack.
I suspect also that Valgrind is unique in finding issues like this. Other tools do not check all memory accesses before main. Valgrind loads and runs the test binary from the very beginning and thus it detected errors in the ifunc code used by XZ that executed very early on during ld.so loading and symbol resolution.
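The build-flag difference described above looks roughly like this (illustrative commands, not xz's or Red Hat's actual build invocations):

    # Optimized build: RBP is freed up as a general-purpose register, so the
    # backdoor's assumption about the stack layout happened to hold.
    cc -O2 -fomit-frame-pointer -c example.c

    # Debug/regression-test build: frame pointers kept, different stack layout,
    # so the stray write below the stack surfaced as a Valgrind error.
    cc -O0 -g -fno-omit-frame-pointer -c example.c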
Possibly for any number of reasons. A sole maintainer with a bit too little capacity to keep up the development. A central role as a dependency for crucial packages in a couple of key distros.
What would be the connection between the backdoor (or indeed any supply chain security) and any design details of the xz file format? How would the backdoor have been avoided if the archive format were different?
Frankly, tarballs are an embarrassing relic, and it's not the turbonormies that insist they're still fit for purpose. They don't know any better, they'll do what people like you tell them to do.
But should we trust it? No!! That's why we're here!
I'm not satisfied with the author's double-standard-conclusion. Trust, but verify does not have some kind of hall pass for OSS "because open-source is clearly better."
Trust, but verify is independent of the license the coders choose.
[1]: https://opensource.org/osd
And certainly a condition of the "verify" step?
With closed-source software, you can (almost) _only_ trust.
The issue for xz was that the build system was not hermetic (and sufficiently audited).
Hermetic build environments that can’t fetch random assets are a pain to maintain in this era, but are pretty crucial in stopping an attack of this kind. The other way is reproducible binaries, which is also very difficult.
EDIT: Well either I responded to the wrong comment or this comment was entirely changed. I was replying to a comment that said. “The issue was that people used pre-built binaries” which is materially different to what the parent now says, though they rhyme.
(The ostensibly autotools-built files in the tarball did not correspond to the source repository, admittedly, but that’s another question, and I’m of two minds about that one. I know that’s not a popular take, but I believe Autotools has a point with its approach to source distributions.)
Furthermore, that’s not quite true[1]. The differences only concerned the exploit’s (very small) bootstrapper and were isolated to the generated configure script and one of the (non-XZ-specific) M4 scripts that participated in its generation, none of which are in the XZ Git repo to begin with—both are put there, and are supposed to be put there, by (one of the tools invoked by) autoreconf when building the release tarball. By contrast, the actual exploit binary that bootstrapper injected was inside the Git repo all along, disguised as a binary test input (as I’ve said above) and identical to the one in the tarball.
To catch this, the distro maintainers would have needed to spot the difference between the M4 file in the XZ release tarball and its supposed original in one of the Autotools repos. Even then, the attacker could instead have shipped an unmodified M4 script but a configure script built with the malicious one. Then the maintainers would have needed to run autoreconf and note that the resulting configure script differed from the one shipped in the tarball, which would have caused a ton of false positives, because avoiding those means using the exact same versions of the Autotools parts as the upstream maintainer. Unconditionally autoreconfing things would be better, but risks breakage, because the backwards-compatibility story in Autotools has historically not been good: the tools aren’t supposed to be used that way.
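A rough sketch of that "regenerate and compare" check (the version is a placeholder; as noted, it only avoids false positives if you use the exact Autotools versions the upstream maintainer used):

    # Unpack the release tarball and set the shipped configure script aside.
    tar xf xz-X.Y.Z.tar.xz && cd xz-X.Y.Z
    cp configure configure.shipped

    # Regenerate the build system from configure.ac and the bundled M4 files.
    autoreconf -fi

    # Any difference is either an Autotools version mismatch... or something worse.
    diff -u configure.shipped configure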
(Couldn’t you just check in the generated files and run autoreconf in a commit hook? You could. Glibc does that. I once tried to backport some patches—that included changes to configure.ac—to an old version of it. It sucked, because the actual generated configure file was the result of several merges and such and thus didn’t correspond to the output of autoreconf from any Autotools install in existence.)
It’s easy to dismiss this as autotools being horrible. I don’t believe it is; I believe Autotools have a point. By putting things in the release tarball that aren’t in the maintainer’s source code (meaning, nowadays, the project’s repo, but that wasn’t necessarily the case for a lot of their existence), they ensure that the source tarball can be built with the absolute bare minimum of tools: a POSIX shell with a minimal complement of utilities, the C compiler, and a POSIX make. The maintainer can introduce further dependencies, but that’s on them.
Compare this with for example CMake, which technically will generate a Makefile for you, but you can’t ship it to anybody unless they have the exact same CMake version as you, because that Makefile will turn around and invoke CMake some more. Similarly, you can’t build a Meson project without having the correct Python environment to run Meson and the build system’s Python code, just having make or ninja is not enough. And so on.
This is why I’m saying I’m of two minds about this (bootstrapper) part of the backdoor. We see the downsides of the Autotools approach in the XZ backdoor, but in the normal case I would much rather build a release of an Autotools-based project than a CMake- or Meson-based one. I can’t even say that the problem is the generated configure script being essentially an uninspectable binary, because the M4 file that generated it in XZ wasn’t, and the change was very subtle. The best I can imagine here is maintaining two branches of the source tree, a clean one and a release one, where each release commit is notionally a merge of the previous release commit and the current clean commit, and the release tarball is identical to the release commit’s tree (I think the uacme project does something like that?); but that still feels insufficient.
[1] https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
Even I’m guilty of focusing on the technical aspects, but the truth is that the social campaign was significantly more difficult to understand and unpick, and is so much more problematic.
We can have all the defences we want in the world, but all it takes is to oust a handful of individuals (or, in this case, just one), or to bribe or blackmail them, and then nobody is going to be reviewing, because everybody believes that it has been reviewed.
I mean, we all just accept whatever the project believes is normal right?
It’s not like we’re pushing our ideas of transparency on the projects… and even if we were, it’s not like we are reviewing them either; they will have their own reviewers, and the only people left are package maintainers, who are arguably more dangerous.
There is an existential nihilism that I’ve just been faced with when it comes to security.
Unless projects become easier to reproduce and we have multiple parties involved in auditing, I’m a bit concerned.
Not in this thread we don’t? The whole thing has been about the fact that it wasn’t easy for a distro maintainer to detect the suspicious code even if they looked. Whether anyone actually does look is a worthy question, but it’s not orthogonal to making the process of looking not suck.
Of course, if we trust the developer to put software on our machine with no intermediaries, the whole thing goes out the window. Don’t do that[1]. (Oh hi Flatpak, Snap. Please go away. Also hi NPM, Go, Cargo, PyPI; no, being a “modern programming language” is not an excuse.)
[1] https://drewdevault.com/2021/09/27/Let-distros-do-their-job....
Yes and no. "make -f Makefile.cvs" has been a supported workflow for decades. It's not what the "build from source" instructions will tell you to do, but those instructions are aimed primarily at end users building from source who may not have M4 etc. installed; developers are expected to use the Makefile.cvs workflow and I don't think it would be too unreasonable to expect dedicated distro packagers/build systems (as distinct from individual end users building for their own systems) to do the same.
However, for the sake of devil's advocacy, I do also want to point out that the first thing a lot of people used to do after downloading and extracting a source tarball was to run "./configure" without even looking at what it is they were executing - even people who (rightly) hate the "curl | bash" combo. You could be running anything.
Being able to verify what it is you're running is vitally important, but in the end it only makes a difference if people take the time to do so. (And running "./configure --help" doesn't count.)
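Concretely, the two paths contrasted above look something like this (Makefile.cvs being the convention mentioned earlier; details vary per project):

    # End-user path: run whatever configure script shipped in the tarball.
    ./configure && make

    # Packager/developer path: regenerate the build system from its real
    # sources first, then configure and build.
    make -f Makefile.cvs
    ./configure && make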
That's true unless I audit every single line, out of potentially millions, in the source of a program I intend to run. If I'm going to do that, then I could audit the ./configure script as well.
Do you "understand security"? There's a grain of truth to what you're saying, but not more than that. The crux of this problem is with running untrusted binaries (or unreviewed source code) vs. installing something from a trusted repository.
The majority of people either don't know or don't care to review the source code. They simply run the commands displayed on the website, and whether you ask them to "curl | bash" or "wget && apt install ./some.deb" won't make any difference to their security.
Even if you do a "proper trust chain" and digitally sign your packages, that key is served through the same channel as the installation instructions and thus requires trust on first use, just like "curl | bash".
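To make the "same channel" point concrete (URLs and paths are illustrative):

    # Option A: run whatever the vendor's web server hands you today.
    curl -fsSL https://vendor.example/install.sh | bash

    # Option B: the "proper" repo setup -- but the signing key arrives over the
    # same HTTPS channel, so first use still trusts whatever that server served.
    curl -fsSL https://vendor.example/key.gpg -o /usr/share/keyrings/vendor.gpg
    echo "deb [signed-by=/usr/share/keyrings/vendor.gpg] https://vendor.example/apt stable main" \
      > /etc/apt/sources.list.d/vendor.list
    apt update && apt install vendor-package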
Unfortunately publishing every piece of software through every operating system's default repository isn't very realistic. Someone, somewhere is going to have to install the binary manually and "curl | bash" is as good of a method for doing that as any.
This applies to "curl | bash", "download an exe and run it", and everything in between equally. If a malicious binary wants to cover up its tracks it can just delete itself and disappear just like "curl | bash" would.
Feel free to educate users about the importance of installing software from trusted repositories whenever possible but demonizing "curl | bash" like it's somehow uniquely terrible is just silly and misses the point completely.
--
Automatic downloading of dependencies can be done in a sane way, but not without significant effort. E.g.: building Debian packages can install other pre-packaged dependencies. In theory other packages are built the same way.
Where this becomes an issue specifically is where language-specific mechanisms reach-out and just install dependencies. To be fair, this has happened for a long time (I'm looking at you, CPAN) and does provide a lot of potential utility to any given developer.
What might be better than advocating for "not doing this at all" is "fixing the pattern." There are probably better ideas than this but I'll start here:
1) make repositories include different levels of software by default. core vs community at an absolute minimum. Maybe add some levels like, "vetted versions" or "gist" to convey information about quality or security.
2) make it easy to have locally vetted pools to choose from. eg: Artifactory makes it easy to locally cache upstream software repos which is a good start. Making a vetting process out of that would be ... useful but cumbersome.
At the end of the day, we are always running someone else's code.
Unless the dependencies are properly pinned and hashed.
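For example, pip can be told to refuse anything that isn't both pinned and hash-verified (the package name and digest below are placeholders; Cargo.lock, go.sum, and npm lockfiles embody the same idea):

    # requirements.txt -- every entry pins an exact version and its digest:
    #   somepackage==1.2.3 --hash=sha256:<digest of the exact artifact>

    # Installation fails if any dependency is unpinned or a hash doesn't match:
    pip install --require-hashes -r requirements.txt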
I think your phrasing is a bit overbroad. There's nothing fundamentally broken with the build system fetching resources; what's broken is not verifying what it's fetching. Audit the package beforehand and have your build system verify its integrity after downloading, and you're fine.
nobody verifies all packages that are automatically downloaded all the time, unless there is a problem. We got lucky, that time.
Setup a mirror of all the repositories you care about; then configure the network so your build system can reach the mirrors; but not the general Internet.
Of course, once you do this, you eventually create a cron job on mirrors to blindly update themselves...
This setup does at least prevent an old version of a dependency from silently changing, so projects that pin their dependencies can be confident in that. But even in those cases, you end up with a periodic "update all dependencies" ticket, that just blindly takes the new version.
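A minimal sketch of that layout on a Debian-ish build host (host names are illustrative; the firewall policy is what actually enforces "mirror only"):

    # Point apt exclusively at the internal mirror.
    echo 'deb https://mirror.internal.example/debian stable main' \
      > /etc/apt/sources.list

    # Default-deny egress; allow only loopback, established flows, and the mirror.
    # (The mirror's address is assumed to be resolvable at rule-insert time,
    # e.g. pinned in /etc/hosts or served by an allowed internal resolver.)
    iptables -A OUTPUT -o lo -j ACCEPT
    iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A OUTPUT -d mirror.internal.example -p tcp --dport 443 -j ACCEPT
    iptables -P OUTPUT DROP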
the exploit used the only solution for this problem: binary test payload. there's no other way to do it.
maybe including the source to those versions and all the build stuff to then create them programmatically... or maybe even a second repo that generates signed payloads etc... but it's all overkill and would still have slipped past human attention, as the attack proved to begin with.
Ideally a test env and a build env should be entirely isolated, in case the test code somehow modifies the source. Which, in this case, it did.
We have no clue who “Jia Tan” is, a name certain to be a pseudonym. Nobody has seen his face. He never provided ID to a HR department. He pays no taxes to a government that links these transactions to him. There is no way to hold his feet to the fire for misdeeds.
The open source ecosystem of tools and libraries is built by hundreds of thousands of contributors, most of whom are identified by nothing more than an email. Just a string of characters. For all we know, they’re hyper-intelligent aliens subtly corrupting our computer systems, preparing the planet for invasion! I mean… that’s facetious, but seriously… how would we know if it was or wasn’t the case!? We can’t!
We have a scenario where the only protection is peer review: but we’ve seen that fail over and over systematically. Glaring errors get published in science journals all of the time. Not just the XZ attack but also Heartbleed - an innocent error - occurred because of a lack of adequate peer review.
I could waffle on about the psychology of “ownership” and how it mixes badly with anonymity and outside input, but I don’t want this to turn into war and peace.
The point is that the fundamental issue from the “outside” looking in as a potential user is that things go wrong and then the perpetrators can’t be punished so there is virtually no disincentive to try again and again.
Jia Tan is almost certainly a state-sponsored attacker. A paid professional, whose job appears to be to infect open source with back doors. The XZ attack was very much a slow burn, a part-time effort. If he’s a full-time employee, how many more irons did he have in the fire? Dozens? Hundreds!?
What about his colleagues? Certainly he’s not the one and only such hacker! What about other countries doing the same with their own staff of hackers?
The popular thinking has been that “Microsoft bad, open source good”, but imagine Jia Tan trying to pull something like this off with the source of Windows Server! He’d have to get employed, work in a cubicle farm, and then if caught in the act, evade arrest!
That’s a scary difference.
You're making a distinction not between open source and proprietary software but rather between hobbyist and corporate software.
There are open source projects made by companies with no external contributions allowed (sqlite sorta, most of google and amazon's oss projects in practice etc)
There are proprietary software downloads with no name attached, like practically every keygen, game crack, many indie games posted for free download on forums or 4chan, etc etc.
> hobbyist and corporate software.
OpenSSL was maintained by like two guys in their spare time, and underpinned trillions of dollars worth of systems and secure transfers.
Would you categorise that as “hobbyist”?
The semantics matter, so I’m going to agree with you and clarify that my concern is with the risks associated with “effectively anonymous contributors allowed” software, where personal consequences for bad actors are near zero.
On the Venn diagram of software licenses and source accessibility, this “especially risky” category significantly overlaps FLOSS and has little overlap with most proprietary software products.
I personally had no bias or aversion to FLOSS software for either personal or professional use, but in all seriousness the XZ attack after the Heartbleed vulnerability made me reconsider my priors.
You pay for nginx plus? Oops, that uses openssl. F5 load balancers since you want to get even more proprietary and expensive? Some of those used OpenSSL too.
Microsoft IIS? Lemme tell you about the history of absolutely bafflingly bad vulnerabilities in that software, far worse than open source nginx ever had.
Effectively anonymous contributions are not what caused heartbleed, they're not what caused the vast majority of breaches and hacks into proprietary software companies nor the vast majority of vulnerabilities.
Bad code is what causes these bugs, and as far as I can tell, the easiest recipe to bad vulnerable code is to have a manager repeatedly tell an engineer "deliver this by friday or you're fired", which happens much less in free software projects.
I'm just trying to get a coherent idea of what you think the right thing to do here is.
How do I stay secure? What OS do I use that doesn't include a ton of open source components and reviews every line of code that goes into it? As far as I can tell, this has already excluded ChromeOS (based on open source packages, many imported without reading all the LoC), macOS (even worse, and an even greater history of vulnerabilities)... I guess windows is the best by this standard? But statistically it's also the most vulnerable, so it doesn't seem like this standard has gotten us to a logical conclusion, does it?
Versus… a random email offers to help, someone says “sure!”, and… that’s it. That’s the entire hurdle.
Google did discover a Chinese hacker working for them on the payroll. That kind of thing does occur, but it’s rare.
It’s massively harder and more risky.
There's no knowing how many backdoors were added by small network companies or contractors. But there's rarely accountability when it happens because the company would rather cover it up, or just not ask too many questions about that weird bug
The discovery of the hack is rare, sure. Once a decade kind of thing.
The implication is that Jia Tan is a professional, and XZ was one of many irons in the fire.
Don’t be like Trump!
Don’t confuse positive tests with cases!
Jia Tan surely had many other attacks going.
Surely he’s not the only one.
Famously, there are two kinds of large organisations: those that have been hacked, and those that don’t yet know they’ve been hacked.
The open source community was the latter.
Now they’re the former.
Some of you all are still playing catch up.
It's safe to assume pretty much all the firmware you're running is vulnerable. It doesn't matter though, because you cannot find out.
The attackers can. You can't. And that's why we still have botnets.
https://randomascii.wordpress.com/category/uiforetw-2/
Also, Windows is just suspicious in general. It's slow, and everything makes network requests. Finding malware in Windows is a needle in a haystack. For some perspective: it's all malware.
A random person or group nobody has ever seen or knows submitted a backdoor.
2. Some people may want to remain pseudonymous for legitimate reasons.
The developers (at least important ones) could register with Debian project, just like they would with a company: submit identity and government documents, proof of physical address, bank account, credit card information, IdP account, .. It would operate like an organization.
The lead developers could meet and know each other through regular meetings. Kind of web of trust with in person verification. There are already online meetings in some projects.
The XZ backdoor was possible because people stick generated code (autoconf's output), which is totally impractical to audit, into the source tarballs.
In nixpkgs, all you have to do is add `autoreconfHook` to the `nativeBuildInputs` and all that stuff gets regenerated at build time. Sadly this is not the default behavior yet.
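What that hook boils down to, expressed as the shell steps it effectively runs before configure (simplified; the real hook handles more tools and flags):

    # Throw away the tarball's pregenerated build system and rebuild it from
    # configure.ac / Makefile.am, so only human-authored sources are trusted.
    autoreconf --install --force --verbose
    ./configure --prefix="$out"   # $out is the Nix output path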