
A backdoor in xz

Andres Freund has posted a detailed investigation into a backdoor that was shipped with versions 5.6.0 and 5.6.1 of the xz compression utility. It appears that the malicious code may be aimed at allowing SSH authentication to be bypassed.

I have not yet analyzed precisely what is being checked for in the injected code, to allow unauthorized access. Since this is running in a pre-authentication context, it seems likely to allow some form of access or other form of remote code execution.

The affected versions are not yet widely shipped, but checking systems for the bad version would be a good idea.

Update: there are advisories out now from Arch, Debian, Red Hat, and openSUSE.

A further update from openSUSE:

For our openSUSE Tumbleweed users where SSH is exposed to the internet we recommend installing fresh, as it’s unknown if the backdoor has been exploited. Due to the sophisticated nature of the backdoor an on-system detection of a breach is likely not possible. Also rotation of any credentials that could have been fetched from the system is highly recommended.


A backdoor in xz

Posted Mar 29, 2024 17:48 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (128 responses)

Based on this should xz be trusted for distribution of upstream tarballs going forward?

A backdoor in xz

Posted Mar 29, 2024 17:57 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Probably not, because the malicious actor has access to the signing key.

Also, that build script is just horrifying.

A backdoor in xz

Posted Mar 29, 2024 21:57 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (1 responses)

The most horrible parts are introduced by the xz maintainer and are obfuscated (for example they use a variable containing "." to invoke the shell's source builtin). The actual code has some ugly sed substitutions but it's not even comparable to this abomination. You can find it at https://fossies.org/linux/NetworkManager/m4/build-to-host.m4

A backdoor in xz

Posted Mar 29, 2024 22:00 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

I... beg to differ. It's not any less horrifying.

A backdoor in xz

Posted Apr 18, 2024 7:46 UTC (Thu) by sam_c (subscriber, #139836) [Link]

I'm responding to this late, my apologies, but to be clear, there was no single project signing key. Lasse never gave up his own key. There was just an announcement a while ago that future releases would be signed by Jia Tan's key.

A backdoor in xz

Posted Mar 29, 2024 18:02 UTC (Fri) by atnot (subscriber, #124910) [Link] (123 responses)

That feels kind of misplaced. It's not as if every other project is somehow immune to trusting a maintainer after 2+ years of steady contributions, or as if any maintainer is immune to some sort of coercion that might make them backdoor tarballs.

I do however think that this should mean an end to the practice, which distributions such as Debian have espoused, of preferring manually built upstream tarballs over pulling in git sources directly. It's the one weak link where few eyes exist in an otherwise pretty reproducible pipeline, and it was really only a matter of time until someone took advantage of it.

A backdoor in xz

Posted Mar 29, 2024 18:13 UTC (Fri) by bluca (subscriber, #118303) [Link] (122 responses)

Yep, we need to stop using curated tarballs and use only ones auto-generated from tags.

A backdoor in xz

Posted Mar 29, 2024 18:51 UTC (Fri) by danobi (subscriber, #102249) [Link] (118 responses)

Wouldn't the autogenerated tarball still contain the malicious checked in test binary?

A backdoor in xz

Posted Mar 29, 2024 19:12 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

It's much easier to analyze individual commits.

A backdoor in xz

Posted Mar 30, 2024 1:10 UTC (Sat) by jengelh (subscriber, #33263) [Link] (7 responses)

When people stop using curated tarballs, developers will just add generated files into the SCM. Some projects have historically done that all along anyway. I happened to find https://git.savannah.gnu.org/git/bash.git does that, for example.

Eschewing curated tarballs in favor of an autogenerated git-archive does not do anything to establish that a particular piece of software is fully benevolent. I bet none of you ever vetted m4/po.m4 of https://github.com/bminor/bash/archive/refs/tags/bash-5.2... .

A backdoor in xz

Posted Mar 30, 2024 1:29 UTC (Sat) by bluca (subscriber, #118303) [Link] (2 responses)

Of course by itself it doesn't prove that the software is not malicious, how could it? That's not the point, the point is increasing auditability. A commit in a repository is eminently auditable, while random stuff getting injected from a developer's machine in a tarball after the fact, before publishing, is not.

A backdoor in xz

Posted Apr 3, 2024 7:21 UTC (Wed) by LtWorf (subscriber, #124958) [Link] (1 responses)

Well, a commit that generates a configure script is very unlikely to get seriously reviewed.

A backdoor in xz

Posted Apr 5, 2024 13:20 UTC (Fri) by rav (subscriber, #89256) [Link]

My approach to reviewing commits with autogenerated code (in the context of approving a pull request) is to autogenerate the code myself and see if I get the same result. If there are differences between the submitted code and what I could autogenerate myself, then that's probably the interesting stuff to look at. If I don't know how to autogenerate it myself, I ask the author to provide the instructions in the commit message or in a source code comment. Having autogenerated code in a source code repository is not nice, but if it's necessary, then the code review process needs to adapt to it.
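For an autoconf-based project, a minimal sketch of that regenerate-and-diff check might look like this (assuming the contributor's branch is checked out and a matching autotools version is installed locally):

    # Set the submitted, generated file aside
    cp configure /tmp/configure.submitted
    # Regenerate it from the checked-in sources (configure.ac, m4/ macros, ...)
    autoreconf -fiv
    # Whatever survives this diff is the part that needs real human review
    diff -u /tmp/configure.submitted configure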

A backdoor in xz

Posted Mar 30, 2024 1:52 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> When people stop using curated tarballs, developers will just add generated files into the SCM.

Most of the autogenerated files are autohell-related scripts. Honestly, if you still depend on it, you can install the required dependencies and run autogen.sh yourself on the build host. It's not 1994 anymore.
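For an autotools project that amounts to roughly the following on the build host (a sketch; the bootstrap script name and the exact tools needed vary by project):

    # Needs autoconf, automake, libtool, gettext, ... installed as build dependencies
    autoreconf -fiv    # or ./autogen.sh, whatever the project ships
    ./configure
    make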

The SCM management for Bash is atrocious. We just need to switch away from it to something like zsh by default. At this point in time, doing large code drops for something as critical as Bash is just bordering on malpractice.

And relying on autohell for builds _is_ malpractice.

A backdoor in xz

Posted Mar 30, 2024 10:08 UTC (Sat) by nim-nim (subscriber, #34454) [Link] (1 responses)

The build scene is unfortunately ripe for exploits because mainstream tools are old and crufty, and the FAANGs, GitHubs and GNOMEs of the world only care about giant monorepos, static builds, vendoring, containers and Flatpaks, all of which basically mean: pile up as much code as you can to avoid any dev porting effort, and someone else (never defined) will somehow manage to audit the giant pile of stuff and detect malware.

Safe practices are well known: small, auditable, reusable components, built from signed archives with no third-party altered code dropped in, and frugal acyclic dependency graphs. But that's exactly the reverse of what we've been doing these past years. Devs understand code modularity, not build modularity.

The pile of junk has avoided any catastrophic collapse so far (apart from the log4j episode, with everyone else pretending it's Java-specific while replicating the very same build workflows), but it's only a matter of time.

A backdoor in xz

Posted Mar 30, 2024 15:09 UTC (Sat) by marcH (subscriber, #57642) [Link]

> The pile of junk has been avoiding any catastrophic collapse so far ...

Has it? With spies the main thing we know is: we know very little.

A backdoor in xz

Posted Mar 30, 2024 15:15 UTC (Sat) by gdamjan (subscriber, #33634) [Link]

> When people stop using curated tarballs, developers will just add generated files into the SCM.

There is a solution to that these days.
A GitHub Action can take the tagged source code commit, generate artifacts, and sign them, so you do have provenance showing that the artifact (tarball) was created from the given source code by the given GitHub Action.
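As a rough sketch of what such a release job might run on the tagged commit (project name, version, and signing setup are hypothetical), so that the published tarball and its signature come from CI rather than from a developer's machine:

    # Runs inside CI, checked out at the release tag
    git archive --format=tar.gz --prefix=project-1.2.3/ -o project-1.2.3.tar.gz v1.2.3
    sha256sum project-1.2.3.tar.gz > SHA256SUMS
    # Detached signature with a key held by the CI system, not by an individual maintainer
    gpg --armor --detach-sign project-1.2.3.tar.gz

Real setups would typically go further, for example pinning the workflow by commit hash and publishing provenance attestations, but the principle is the same: the artifact is derived mechanically from the tag.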

A backdoor in xz

Posted Mar 29, 2024 19:12 UTC (Fri) by rwmj (subscriber, #5474) [Link] (108 responses)

The exploit is in two parts: two "test files" which contain the payload, and a modified m4 script (m4/build-to-host.m4) which initiates the process of loading the payload. The test files were added first and are part of git. They don't do anything on their own. The m4 modification was only made in the GitHub tarball, somewhat later, and it's not checked into git. (It was a surprise to me that GitHub release tarballs aren't just tarred-up copies of the git checkout.)

Having said all that, I wouldn't rely on any version of xz which "Jia Tan" (a pseudonym, I assume) has touched, and unfortunately he's been contributing to xz and been an upstream committer for 2+ years.

A backdoor in xz

Posted Mar 29, 2024 19:39 UTC (Fri) by ewen (subscriber, #4772) [Link] (11 responses)

GitHub releases are basically tags that then have some additional metadata and files associated with them. GitHub does create a "tar of git checkout" automatically as you suggest (although people have been asking for a way to turn that off: https://github.com/orgs/community/discussions/6003). But anyone with access can also upload more files, including binaries and their own source archive (with a different name to the automatic ones).

For lots of projects, building from the automatic git checkout zip is non-trivial (e.g. missing generated things like configure scripts), so at least historically many projects have suggested people ignore the git checkout archive and use the source archive someone uploaded as a release file, which includes the extra generated files.

Possibly it makes sense to switch to always doing the git tag checkout on the build system. But it would definitely complicate things like Debian's "orig.tar.gz"-plus-patches source archiving process. And it would probably require some more build dependencies to generate the extra "source" files.

Ewen

A backdoor in xz

Posted Mar 29, 2024 20:17 UTC (Fri) by excors (subscriber, #95769) [Link] (3 responses)

GitHub's auto-generated tarballs also have the issue that they don't promise stability, so old releases might unexpectedly get a different checksum (with the same content but different compression etc), which can break build systems: https://lwn.net/Articles/921787/ . GitHub reverted that change, but they still explicitly don't promise long-term stability of archives, and say you should use commit IDs (no checksums) or externally-uploaded release tarballs: https://github.blog/2023-02-21-update-on-the-future-stabi...

A backdoor in xz

Posted Mar 29, 2024 20:37 UTC (Fri) by randomguy3 (subscriber, #71063) [Link]

Commit IDs are their own checksum, of course - provided you use git to grab them.

A backdoor in xz

Posted Mar 30, 2024 0:21 UTC (Sat) by jdulaney (subscriber, #83672) [Link] (1 responses)

It almost sounds as if GitHub should not be used as a release mechanism.

A backdoor in xz

Posted Mar 30, 2024 13:38 UTC (Sat) by smurf (subscriber, #17840) [Link]

You can use github's release mechanism all you like, just be sane about it.

This means that your tarball gets generated by a verified and pinned-down GitHub Action and doesn't access external resources. EVER.

While the fact that widely-used libraries like xz still allow developer-supplied release uploads can plausibly be explained (excused, really) with laziness, the line between that and malpractice is a thin one.

Against stupidity, the Gods themselves …

A backdoor in xz

Posted Mar 30, 2024 2:14 UTC (Sat) by salimma (subscriber, #34460) [Link] (3 responses)

> Possibly it makes sense to switch to always doing the git tag checkout on the build system. But it would definitely complicate things like Debian’s “orig.tar.gz” and patches source archiving process. And probably require some more build dependencies to generate the extra “source” files.

It works just fine on Debian -- you point the watch file at the URL corresponding to the auto-generated tarball, and uscan (whether called directly or via gbp) will generate the orig.tar.gz for you.
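Roughly, once debian/watch points at the upstream tag URL, fetching and importing the new upstream version is a one-liner either way (a sketch; exact options depend on the packaging workflow):

    uscan --verbose --download     # fetch the upstream tarball and rename it to <pkg>_<ver>.orig.tar.gz
    gbp import-orig --uscan        # or have git-buildpackage run uscan and import the result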

You'd need more build dependencies, sure

A backdoor in xz

Posted Mar 30, 2024 13:45 UTC (Sat) by smurf (subscriber, #17840) [Link] (2 responses)

Debian's "orig.tar.gz" is itself an anachronism that should be tossed into the garbage bin of history sooner rather than later.

They should check out the appropriately-tagged "debian" git branch, build binaries, package them. Done. No more "grab an orig.tar which you didn't build, then apply patches with possibly-traceable provenance" dances. PLEASE.

A backdoor in xz

Posted Apr 1, 2024 5:19 UTC (Mon) by ras (subscriber, #33059) [Link] (1 responses)

> They should check out the appropriately-tagged "debian" git branch, build binaries, package them. Done.

I get what you're saying when talking about patches done to the source. Asking git to apply a series of commits is generally much easier than doing the same thing with quilt or whatever.

However, a git repository and git commits aren't in my opinion source code. Which is to say they aren't a program developers can inspect and modify that reproducibly takes pristine upstream source as input, and produces the source that will be distributed and compiled to produce the Debian binary package as output. That's the sort of thing you need to make the process auditable. It's what uscan and patches produce.

Debian does have tools that take a git repository and spit out the debian source packages in an auditable format - and yes that includes the .orig.tar.gz plus patches. Isn't that enough?

A backdoor in xz

Posted Apr 1, 2024 9:39 UTC (Mon) by smurf (subscriber, #17840) [Link]

> However, a git repository and git commits aren't in my opinion source code.

Source code is defined as the preferred format to use when you want to work on the code in question. I submit that these days the number of upstream authors of nontrivial packages whose workflow consists of "while buggy: edit compile debug", followed by "edit version# && make clean && tar cfz && upload", is essentially zero.

Thus it seems like a good idea to use the code that's actually checked into upstream's version control as a basis for automated building of a distribution's binaries *and* the sources it needs to provide, for copyleft-right-and-center reasons if nothing else. Instead of ignoring upstream's git archive and basing the build on some unsigned tarball somebody created somehow.

Mostly-seamless conversion from upstream-plus-patches to Debian-branch-of-git-archive and back is pretty straightforward these days, thanks to dgit and related tools.

A backdoor in xz

Posted Apr 1, 2024 16:56 UTC (Mon) by sammythesnake (guest, #17693) [Link]

> For lots of projects building from the automatic git checkout zip is non trivial (eg missing generated things like configure scripts)...

Surely the fix for this is for those "generated things" to be generated as part of the build process - if they're auto-generated, then there's no reason for that autogeneration to be out of band. If the repo doesn't include everything that's needed to generate them (modulo configuration for things like build architecture or which features to include or whatever) then it's not "complete corresponding source"

In GPL terms: "all the source code needed to generate, install, and (for an executable work) run the object code" is required for distribution. Even outside of the GPL, it's just bloody obvious to track it - I wouldn't dream of leaving any such out of source control for anything I write. This applies even if I'm not planning to release it, let alone for something others' work relies on - I don't trust even *myself* to get it right reliably without git or whatever automating it!

A backdoor in xz

Posted Apr 7, 2024 13:16 UTC (Sun) by chestnut (guest, #170772) [Link] (1 responses)

> For lots of projects building from the automatic git checkout zip is non trivial (eg missing generated things like configure scripts), so at least historically many projects have suggested people ignore the git checkout archive and use the source archive someone uploaded as a release file, that includes extra generated files.

Sorry, I'm a beginner and I didn't find anything on Google about "many projects have suggested people ignore the git checkout archive and use the source archive someone uploaded as a release file" - can you give me some tips? Maybe a source link? And why not upload these scripts to GitHub?

A backdoor in xz

Posted Apr 7, 2024 14:54 UTC (Sun) by pizza (subscriber, #46) [Link]

> can you give me some tips?

An example of this is darktable: "As always, please don't use the autogenerated tarball provided by github, but only our tar.xz file."

> and why not upload these scripts to GitHub

They pretty much always are, typically called something like "./bootstrap.sh". But to generate the distribution tarball, you usually need additional dependencies or tools.

Another example of this is gutenprint; as well as the autotools stuff, the distribution tarballs have a lot of other auto-generated material (e.g. supported printer lists) that would otherwise cause major issues if you are trying to cross-compile things.

In both cases the CI system auto-generates a release tarball after every commit.

A backdoor in xz

Posted Mar 29, 2024 19:41 UTC (Fri) by atai (subscriber, #10977) [Link] (7 responses)

>Having said all that, I wouldn't rely on any version of xz which "Jia Tan" (a pseudonym, I assume) has touched, and unfortunately he's been contributing to xz and been an upstream committer for 2+ years.

Shall all his past commits be analyzed?

A backdoor in xz

Posted Mar 29, 2024 19:49 UTC (Fri) by rwmj (subscriber, #5474) [Link]

Yes, and analyzing past github hosted tarballs may be even more important.

A backdoor in xz

Posted Mar 29, 2024 21:44 UTC (Fri) by joey (guest, #328) [Link]

All commits made after he gained control of the project should also be checked, since he could push commits with any purported Author. Or just revert to before that point.

A backdoor in xz

Posted Mar 30, 2024 2:44 UTC (Sat) by helsleym (guest, #92730) [Link] (4 responses)

The maintainer mentioned something kind of chilling about the activities of "Jia Tan":

> "He has been helping a lot off-list"

That makes the activities of "Jia Tan" harder to audit and could even have been why they diverted discussion off-list. I'm sure it seemed innocuous to the maintainer but thanks to hindsight it seems like a significant social element of this attack.

(source: https://www.mail-archive.com/xz-devel@tukaani.org/msg0057... )

A backdoor in xz

Posted Mar 30, 2024 10:20 UTC (Sat) by nim-nim (subscriber, #34454) [Link] (3 responses)

It may or may not have been a pseudonym for a person or a group of people.

The pseudonym may or may not point to a specific country.

The pseudonym holder may or may not have his own credentials compromised.

The pseudonym holder may or may not have had a visit from criminal groups or some agency that made him a proposal he could not refuse.

The pseudonym holder may have been working all this time for those groups, who may be as much interested in securing their own systems as in compromising the systems of others. Or he may be a disgruntled (ex-)employee, retaliating for something we do not know about.

Someday soon it may be an emergent malicious AI masquerading as a person (or just some catastrophic mistake in the way someone trained his model).

The possibilities in the brave new world we live in are endless. It is pointless to speculate or try to audit people at this point. Making builds more transparent and easier to check and audit is what is needed.

A backdoor in xz

Posted Mar 30, 2024 16:06 UTC (Sat) by marcH (subscriber, #57642) [Link] (2 responses)

> The possibilities in the brave new world we live in are endless.

It is harder to exploit but inserting some memory corruption in C/C++ is orders of magnitude more discreet than this blunt takeover of project ownership. Even better: you get more than one shot at it because you can plausibly pretend it was a genuine mistake. Cause it so often is.

Any spy agency that hasn't successfully done this yet in a variety of open-source projects is incompetent and its management should be fired.

> Making builds more transparent and easier to check and audit is what is needed.

Amen.

And of course: more code reviews, static analysis, test coverage, valgrind, safer languages, etc.

Basically just slow down; less code with more quality. The opposite of what corporations and people want.

A backdoor in xz

Posted Mar 30, 2024 16:36 UTC (Sat) by rra (subscriber, #99804) [Link] (1 responses)

> Basically just slow down; less code with more quality. The opposite of what corporations and people want.

It's tricky, though, because one of the ways to get more quality is to realize that some of the foundations are shaky and built on a bunch of poorly-thought-out principles. But if you rebuild them, that is a form of speeding up, since it requires changes by everyone else to move to the new thing.

I use a variety of software on a daily basis that, so far as I can tell, has worked correctly for years. In some cases I've been tempted to adopt it and maintain it as part of that effort in slowing down and improving quality. But probably 95% of the time when I look at the source to some old program that I have come to rely on, it's written in janky C with global variables all over the place, static buffers, no comments, and a code flow that I find difficult to follow. I have rescued some of my own software from such a state through the power of sheer embarrassment, but I've yet to have the oomph to rescue someone else's software.

We have an enormous maintenance problem, and I'm not sure what slowing down and writing less code with more quality looks like in the face of that maintenance problem. It's one of the reasons why I'm somewhat sympathetic to the folks who would prefer to rewrite the world. It looks like speeding up and writing more code, the opposite of what you're correctly advocating, but the advantage of a clean slate is that it's a lot easier to add standards for slower work and higher quality when starting from scratch than it is to retrofit them to an existing community, or even an orphaned code base.

(And this is all apart from the fact that slower and higher quality is more expensive, and we lack any effective mechanism to fund resilience, sustainability, and going slower. Not just in software, but in most human endeavor. Most social and political forces are pushing in exactly the opposite direction, as you correctly note.)

A backdoor in xz

Posted Mar 30, 2024 17:11 UTC (Sat) by marcH (subscriber, #57642) [Link]

> It's one of the reasons why I'm somewhat sympathetic to the folks who would prefer to rewrite the world. It looks like speeding up and writing more code, the opposite of what you're correctly advocating, but the advantage of a clean slate is that it's a lot easier to add standards for slower work and higher quality when starting from scratch than it is to retrofit them to an existing community, or even an orphaned code base

To be clear: if "rewriting the world" works better and faster than raising the bar on an existing and time consuming code base then I'm all for it.

Either is much slower than merging untested and barely reviewed commits.

A backdoor in xz

Posted Mar 29, 2024 22:20 UTC (Fri) by smcv (subscriber, #53363) [Link] (1 responses)

> (It was a surprise to me that github release tarballs aren't just tarred up copies of the git checkout.)

The tarballs that github itself auto-generates *are* just a `git archive`, but those are unsuitable for:

1. projects with git submodules, which need to "flatten" those into a monolithic tarball for it to be possible to build in an offline way from just a tarball;

2. projects with build systems that rely on (or have traditionally relied on) putting generated cruft in the dist tarball, like Autotools when used in its traditional "you don't need any external tools" mode;

3. projects that include convenience copies of anything else in their source tarball, like prebuilt HTML documentation or prebuilt Windows binaries

IMO the answer to the first is to prefer `git subtree`, the answer to the second is to move away from such build systems towards for example Meson or CMake (or failing that, distribute only the actual source and expect recipients to run ./autogen.sh themselves), and the answer to the third is "don't do that, then"; but a lot of projects have at least one of those factors and little appetite for moving away from it.
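For the submodule case, vendoring the dependency with git subtree keeps its contents in the main history, so the auto-generated archive is already complete. A minimal sketch (repository and tag are just examples):

    # Pull a dependency's tree into the repository itself instead of referencing it
    git subtree add --prefix=thirdparty/zlib https://github.com/madler/zlib.git v1.3.1 --squash
    # Later, to move to a newer upstream release
    git subtree pull --prefix=thirdparty/zlib https://github.com/madler/zlib.git <new-tag> --squash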

A backdoor in xz

Posted Mar 30, 2024 0:53 UTC (Sat) by heftig (subscriber, #73632) [Link]

Git submodules are not that bad as long as the set of submodules doesn't change often and they're limited to a single level. If you have a package build system that can handle Git, you can have it download all the repos, change the URLs in the superrepo to the local clones and git-submodule-update will clone and check out the right commits.

E.g. see https://gitlab.archlinux.org/archlinux/packaging/packages...
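A rough sketch of that URL-rewriting step (submodule name and paths are hypothetical):

    # After cloning the superproject and each submodule repository locally:
    git submodule init
    # Point the submodule at the local clone instead of the network
    git config submodule.thirdparty/libfoo.url /srv/sources/libfoo.git
    # This now only touches the local clone and checks out the pinned commit
    git submodule update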

Sources for Meson subprojects that aren't in the repo or tarball can also be provided; our Mesa PKGBUILD does this.

But yeah, I would prefer if git-subtree were more popular.

A backdoor in xz

Posted Mar 29, 2024 22:29 UTC (Fri) by mss (subscriber, #138799) [Link] (85 responses)

"Jia Tan" (a pseudonym, I assume)

If that's truly a nation-state attack (as some claim) then I would assume there's a whole team behind this made-up identity.

Verify the identity of developers

Posted Mar 30, 2024 7:39 UTC (Sat) by epa (subscriber, #39769) [Link] (84 responses)

It’s probably a dystopian future none of us want. But what if GitHub required all contributors to use their real name and strongly verified their identity? Then for key roles you could require someone with a clear track record and resident in a Western country where they could be prosecuted for maliciously introducing backdoors.

Of course “moles” could still exist, but they would be much more difficult for hostile nations to create than an entirely fake identity like Jia Tan. To add a backdoor the easiest way would then be to compromise some developer’s workstation or steal their private key. Which is by no means impossible, but raises the difficulty somewhat, and is harder to do undetected.

It wouldn’t even need to be GitHub. If Debian started using git sources for releases (no more downloading of random tarballs), a further step would be to require signing of each individual change in the repository (via signed tags or something, I don’t know the details, but it’s possible somehow) and then not trust changes from a new contributor until that person has visited a Debian keysigning party. Again, hardly watertight, but better than no checks at all.

Most of the discussion here has focused on supply chain measures. And I agree with that approach—not trying to “catch the bad guys”. But once the weakest links are tightened, and accepting the fact that we will never have enough reviewers to check every commit, it could make sense to review the people themselves.

Verify the identity of developers

Posted Mar 30, 2024 8:08 UTC (Sat) by immibis (subscriber, #105511) [Link] (1 responses)

They sort of are, since GitHub introduced mandatory SMS 2FA for anyone it considers important enough.

Verify the identity of developers

Posted Mar 30, 2024 9:28 UTC (Sat) by pabs (subscriber, #43278) [Link]

There are non-SMS forms of 2FA allowed, TOTP for example, which anyone can do without any identity checks using just software.

Verify the identity of developers

Posted Mar 30, 2024 9:48 UTC (Sat) by kazer (subscriber, #134462) [Link]

> But what if GitHub required all contributors to use their real name and strongly verified their identity?

That is meaningless when GitHub is used just as a mirror and real commits happen elsewhere.
You would need to require that in every repository for everyone contributing.

Having a PGP/X.509 signature for every committer and commit would be part of the solution, but in case someone decides to turn to the dark side you still need to review every change.

And the key point here is that these changes were not sufficiently reviewed, since we don't know whether the account was compromised or not.

Verify the identity of developers

Posted Mar 30, 2024 10:51 UTC (Sat) by mss (subscriber, #138799) [Link] (10 responses)

But what if GitHub required all contributors to use their real name and strongly verified their identity? Then for key roles you could require someone with a clear track record and resident in a Western country where they could be prosecuted for maliciously introducing backdoors.

This would be blatant discrimination of most of the world population, which isn't lucky enough to live in a Western country.

And if you drop the "Western country" requirement then a rogue nation-state is in the best position to provide its operatives with as many "genuine" identity verification documents as they might need.

Verify the identity of developers

Posted Mar 30, 2024 12:38 UTC (Sat) by epa (subscriber, #39769) [Link] (9 responses)

Yeah… I didn’t say it would be nice. For security sensitive jobs it’s standard practice to do background checks and to require the person be a US citizen, or whatever. And the check that the person exists (and isn’t a made-up identity or pseudonym) is so basic for any job that it’s not even mentioned.

Anonymous or pseudonymous entities (like “Satoshi”) can make valuable contributions but it may be unwise to put them in a trusted position where code they write is directly executed or installed on millions of systems.

Verify the identity of developers

Posted Mar 30, 2024 17:54 UTC (Sat) by sjj (subscriber, #2020) [Link] (8 responses)

Believe it or not, there are legitimate "security sensitive jobs" that are not in the US and do not require American citizens.

Would for example Kenyan devs be “Western” enough for you?

Verify the identity of developers

Posted Mar 30, 2024 19:52 UTC (Sat) by epa (subscriber, #39769) [Link] (7 responses)

Indeed, I was using Western as shorthand. And now we get into politics. There is surely a scale with rogue states like North Korea on one side and … some other countries on the other side. LWN isn’t the place for a discussion of whether a developer in Kenya can be considered a real person (not a fake identity made by security services) and whether some country has enough rule of law that someone doing malicious things would be prosecuted (rather than sheltered by the authorities). We are not discussing a technical proposal. It gets messy and of course there are those who don’t trust the USA or allies either. All I can say is that we may be forced, unhappily, to start thinking about this stuff.

Verify the identity of developers

Posted Mar 30, 2024 21:33 UTC (Sat) by MarcB (subscriber, #101804) [Link] (6 responses)

Politics are unavoidable, if you go that route. For example, the US in particular is a proven bad actor here - recall the Snowden revelations about the supply-chain attacks the NSA pulled off. They literally did what nowadays Chinese suppliers are alleged to do - and they used this against supposed allies as well, so there isn't even a "western world against China/Russia" scenario.

This "scale of rogue states" you envision would be one that goes downhill in every direction, no matter from whose perspective you look at it.

Also, if you assume nation-state attackers - which you should - and assume that developers will be working from their home country, then any meaningful verification is simply impossible. It is even unreliable and expensive if you hire employees and have them move to your country. If they stay abroad, it simply can't be done.

Verify the identity of developers

Posted Mar 31, 2024 8:54 UTC (Sun) by epa (subscriber, #39769) [Link] (5 responses)

Indeed the US has a record here and if I were running a nuclear installation in Iran I would require verified Iranian programmers working within the country. But for the rest of us, there is a difference between western countries and others. Many American businesses and even the government rely on the security of Linux and free software. The black-helicopter guys generally have an interest in helping keep free software secure against attacks from ransomware gangs and hostile nation hacking teams. They may do targeted attacks but it doesn’t help them to poison the well as happened here. So I would still prefer to trust a developer with a known legal identity in a US ally over someone who may not be a real person at all. But again, this is politics, and not something where you’ll ever get everyone in agreement.

Verify the identity of developers

Posted Apr 1, 2024 13:28 UTC (Mon) by farnz (subscriber, #17727) [Link] (4 responses)

The USA has a track record of attacking its allies via secret programs (as does every other Western democracy I've looked into - the USA is not unique here). There's thus a constant tug-of-war going on; do I trust the USA because they'd prefer me to do well rather than their enemies, or do I distrust them because they would prefer to attack me in order to benefit the USA and American companies at my expense?

Purely in terms of "can I trust their home nation", the only safe developers are those with the same national affiliation as you, since whatever leverage the nation-state can apply to them can also be applied to you. And that, fundamentally, comes down to a personal trust matter; can you trust the people you depend upon, or not? Can you "trust, but verify"? Or are you stuck with an untrustworthy partner whose behaviour cannot be verified?

Verify the identity of developers

Posted Apr 2, 2024 15:24 UTC (Tue) by MarcB (subscriber, #101804) [Link] (3 responses)

> Purely in terms of "can I trust their home nation", the only safe developers are those with the same national affiliation as you, ...

And even that is a best-case scenario. Many countries do not have strong civil rights and their secret services target their own citizens.

Verify the identity of developers

Posted Apr 2, 2024 16:08 UTC (Tue) by farnz (subscriber, #17727) [Link]

My expectation is that the secret services etc are as much a risk to you as they are to someone you deem "trusted". Basically, by trusting people in the same immigration situation as you and in the same country as you, you're trusting people inside the same "national security" boundary as you; the moment you go outside that, you're at risk of your colleague being targeted by a national security agency that cannot reach you, even if they are an upstanding person themselves.

So, while Canada is basically a safe country, as someone based in England, I'm in a separate "national security" boundary to a Canadian; it is at least theoretically possible that a Canadian agency can't get to me, but can compromise a Canadian contact, and it's also possible for a UK agency to compromise me, while being unable to compromise my Canadian contact.

This analysis still applies even if the two countries are evildoers; my local set of evil government agencies affect me and anyone else in the same country, and your local set affect you and anyone else in the same country. If we're in different countries, and you need only affect one of us, then the set of agencies to care about is the union of the sets we're affected by.

Verify the identity of developers

Posted Apr 3, 2024 17:39 UTC (Wed) by jafd (subscriber, #129642) [Link] (1 responses)

Also, moles exist.

Verify the identity of developers

Posted Apr 4, 2024 19:01 UTC (Thu) by epa (subscriber, #39769) [Link]

Running a mole is about a million times more expensive than creating a GitHub account under a completely made-up identity. That’s the idea.

Verify the identity of developers

Posted Mar 30, 2024 14:40 UTC (Sat) by dvdeug (subscriber, #10998) [Link] (8 responses)

> resident in a Western country where they could be prosecuted for maliciously introducing backdoors.

Crypto AG was a Swiss encryption company, owned jointly by the CIA and the West German Federal Intelligence Service (BND) from 1970 to 1993 and by the CIA alone until 2018, that sold backdoored encryption systems to many nations. The NSA produced and promoted the Dual_EC_DRBG CSPRNG that could be used with SSL, with the general consensus being that it had a backdoor, as the possibility of an undetectable backdoor was well known and there was little other reason to use it.

Even if we trust the CIA and NSA (and friends; the Crypto AG story also implicates Germany and Switzerland) blindly, the history of espionage says little for the security of blindly trusting people resident in western nations.

Verify the identity of developers

Posted Mar 30, 2024 16:15 UTC (Sat) by epa (subscriber, #39769) [Link]

Are you saying Crypto AG was publishing all its source code and customers were doing reproducible builds of that?

I certainly do not advocate blindly trusting someone just because they are a known individual from a non-hostile country. We need all the other “many eyeballs”, verified supply chain, and sandboxing too. Once you have all that in place, the next step might be to start strongly identifying developers.

For cryptography the situation is a little different as the cipher is fully described but only experts can analyse it for weaknesses. A backdoor in a library does not require quite the same level of expertise to spot.

Verify the identity of developers

Posted Mar 30, 2024 17:48 UTC (Sat) by marcH (subscriber, #57642) [Link] (6 responses)

> Crypto AG [...] that sold backdoored encryption systems to many nations. The NSA produced and promoted the Dual_EC_DRBG CSPRNG that could be used with SSL, with the general consensus being that they had a backdoor,

These are very real but they were also huge scandals. In dictatorships such stuff is business as usual.

In democracies these questions can be and are debated and there is some oversight on intelligence agencies. A poor level of control but much better than answering to a single person at the top.

Developers do not automatically risk jail or death or worse if they refuse a "proposition" from the intelligence agency of a democracy.

Democracies are fragile and very far from perfect but they're at least trying. So, verifying the identity of developers who live in a democracy would be very far from a silver bullet, and I'm not even sure it would be a good idea in the first place, but it would be for sure _very_ different from verifying the identity of developers surviving in a dictatorship.

Nothing's black and white here but there can be a huge difference between "light grey" and "dark grey"; let's be careful with "whataboutism".

PS: the main drive (and... sin) of the Western world is unabated greed and the fair dose of corruption that comes with it. Yet Western business is nowadays required to distance itself more and more from powerful dictatorships which is costing a LOT. Guess why.

Verify the identity of developers

Posted Mar 30, 2024 18:15 UTC (Sat) by rra (subscriber, #99804) [Link] (3 responses)

> Developers do not automatically risk jail or death or worse if they refuse a "proposition" from the intelligence agency of a democracy.

I live in a country that is theoretically a democracy but has something called "National Security Letters" that are substantially similar to this description, which makes me dubious the line is quite this clear-cut.

Verify the identity of developers

Posted Mar 30, 2024 19:00 UTC (Sat) by marcH (subscriber, #57642) [Link] (2 responses)

I wrote "not automatically", "shades of grey" and used plenty other adverbs but apparently not enough.

Verify the identity of developers

Posted Mar 30, 2024 19:59 UTC (Sat) by rra (subscriber, #99804) [Link]

> I wrote "not automatically", "shades of grey" and used plenty other adverbs but apparently not enough.

I'm sorry, you're right, you did, and I should have acknowledged that. I still think I disagree with you somewhat about the intensity of the difference, but it's hard to quantify and there's nothing incorrect in what you said.

Verify the identity of developers

Posted Apr 1, 2024 15:31 UTC (Mon) by Lennie (subscriber, #49641) [Link]

For context, I'll post you a link to a talk by a brave man:

https://media.ccc.de/v/27c3-4263-en-resisting_excessive_g...

Resisting the state if they come knocking is euh... hard work to say the least.

Formally good actors can (be made to) turn or get their systems compromised.

Which is why we probably need to focus on checking code and making code more readable, and focus less on the actors.

Verify the identity of developers

Posted Mar 30, 2024 19:50 UTC (Sat) by dvdeug (subscriber, #10998) [Link] (1 responses)

> These are very real but they were also huge scandals.

Huge scandals? Crypto AG, by the time it came out, was a minor historical note. I'd bet most Americans would consider it a good move that let us spy on foreign enemies, and many wouldn't bat an eye at the amount of spying done on allied powers. The NSA stunt annoyed some in the field of computer security, but few outside it.

> In dictatorships such stuff is business as usual.

I'm not sure. China isn't going to play around with Dual_EC_DRBG; you use their keys for SSL or else. They openly put taps in, and wouldn't bother with something nobody is going to trust. This skullduggery is part and parcel of Western intelligence agencies.

> Developers do not automatically risk jail or death or worse if they refuse a "proposition" from the intelligence agency of a democracy.

https://arstechnica.com/tech-policy/2013/04/wikipedia-edi... There's a risk of jail time. Nor do I think US intelligence agencies are above blackmail, though the CIA might reserve that for people outside the US.

> Nothing's black and white here but there can be a huge differences between "light grey" and "dark grey"

Yes, but at the same time, intelligence agencies, democracy or otherwise, are very clandestine and have a history of this type of stuff.

As another note, a new developer from the Western world could be a genuine newbie, but could also be someone who found a job posting on the dark web when looking to buy the "Easy Home Kit for Cooking Meth", and now gets paid for his identity, quite possibly with no idea who is paying him.

> Yet Western business is nowadays required to distance itself more and more from powerful dictatorships which is costing a LOT. Guess why.

More and more? If we're talking about China, it's got Cold War parallels, two large powers fighting over control of the world. Russia has even more Cold War parallels, and while pulling out of Russia is bad for business, so is letting Russia take over Eastern Europe. There's moral elements, and fears that Chinese products have backdoors. I'm really not sure what you're getting at here. I certainly don't see Western business having much trouble dealing with dictatorships that haven't pissed off the West.

Verify the identity of developers

Posted Apr 1, 2024 0:24 UTC (Mon) by marcH (subscriber, #57642) [Link]

> Huge scandals? Crypto AG, by the time it came out, was a minor historical note.

It was probably not mentioned in family dinners but it made the mainstream press which is very rare for topics like this. It also taught many "naive" countries to stop blindly trusting their allies. So yes, it was a pretty big deal.

> https://arstechnica.com/tech-policy/2013/04/wikipedia-edi... There's a risk of jail time.

Whether this particular case has merit or not, publishing classified information is of course pretty stupid and totally unrelated to inserting backdoors in open source projects.

> but could also be someone who found a job posting on the dark web

Yes verifying identities will never be a silver bullet. But it would be for sure more useful in a place with a functional legal system where such a person runs the risk of being caught and prosecuted (as opposed to getting a medal from their dictator).

> I'm really not sure what you're getting at here. I certainly don't see Western business having much trouble dealing with dictatorships that haven't pissed off the West.

My (admittedly confusing) tangent/PS was: businesses and monopolies tend to buy congressmen and run the show in democracies, especially in the US since "Citizens United vs FEC". As you noted, businesses generally don't care about dictatorships, only about money. That's why they are called "businesses". But even in these short-sighted, market-based countries, authorities are starting to realize the magnitude of the risks and problems and are (slowly) taking precautions affecting the bottom line of their uber rich and all powerful businesses. The IT naivety is (slowly) regressing and that's a good thing.

Verify the identity of developers

Posted Mar 30, 2024 15:39 UTC (Sat) by marcH (subscriber, #57642) [Link] (5 responses)

> ... and accepting the fact that we will never have enough reviewers to check every commit, ...

This is the main problem right here and the "solution" is to just SLOW DOWN. No xz maintainer? Fine, no xz software at all. The world will keep spinning.

This site is never short of very valid criticism of greedy corporations exploiting exhausted maintainers. But sometimes it would also be good to look in the mirror and at the attitude of some maintainers. Scratching your own itch and sharing your random and insecure musings on the Internet is great. But when that pet project becomes popular hubris kicks in. Then comes the desire to become popular and successful too and to avoid a fork at all costs: someone else more "active" and more lax could fork and reap all the fame! Quick, quick, let's merge all these great new features that I have barely the time to look at. Who knows: someone somewhere could be interested in them?

The number 1 job of a good maintainer is to say "no" or even (the horror!) completely ignore some ideas and submissions. The Linux kernel is not perfect but actually decent at this. The rest of what runs on a Linux desktop... not so much.

It's not just corporations: many consumers, developers and people in general don't want to pay for quality. We just want more and more for less and less money. With just one exception: security. Because a device that grants criminals access to your data and passwords is much scarier than a device that stops working. So security has been saving quality and it will keep slowing things down. That's a good thing.

Verify the identity of developers

Posted Mar 30, 2024 15:43 UTC (Sat) by marcH (subscriber, #57642) [Link]

> don't want to pay for quality. [...] With just one exception: security.

Sorry I forgot the other, obvious exception: dying. For instance: in a Boeing plane.

Verify the identity of developers

Posted Mar 30, 2024 15:53 UTC (Sat) by pizza (subscriber, #46) [Link] (3 responses)

> This is the main problem right here and the "solution" is to just SLOW DOWN. No xz maintainer? Fine, no xz software at all. The world will keep spinning.

Ok, so widely-used xz stops being maintained. The amount of collective effort to remove xz from production dwarfs the effort of maintaining xz. By multiple orders of magnitude.

Meanwhile, as this mess demonstrates, "maintained" doesn't mean it can be trusted.

Verify the identity of developers

Posted Mar 30, 2024 17:04 UTC (Sat) by marcH (subscriber, #57642) [Link] (2 responses)

> Ok, so widely-user xz stops being maintained.

This already happened a while ago but that wasn't my main point. My main point is: how was 20% better compression enough to "lure" projects into adding some bad dependency? _This_ is when projects should "slow down" and take the time to weigh all the pros and cons, including future maintenance headaches:
https://cacm.acm.org/practice/surviving-software-dependen...

> The amount of collective effort to remove xz from production dwarfs the effort of maintaining xz. By multiple orders of magnitude.

This is comparing apples and oranges because it's not the same people. There's no "collective" either: each project should manage its dependencies independently.

I'm surprised that switching to a different compression is so expensive because many projects offer such a choice at configuration time which hints at comparable APIs.

> "maintained" doesn't mean it can be trusted.

Indeed but "unmaintained" cannot be trusted _for sure_. Same as testing: it can only "prove the existence of bugs, not their absence". Quality is not black or white.

> Meanwhile, as this mess demonstrates,

If anything, this mess demonstrated a lack of maintenance.

Verify the identity of developers

Posted Mar 31, 2024 21:19 UTC (Sun) by calumapplepie (guest, #143655) [Link] (1 responses)

> My main point is: how was 20% better compression enough to "lure" projects into adding some bad dependency?

Because it wasn't a bad dependency, and 20% is a big improvement. xz has (had?) a website, multiple contributors, maintainers, and more. There were heated debates over the advantages of it compared to other tools. Eventually, it gained wide use, because 20% faster downloads, 20% less storage, 20% lower bandwidth fees, and a 20% better algorithm was an upgrade.

> This is comparing apples and oranges because it's not the same people. There's no "collective" either: each project should manage its dependencies independently.

Then we will have 60% of projects continue using XZ for years after it is abandoned. Besides, the idea of projects as a bunch of little silos that all know each other and work together isn't realistic; lots of contributions are drive-by, and lots of contributors are involved in many projects.

> If anything, this mess demonstrated a lack of maintenance.

XZ was and is maintained. It was subjected to continuous fuzzing as part of Google's OSS-Fuzz. It was vendored into the Linux kernel, an "essential" part of Debian, and more. There isn't a maintainer on the planet who would have rejected its inclusion because of reputational fears.

Do we need more auditing? Yes. Maybe some clever person can think of linker patches that will catch this particular exploit technique, or maybe we need to have more people running with the options that exist to catch it, or maybe even someone should write some automated tests. But rogue maintainers was and will remain a problem.

Verify the identity of developers

Posted Apr 1, 2024 0:04 UTC (Mon) by marcH (subscriber, #57642) [Link]

> xz has (had?) a website, multiple contributors, maintainers, and more. [...] XZ was and is maintained.

Did you read the news?

> Then we will have 60% of projects continue using XZ for years after it is abandoned. Besides, the idea of projects as a bunch of little silos that all know each other and work together isn't realistic; lots of contributions are drive-by, and lots of contributors are involved in many projects.

I don't get the point you're trying to make here.

In any case adding and managing dependencies is a benefit-risk assessment and it's up to each project to make its own decisions in a decentralized manner; there is simply no alternative besides not making this assessment.

"Decentralized" does not mean "in a vacuum": more popular dependencies are of course more likely to be picked up or forked by someone who really needs them if the current maintainer(s) fail.

Verify the identity of developers

Posted Mar 30, 2024 20:29 UTC (Sat) by pawel44 (guest, #162008) [Link]

> Then for key roles you could require someone with a clear track record and resident in a Western country where they could be prosecuted for maliciously introducing backdoors.

Was NSA prosecuted?

Verify the identity of developers

Posted Mar 31, 2024 11:05 UTC (Sun) by dkzm (guest, #55549) [Link] (53 responses)

Let's suppose somebody adds drivers for yet another Chinese SBC, or makes QEMU's COLO feature (a VMware FT equivalent, at least partially developed by Huawei as far as I understood) really work, or contributes a new and improved quantization format for LLMs to textgenweb ui. Or just some new improvements to the Linux network stack (Huawei did this, as far as I remember).

They are not from a Western country (they are likely from China). Should GitHub just refuse them access? They would then just use their own services and blame it on GitHub being unwelcoming to them. The West would not be able to use and learn from their code, while they would still be able to do so with ours. This likely means that all major open-source projects would be forked, and it's possible that the GitHub "original" would no longer be the most maintained branch.

Verify the identity of developers

Posted Mar 31, 2024 17:53 UTC (Sun) by epa (subscriber, #39769) [Link] (52 responses)

I didn't mean blocking anyone from gitlab or even blocking them from committing. I meant given a particular git repository being able to see the people who have committed to it, and whether their identity has been verified as a real person. Whether to then "trust" that repository is a policy decision and would be made by distributors or perhaps your company's IT department. People are already making these decisions; given that we face attempts to introduce backdoors under fake identities like "Jia Tan", having some extra information about whose identity has been verified might help make better decisions.

Verify the identity of developers

Posted Mar 31, 2024 18:12 UTC (Sun) by pizza (subscriber, #46) [Link] (51 responses)

> given that we face attempts to introduce backdoors under fake identities like "Jia Tan", having some extra information about whose identity has been verified might help make better decisions.

What does "verified" mean in this context?

Verify the identity of developers

Posted Apr 1, 2024 6:31 UTC (Mon) by epa (subscriber, #39769) [Link] (50 responses)

I was imagining a future where GitLab verifies the identity of people using an identity document such as a passport. Of course, that’s currently not practical. It would require governments to issue electronic identities by signing someone’s private key or the like. Which could happen in Estonia but for various reasons seems out of reach in the USA. Actually, GitLab need not be involved, if you can tie back a private key to a named individual and commits are signed.

And I did say it would be somewhat dystopian and not desired by free software developers.

It would not be watertight but would make things a bit more difficult for an organization that wants to get maintainer permissions in a project and add malicious code. Currently, creating a fake identity is trivially easy.

Verify the identity of developers

Posted Apr 1, 2024 6:45 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (49 responses)

If we're in a world where we're dealing with state level attackers we should assume that they're going to be able to produce ID that's good enough to pass any viable non-government checks, and so you end up with something that deters legitimate contributors without preventing the worst case failures.

Verify the identity of developers

Posted Apr 1, 2024 7:08 UTC (Mon) by himi (subscriber, #340) [Link] (48 responses)

Yes and no.

Yes, nation state attackers have the capacity to forge ID that will pass any viable checks (non-government or government) - history has given us many examples. But, when those nation states have been caught out doing so it has the potential to result in the kind of negative impacts that nation states actually take seriously - diplomatic incidents, visa refusals or revocations, even changes to policies on visa applications. Sure, true rogue states probably don't care that much about that kind of thing, but anyone that doesn't want to be /considered/ a rogue state by most of the world will care.

Not that I think this is a good argument for requiring real-world ID to be able to make legitimate contributions to free software projects, but it's definitely worth remembering that when real-world ID gets involved the consequences for a lot of things, both good /and/ bad, escalate rather quickly.

Verify the identity of developers

Posted Apr 1, 2024 7:22 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (47 responses)

Mossad agents used fake Canadian passports in an assassination, and the meaningful impact on their international reputation was approximately zero. And it was only possible to determine that because there were alternative pathways to identify who the agents were, what their legitimate nationality was, and how they had entered the country anyway. Without building an incredible amount of infrastructure there's no real way to flag an identity as false until they've already done something illegitimate, and even then if you can't identify the underlying identity you're probably not going to be able to figure out who issued the false identity. Unless you want every contribution to be subject to the same criteria as entering a foreign country, I just don't think this is a meaningful speedbump.

Verify the identity of developers

Posted Apr 3, 2024 8:04 UTC (Wed) by epa (subscriber, #39769) [Link] (46 responses)

The fake Canadian passports must have worked because immigration officers just looked at them. They didn’t check them against a database maintained by the Canadian government. I am not completely certain, but I know that in Europe a passport is not just eyeballed but *scanned* using its RFID chip, or at least the computer-readable text printed on it. And within the European Union at least, it gets checked against a database. So you can’t just print one that looks convincing.

(Unless of course the Mossad operatives took on the identity of real Canadians and looked like them too, so they could in principle have stolen those people’s real passports and used them to travel. I have been assuming they created entirely fake identities.)

How does this relate to checking the identity of programmers? Well, a requirement of “please take a photo of your passport” wouldn’t cut it. But if governments provide a way to validate the passport data (or for an individual to generate a code which can be checked on a govt website, as already happens in some countries) then we do make it harder to create fake identities. A step further would be for an individual to have their public key signed by a government agency. You could at least check the person’s real name and nationality. That’s a lot better than “Jia Tan”.

Verify the identity of developers

Posted Apr 3, 2024 12:53 UTC (Wed) by smurf (subscriber, #17840) [Link]

> if governments provide a way to validate the passport data

Not that difficult to do given reasonably useable infrastructure. In Germany you don't even need direct support from the government to do it (other than having a RFID chip in your document in the first place of course): there's an online service from Governikus that transmogrifies the electronic data in your ID card to a GPG signature. See e.g. mine at 72CF8E5E25B4C293, signed by 5E5CCCB4A4BF43D7.

Verify the identity of developers

Posted Apr 4, 2024 14:17 UTC (Thu) by kleptog (subscriber, #1183) [Link] (44 responses)

Apps that can validate your passport via the NFC chip are fairly common these days. You can open a bank account online by holding your phone to your passport, and then letting it take a picture of your face. It's all held together with public key cryptography. I'm not sure if the CRL for passports is publicly available, but it does exist.

Maybe not in the US though?

But that doesn't really help, because all it proves is that someone has access to a passport. That doesn't tell you if they're trustworthy or not. And since you're not going to be scanning someone else's passport, in the end you're trusting whoever did the scan that the person was actually there, so you've just moved the problem around.

In the end you can never be 100% sure someone is trustworthy. No technical solution can do that. But perhaps we can make a machine learning system that can monitor commits for dodgy stuff to make it feasible for humans to focus on the risky patches.

Verify the identity of developers

Posted Apr 4, 2024 18:59 UTC (Thu) by epa (subscriber, #39769) [Link]

If the developer’s public key is signed by a government agency and linked to their identity document (as apparently can be done in Germany) that is a stronger check than just checking a passport and associating it with a public key uploaded separately.

None of this is completely watertight. But right now it’s kind of embarrassing how easy it is to create a fake identity and use it to contribute or even become maintainer of a project.

Verify the identity of developers

Posted Apr 4, 2024 19:33 UTC (Thu) by draco (subscriber, #1792) [Link] (42 responses)

Passports establish identity and jurisdiction (for extradition if necessary) for claims of torts and charges of crimes.

It's not about whether they can be trusted, but about whether they can be held accountable if they do something bad.

By comparison, as we've seen, an email, public key, and purported name are nearly useless for these purposes.

(It hopefully goes without saying that these all need to be tied together to one person to be useful, but it was apparently not obvious why a passport helps, so I figured I'd be upfront with that detail.)

Verify the identity of developers

Posted Apr 4, 2024 19:37 UTC (Thu) by mss (subscriber, #138799) [Link]

Passports establish identity and jurisdiction (for extradition if necessary) for claims of torts and charges of crimes.

It's not about whether they can be trusted, but about whether they can be held accountable if they do something bad.


That's not very useful for a nation-state-sponsored attack like (most likely) this one.

Verify the identity of developers

Posted Apr 4, 2024 19:43 UTC (Thu) by pizza (subscriber, #46) [Link] (39 responses)

> Passports establish identity and jurisdiction (for extradition if necessary) for claims of torts and charges of crimes.

Then there's the little problem that most people don't have passports.

So you'd need to handle (and have a way of validating) nearly 200 (non-standardized) national identity documents. Then there are countries like the US that don't have a single (domestic-focused) nation-wide ID [1].

[1] eg the USA, with 50 states, plenty of non-state territories (eg DC and Puerto Rico) and various native tribal IDs, plus military and other federally-issued IDs, and so forth...

Verify the identity of developers

Posted Apr 5, 2024 9:46 UTC (Fri) by smurf (subscriber, #17840) [Link] (38 responses)

> Then there's the little problem that most people don't have passports.

That's likely true if you talk about the world's population as an aggregate.

The actual rate varies rather wildly between countries, and their subpopulations (I'd assume that IT-affine people are somewhat more likely to have one than not, for instance).

Verify the identity of developers

Posted Apr 5, 2024 12:20 UTC (Fri) by pizza (subscriber, #46) [Link] (37 responses)

> The actual rate varies rather wildly between countries, and their subpopulation (I'd assume that IT affine people are somewhat more likely to have one than not, for instance).

You're probably right, but I'd still be surprised if a majority of "IT people" in the US have a passport.

(And EU citizens don't need a passport to travel within the EU either)

Verify the identity of developers

Posted Apr 5, 2024 12:55 UTC (Fri) by atnot (subscriber, #124910) [Link] (9 responses)

This entire discussion is just moot in the first place, because if some project requires me to scan my passport to contribute, I will simply not contribute to that project. And experience shows few people will. *Especially* if that means you take on some sort of legal liability for your contributions, which just, hell no.

Look at CLAs, most people already can't be bothered to e-sign some document in docusign or whatever.

And it's the wrong direction to discuss this anyway. Sure, verifying passports would be a way to verify that people contributing to *your* project are people recognized by some sort of UN-recognized government (lots of people aren't, but let's put that aside). But if, say, sqlite, gcc, freebsd, linux or whatever organizations your systems depend on aren't going to enforce your draconian policy, what are you going to do? Or if they do, and a fork develops that just lets people contribute without any riffraff? Are you going to not package their software and everything that depends on it? Rewrite the whole thing from scratch? You can't trust any of their commits after all. Put the stuff without passport checks in a separate repository, which everyone just enables blindly because that's what you actually need to do to get real work done, just like people already do with rpmfusion and universe and flatpak and pypi and everything else?

It's just a completely unrealistic model of free software development that assumes a "supply chain" and an avenue for contractual obligations that just does not exist, cannot exist and is deeply undesired by all of the people this industry runs on, those who publish their code online because it brings them joy.

Verify the identity of developers

Posted Apr 5, 2024 13:43 UTC (Fri) by farnz (subscriber, #17727) [Link]

… assumes a "supply chain" and an avenue for contractual obligations that just does not exist, cannot exist and is deeply undesired by all of the people this industry runs on, those who publish their code online because it brings them joy.

This is the key point; if you're going to "solve" this problem, you need to start at the producer end, since the consumers of Free Software have no leverage over the producers in general (you may have leverage in specific cases - say if you employ a producer of Free Software and can threaten their livelihood - but not over the full sum of Free Software).

If you can't come up with a good reason why you'd jump through the hoops you're putting in place to fix a typo in a message the program displays (say changing "the" to "The" because of context), then your hoops are not going to work in general, since there will be plenty of producers of best-in-class Free Software who refuse to jump through your hoops.

Verify the identity of developers

Posted Apr 6, 2024 13:14 UTC (Sat) by smurf (subscriber, #17840) [Link] (7 responses)

> Look at CLAs, most people already can't be bothered to e-sign some document in docusign or whatever.

The problem with CLAs isn't that I can't be bothered.

The problem is that assigning my copyright, or the rights thereof (you can't "assign copyright" in some jurisdictions; you created it, you have the copyright, period end of discussion, presuming you didn't do it for an employer) is a very bad idea because it allows the transferee to re-license the work under any proprietary legalese they damn well please. Numerous examples can readily be found in the archives, of LWN and elsewhere.

Showing my passport / ID document to somebody doesn't take away any of my rights.

XZ and of course a whole freakin' lot of other software is the equivalent of critical infrastructure. In most countries, if I want to hire you to work on anything critical, you showing me some official ID document is just the first step in a rather long list of intrusive government snoopage, depending on quite how critical the piece you'd then be able to subvert is; including but *way* not limited to checking that you don't have relatives in $BAD_COUNTRY whose health would be a convenient handle their government might blackmail you with.

You don't want that? fine, go work somewhere else.

Verify the identity of developers

Posted Apr 6, 2024 16:42 UTC (Sat) by atnot (subscriber, #124910) [Link] (3 responses)

Okay, I'll work on something else. And then it will become useful, and lots of people will want to install it on your distro. And then it ends up becoming critical. What are you gonna do then?

You can say how it would work if this was a company again and again. But this isn't a company. It very explicitly and deliberately does not work like a company.
You can't just handwave some sort of contractual customer relationship between someone uploading code on the internet and other people choosing to use it in critical ways of their own accord, that's just not how things work.

Verify the identity of developers

Posted Apr 6, 2024 18:14 UTC (Sat) by smurf (subscriber, #17840) [Link] (2 responses)

Surprise: I know that.

> You can say how it would work if this was a company again and again.

I don't recall saying anything more than once.

Also I didn't say that "this", whatever it is, should work like a company. Or that I'm advocating for doing things that way.

All I'm saying is that compared to the security (both real and theater) you're subjected to when working on "this" in a corporate context, requesting something that links your online identity with what most people consider to be the Real World isn't *that* much of a burden.

Given this attack, the idea of finding some middle ground between "you're a $NATION black hat? sure, no prob, here are the keys" and the (IMHO somewhat excessive) hoops the corporate world requires you to jump through when you want to do the exact same thing for $$$ isn't *that* far out.

So we get to talk about it.

There's a material difference between discussing ways to ID people working on critical code and concluding that it's not practical and finding some other way to reach the same goal (clean up our tooling, pay somebody to do code reviews, whatever) and declaring a priori that the topic is not up for discussion because "that's just not how things work".

Verify the identity of developers

Posted Apr 6, 2024 18:26 UTC (Sat) by mjg59 (subscriber, #23239) [Link] (1 responses)

Linking "online identity" to "real world identity" is a great way to dissuade a significant number of people from participating in free software, and at this point we have no evidence whatsoever it would have done anything to help in the case in question.

Verify the identity of developers

Posted Apr 6, 2024 19:06 UTC (Sat) by Wol (subscriber, #4433) [Link]

And what smurf is conveniently forgetting is that my pet project may or may not be critical to me. If it's critical to someone else - NOT MY PROBLEM!

"We get to talk about it". And the FIRST thing I'm going to talk about is £££. At which point if you don't want to pay - or I don't want the money! - we're at an impasse.

At the end of the day, there has to be a MUTUAL EXCHANGE OF VALUE. And smurf is assuming he has something of value to offer - BAD ASSUMPTION! I don't know about other people, but as far as I'm concerned, if it involves dealing with the US Authorities, my price is likely to be "Up Yours!!!"

"So we get to talk about it." "Feel free to fork it. I don't care".

Cheers,
Wol

Verify the identity of developers

Posted Apr 6, 2024 16:53 UTC (Sat) by farnz (subscriber, #17727) [Link] (2 responses)

When xz started, and indeed when most of the open source that's now "critical infrastructure" started, it was just a hobby project, and not critical. It became critical because it was useful and became used; but that's on the users, not the developers.

Or are you saying that I'm allowed to demand that you go through a very long list of government snoopage because I've used your comment in something critical, and you now owe me big time for my decision to make use of your work?

Verify the identity of developers

Posted Apr 7, 2024 2:35 UTC (Sun) by draco (subscriber, #1792) [Link] (1 responses)

No, I think it's the other way around

Perhaps I don't want the reputational damage of having nation state attacks on my project, so I insist on knowing that the patches I accept are from real, identifiable people from countries I trust

Maybe nobody contributes to my project, maybe I'm ok with that, maybe some people feel better about my project because of that policy

Maybe people who don't like that choose to fork it, that's their right, but then they accept the consequences

Or maybe they do the same thing, but with different trust decisions about who's ok 😂🤷

A variant of this has happened before: DJB is very opinionated about what goes into his software

Is this a good approach? The proof won't be in any arguments about it, but in what actually happens

Verify the identity of developers

Posted Apr 7, 2024 11:17 UTC (Sun) by farnz (subscriber, #17727) [Link]

But what if you yourself are a nation state attacker? How do I know when I look at something and consider using it that you're trustworthy? How do I as a potential user get you to jump through my hoops that confirm that you are a real, identifiable person from a country I trust?

And remember that for a lot of contributions, I can see that they're safe by review; why would I demand anything from a contributor when it's obvious to me that the change is good as-is? For code where I can't completely review it, I need some degree of trust, but where I can review in full, why would I put you through a barrage of trust checks just to go 'yep, I can see that changing "correct. the system" to "correct. The system" is a good change to make'?

Verify the identity of developers

Posted Apr 5, 2024 12:55 UTC (Fri) by paulj (subscriber, #341) [Link] (26 responses)

EU citizens *do* need some form of official ID (e.g. passport, driving licence, national ID card) to travel between at least some of the EU.

Citizens (and possibly also residents) do not need ID to travel within the Schengen Area. Now, the Schengen Area includes nearly all EU members, but not all. Notably, the 2 island member states are not in Schengen, Ireland and Malta - Ireland can not join Schengen because the UK has never wanted to join, and Ireland has always had an open border with the UK, and will have for the foreseeable future. Malta, not sure why, but perhaps that's also to do with UK relations - however it will be joining at some point soon. Additionally, the EFTA states, and a couple of others, are also in Schengen - but not in EU.

Verify the identity of developers

Posted Apr 5, 2024 12:59 UTC (Fri) by paulj (subscriber, #341) [Link]

tl;dr: You need an ID to travel to/from Ireland ;)

Cause of the Brits.

Verify the identity of developers

Posted Apr 5, 2024 14:36 UTC (Fri) by jem (subscriber, #24231) [Link] (24 responses)

>Citizens (and possibly also residents) do not need ID to travel within the Schengen Area.

This is a common misconception. EU citizens are required to carry a government issued ID card or passport if they are traveling abroad, even if the travel is limited within the Schengen area. However, the ID is normally not checked at the border between two Schengen countries, but checks can be reinstated if circumstances require it.

In some countries within the Schengen area (*cough* Germany) citizens are required to be in possession of an ID card (or passport) even within their own country.

Verify the identity of developers

Posted Apr 5, 2024 14:56 UTC (Fri) by rschroev (subscriber, #4164) [Link]

> In some countries within the Schengen area (*cough* Germany) citizens are required to be in possession of an ID card (or passport) even within their own country.

Same in Belgium. In practice, in everyday life in most situations this is not enforced, but the police can ask for your ID (I think they have to state why they do) when they feel you're causing trouble.

Verify the identity of developers

Posted Apr 5, 2024 18:36 UTC (Fri) by pizza (subscriber, #46) [Link] (21 responses)

My point was that as an EU citizen you don't need a *passport* to travel around the EU; your national ID card is sufficient. Only when traveling outside the EU would a passport come into play.

What makes passports semi-feasible for "identity verification" is that there is a true international standard for machine readability (and decoding) of their information. But passports are not a given, meaning you'd realistically need to accept various [sub-]national ID cards in all their infinite diversity, with potentially a separate reading/decoding/verification mechanism required for each issuing agency.

Verify the identity of developers

Posted Apr 6, 2024 21:31 UTC (Sat) by kleptog (subscriber, #1183) [Link] (20 responses)

It's worth noting that from a technical aspect there's less difference between passports and ID cards than you might think. National ID cards in europe often have the same NFC chip as the passports and they work the same way. The only real reason you need a passport outside the EU is because many countries want to (a) put stamps in your passport, (b) add physical visas to them and (c) be able to look at your passport to see if you've visited any countries they don't like. That's why countries near the EU where there are no visa requirements will accept national ID cards at immigration. (The Dutch ID card is acceptable for travel to 42 countries.)

As digital visas become more common there will be more countries that will accept ID cards in lieu of passports.

It's possible to use all this infrastructure in positive ways. For example, it would allow Github to have proof you are over 18 and a resident of country X, without revealing any other information about you (zero-knowledge proofs). We're not there yet.

Of course the next step is to ditch the physical card altogether and have it all in your phone instead. Of course that gets trickier, because a passport/ID card provides offline unrecorded proof of validity, but it's not clear if a pure digital, app-based identity can work offline.

Verify the identity of developers

Posted Apr 6, 2024 22:05 UTC (Sat) by pizza (subscriber, #46) [Link] (19 responses)

>It's worth noting that from a technical aspect there's less difference between passports and ID cards than you might think. National ID cards in europe often have the same NFC chip as the passports and they work the same way.

...Well, that's great for Europeans, but what about the rest of the world?

(again, that's my point -- Passports have an international standard for machine readability and interoperability, but there are more standards for domestic identification than there are countries! I have a US passport, a second Federal-issued ID, and a state-issued ID. They are all machine-readable, but via different mechanisms, and the encoded information also differs. The passport establishes citizenship. The state ID establishes residency and permission to operate a motor vehicle on public roads)

Verify the identity of developers

Posted Apr 7, 2024 15:53 UTC (Sun) by kleptog (subscriber, #1183) [Link] (18 responses)

> They are all machine-readable, but via different mechanisms, and the encoded information also differs.

Ok, but this is a fabricated problem. The states of the US could surely get together and adopt a single standard to cover everything. Clearly it's not a big enough problem.

If your point is that requiring digital identification online for open-source projects would unfairly exclude much of the world, I agree with you. That's not something we can reasonably require at this point (perhaps ever).

Verify the identity of developers

Posted Apr 7, 2024 17:00 UTC (Sun) by pizza (subscriber, #46) [Link] (17 responses)

> Ok, but this is a fabricated problem. The states of the US could surely get together and adopt a single standard to cover everything. Clearly it's not a big enough problem.

There is a federal standard for state ID cards now (imposed by the "REAL ID Act"), but twenty years later it's still not fully deployed, and IIRC that _still_ doesn't provide a standard mechanism for machine readability or verification.

It's "not a big enough problem" because these ID cards are only used physically, in person, using the mk-I eyeball to make sure the photo vaguely looks like the person holding it.

> If your point is that requiring digital identification online for open-source projects would unfairly exclude much of the world, I agree with you. That's not something we can reasonably require at this point (perhaps ever).

Yes, except it's not "much of the world" so much as "everyone that doesn't live in a jurisdiction that provides state-issued digital identification along with a low/zero-cost mechanism for arbitrary third parties (including those outside your jurisdiction) to validate said credentials." IIUC hardly anywhere qualifies in that respect.

Verify the identity of developers

Posted Apr 7, 2024 17:46 UTC (Sun) by pizza (subscriber, #46) [Link]

> and IIRC that _still_ doesn't provide a standard mechanism for machine readability or verification.

Whoops, I stand corrected. It wasn't part of the original law, but was instead added via regulations issued by the DHS after the fact. So, currently REAL-ID compliant cards must have a PDF417 2D bar code containing a minimum of 10 data elements [1]. Notably missing is a digital signature that one can use to _validate_ the data without some sort of query to the issuing authority, so absent that query, these ID cards are only useful for in-person stuff since you can photoshop anything you want onto the front (photo, text) and back (barcode) and nobody would be any the wiser.

(Nearly all of the REAL-ID provisions have to do with physical/anti-tamper security (eg watermarks, holograms) and a consistent minimum standard for documentation needed to issue said ID, and the information that needs to be shown.)

(Meanwhile, various federal agencies (including the military) have their own ID standards that use different machine readable mechanisms and encoded data..)

[1] legal name, gender, DOB, address of residence, etc. See https://www.law.cornell.edu/cfr/text/6/37.19

Verify the identity of developers

Posted Apr 7, 2024 20:51 UTC (Sun) by Wol (subscriber, #4433) [Link] (15 responses)

> Yes, except it's not "much of the world" so much as "everyone that doesn't live in a jurisdiction that provides state-issued digital identification along with a low/zero-cost mechanism for arbitrary third parties (including those outside your jurisdiction) to validate said credentials." IIUC hardly anywhere qualifies in that respect.

In the UK, it certainly isn't mandatory. The ONLY piece of ID that all British Nationals can be reasonably assumed to possess is a birth certificate. That's assuming their parents registered the birth. Anything beyond that is OPTIONAL, although living without it can be hard. If you haven't had to renew your driving licence for one reason or another, the old green paper version is still valid. There probably aren't that many left, though. My passport is not a proper biometric one (it's also no longer valid), but if anybody wants a passport for ID I would quite happily present it and say "if that's not good enough, it's the best I've got".

More and more, if people demand things off of me (mobile phone number especially), I just walk away ...

Cheers,
Wol

Verify the identity of developers

Posted Apr 8, 2024 21:31 UTC (Mon) by kleptog (subscriber, #1183) [Link] (14 responses)

> if anybody wants a passport for ID I would quite happily present it and say "if that's not good enough, it's the best I've got".

I hope it is for you too. The UK has a weird view on IDs. On the one hand they recognise the benefits, on the other hand whenever it's proposed they always talk about being required to carry them at all times. Which is basically insane and a way to sink the topic before it gets anywhere.

A national ID is physical proof you are allowed to be there and have certain rights. So if some government database has a glitch and suddenly decides you're an illegal immigrant (e.g. Windrush, EUSS, the current PCDP scandal at the Home Office) you have physical proof that the database is *wrong*. Good for preventing you getting deported. That such a card is useful in other contexts is a bonus.

From a purely practical point of view, my bank can assert my nationality just as well as the government can. You don't necessarily need passports/ID cards for that.

Verify the identity of developers

Posted Apr 9, 2024 9:02 UTC (Tue) by farnz (subscriber, #17727) [Link] (13 responses)

One of the issues in the UK with national ID cards is that whenever the idea comes up, the intent is to fund the cards via other uses of the data contained therein. Being required to carry them at all times is just a consequence of the idea that national ID cards need to turn a profit for the government.

Verify the identity of developers

Posted Apr 9, 2024 9:53 UTC (Tue) by Wol (subscriber, #4433) [Link] (12 responses)

One of the other issues we have is that the UK is not a single nation - and it's dominated by the "little englanders".

Has anybody else noticed that - of the four nations - England is the only one without its own National Anthem?

Driven home when watching the Calcutta Cup - the Scots sing "Flower of Scotland", but the English sing "God Save the (Scottish) King" !!!

It's the same problem the Canadians and Mexicans have with North America / USA, and the English seem completely oblivious to it ...

Cheers,
Wol

Verify the identity of developers

Posted Apr 9, 2024 15:06 UTC (Tue) by rschroev (subscriber, #4164) [Link] (6 responses)

England is also the only one without its own parliament. It's almost as if England still feels the whole UK is theirs, with the other members subordinate instead of them all being on the same level.

Verify the identity of developers

Posted Apr 10, 2024 10:01 UTC (Wed) by paulj (subscriber, #341) [Link] (5 responses)

An artifact of how the UK avoids properly addressing its constitutional issues, instead sating discontent in the non-English parts by a series of more ad-hoc "devolution" of powers from Westminster to other parliaments, seeking to react to events with the minimum of change.

It started with Ireland, which got a devolved government and dominion status within the UK in 1922 with some powers reserved for Westminster and an "Executive Council" (similarish to the privy council), until 1931 when Ireland became a wholly autonomous dominion, and then to 1937 as "Eire" a self-declared independent state (dominion status ambiguous), and formally as the Republic of Ireland from 1948.

Scotland and Wales got their own devolution in 1999, both more circumscribed than the original Irish Free State (which had taken armed insurrection), but each with continued representation via MPs in Westminster. I'm not sure about the differences in power between them. The Scottish parliament seems to me to have more "status" and power than the Welsh one, but that might just be my bias, having lived in Scotland - I don't know much about Welsh devolution and how it compares.

It seems to me Ireland has the healthier status of the 4 "home nations", as they were (ignoring the Troubles, arising from Elizabethan and Jacobean era United Kingdom politics, which due to historical quirks left a longer, stronger imprint in the north of Ireland than the rest of the UK). Ireland continues to have very strong bilateral links to the rest of the Celtic Isles - Irish and British citizens travel and settle freely between them, trade a little less so now thanks to BrExit though, there are bilateral institutions, etc. - while Ireland is ultimately able to decide its own fate.

I don't understand why Scotland, if not Wales, doesn't also seek a similar situation. Would be better for all in the end I think. (I did vote "Yes" in the IndyRef in Scotland. ;) ).

Verify the identity of developers

Posted Apr 10, 2024 11:23 UTC (Wed) by Wol (subscriber, #4433) [Link] (4 responses)

> Scotland and Wales got their own devolution in 1999, both more proscribed than the original Irish Free State (which had taken armed insurrection), but each with continued representation via MPs in Westminster. I'm not sure about the differences in power between them. The Scottish parliament seems to me to have more "status" and power than the Welsh one, but that might just be my bias, having lived in Scotland - I don't know much about Welsh devolution and how it compares.

Being interested in history, I think this goes back to the fact that England and Scotland were two separate nations (let's forget the Flower of Scotland Proud Edward's Army bit) until very recently. Until William's intervention in 1066, the assorted British nations were steadily coalescing of their own accord, take for example the agreement round about 900AD between - iirc - Mercia, Northumbria and Wessex that all three crowns would pass to whichever King survived longest.

Then William arrived and upset the applecart, setting out to unite the British Isles by force. Out of proto-England, Wales held out the longest which forged a separate nationality (quite possibly helped by the fact that the Anglo-Saxon nations fell rather more easily, the Welsh being Celtic so already feeling different). But Wales has always been part of "England" since the mid 1100s (and sort-of took over the English crown with Owain Tudor about 1500).

Scotland has always had a separate identity - again being Gaelic rather than Anglo-Saxon (although the Sassenachs are "Lowland Scots" aka Angles"). Again fuelled by constant conflict with the Normans to the south. And with their own monarchy (which Wales never had?) since pre-William - again going back to 900s and earlier - which left alone would probably have merged with England using a similar mechanism. But it wasn't to be.

So Scotland was either occupied, or completely independent, until the "Union of the Crowns" in 1603. It remained an independent (theoretically) country until about 1750 and the "Act of Union".

So basically, Scotland has more power and independence because Scotland is considered a nation/country. Wales is just a subordinate principality.

(And personally, I think Westminster has far too much power. A lot of it should be devolved to local government. But it's the standard ebb and flow of politics unfortunately - the centre grabs power, messes it up, and the regions grab it back. Rinse and repeat :-(

Cheers,
Wol

Verify the identity of developers

Posted Apr 10, 2024 12:59 UTC (Wed) by paulj (subscriber, #341) [Link] (3 responses)

Interesting... ;)

As an aside, I note your view of the history seems skewed towards the countries /currently/ part of the UK. You can't understand the history of these Celtic Isles without understanding the history of one of the larger chunks of it, and a kingdom of the king of England for longer than Scotland - Ireland. Some of the biggest battles relevant to the history of the kingdom of England (and to the history of Europe, to a certain extent) were fought in... Ireland (by soldiers from many nations).

Just saying, cause a lot of modern British seem to overlook it - just cause Ireland is no longer part of the UK.

Verify the identity of developers

Posted Apr 10, 2024 15:04 UTC (Wed) by Wol (subscriber, #4433) [Link] (1 responses)

Agreed I don't know an awful lot about Ireland. Bear in mind I consider myself European/Scottish although my wife insists I'm English, so that accounts for at least some of the bias.

I also know there's an awful lot of history roundabout the time of Cromwell and Cromwell :- ) that's Thomas Cromwell of Henry VIII fame for the first one :-) but I know very little about it, other than it was the age-old Catholic/Protestant mess. (And quite likely earlier, too.)

The other thing that often gets forgotten about medieval history is the "Joan of Arc vs the English" lie. Okay, Joan is a bit later than this, but King John (of Magna Carta fame, 1215) is probably the first true "King of England". Before that, and including his elder brother Richard, the title of Duke of Normandy actually ranked ABOVE the King of England. Richard's troubles in the Crusades basically brought about the downfall of the Norman Empire, and Joan drove the Normans out of Normandy (probably a gross mis-representation of what actually happened, but rather more accurate than folk history!)

Cheers,
Wol

Verify the identity of developers

Posted Apr 11, 2024 9:12 UTC (Thu) by paulj (subscriber, #341) [Link]

King John possibly succeeded - despite himself - in part thanks to having Ireland to draw wealth from. Without Ireland, he'd have had nothing early on ("John Lackland" - John no-land), and would have been less wealthy later. He might have struggled to hold the English crown against his nephew Arthur and Philip II of France.

Verify the identity of developers

Posted Apr 10, 2024 15:19 UTC (Wed) by Wol (subscriber, #4433) [Link]

Just to throw in another snippet, to help explain the Saxon / William thing - Saxon kings were elected. William's pretence for invasion was that he had been promised the crown, which was half true, but it was never in the gift of the promissors.

And that's how the treaty between the three kings worked - the ruling councils basically signed up that the only eligible candidates for any vacant monarchy would be the other monarchs. All helped by the fact that the crowns did NOT pass father to son, although the only real candidates were all close relatives of the late King.

Indeed, George II may have been the first King to inherit as of legal right, given the shenanigans in the aftermath of Henry VIII and Edward VI, and the similar shenanigans over James II, William and Mary, and Anne. Indeed, after the death of his wife, William III ruled alone despite not being of (British) Royal Blood at all! Using him as precedent, we should have had King Albert, and King Philip! (Although of course, Philip was of British Royal Blood, as also reputedly is Camilla.)

Cheers,
Wol

Verify the identity of developers

Posted Apr 9, 2024 15:39 UTC (Tue) by anselm (subscriber, #2796) [Link]

the Scots sing "Flower of Scotland"

So far that's just a patriotic song popular with Scottish sports fans which various Scottish sports bodies have provisionally adopted in the absence of an actual national anthem (which Scotland doesn't have, either).

Having said that, in spite of its obvious problems Flower of Scotland is apparently a strong contender to become the official national anthem once the Scottish parliament gets its act together. As far as the English are concerned, they should be bothered by the fact that they have no national parliament (or for that matter government) much more than by the formal absence of a national anthem.

Verify the identity of developers

Posted Apr 10, 2024 9:27 UTC (Wed) by paulj (subscriber, #341) [Link] (3 responses)

King Big Ears is German, not Scottish. ;)

Verify the identity of developers

Posted Apr 11, 2024 9:42 UTC (Thu) by Wol (subscriber, #4433) [Link] (2 responses)

Well, he is directly descended from the Scottish King James VI/I ...

(So he's as Scottish as most other people in Scotland :-) which is to say not really at all. Most residents of modern Scotland (a) do not live in the Land of the Scots, and (b) trace their ancestry to either the Picts or the Angles.

(Inasmuch as most people in the British Isles trace their ancestry back to the Anglo-Saxons - genetically we're nearly all Britons, but culturally we're Anglo-Saxon because we adopted the ruling class's language and culture. That's where the word "Welsh" came from - aka "not Anglo Saxon".)

Cheers,
Wol

Verify the identity of developers

Posted Apr 11, 2024 10:17 UTC (Thu) by paulj (subscriber, #341) [Link] (1 responses)

He's descended from the *German* George I and II from /both/ his mother and father :). George I is also German on /both/ sides, even if his mother was a Stuart - "Sophia of Hanover". Her mother, Elizabeth Stuart, was born in Scotland to James VI / I, but her mother was Danish.

It's a bit of a stretch to call Big Ears "Scottish" because, in between the plethora of German ancestors, you can find one couple who were Scottish and Danish a few hundred years ago. ;)

Verify the identity of developers

Posted Apr 11, 2024 11:12 UTC (Thu) by Wol (subscriber, #4433) [Link]

What about Lady Elizabeth Bowes-Lyon?

Cheers,
Wol

Verify the identity of developers

Posted Apr 5, 2024 19:38 UTC (Fri) by gioele (subscriber, #61675) [Link]

> In some countries within the Schengen area (*cough* Germany) citizens are required to be in possession of an ID card (or passport) even within their own country.

Formally, Germany requires you to "possess" an identity document ("verpflichtet, einen gültigen Ausweis zu besitzen" = to have applied for it and to have it somewhere, for example at home) once you are 16. It is not required that you "are in possession" of an identity document (~= to carry with you).

https://www.gesetze-im-internet.de/pauswg/BJNR134610009.h...
https://www.gesetze-im-internet.de/englisch_pauswg/englis...

Verify the identity of developers

Posted Apr 4, 2024 22:47 UTC (Thu) by farnz (subscriber, #17727) [Link]

They don't even do that - all my passport actually does is establish that if someone presents it to you as "their" passport, and you don't believe them, you can hand them over to my country of origin, and my country of origin will arrange my return home (if it's me presenting the passport) and bill me for it later, or will arrest the person, bring them to my country of origin, and then prosecute them for using my passport while not being me, with potential for significant jail time over here.

In particular, for the purposes of holding me accountable for actions, my passport is less valuable than my purported name and e-mail address. If those aren't enough, my passport details are not; my country does not issue ID cards as a matter of course, and I therefore have no documentation that establishes my identity for the purposes of accountability.

Better yet, I'm allowed multiple passports as long as I am not using them to defraud the government. The details on my passport don't have to be my "legal" name, since there is no such thing - they just have to be a name that I use in the course of business. I could get a passport in the name "Linus Benedict Torvalds" quite legitimately, as long as I can show the government that I use that name regularly.

A backdoor in xz

Posted Mar 29, 2024 21:50 UTC (Fri) by dilinger (subscriber, #2867) [Link]

I would certainly welcome this, but boy is that going to be a fun challenge for chromium and other packages that embed chromium..

A backdoor in xz

Posted Mar 29, 2024 22:09 UTC (Fri) by mdeslaur (subscriber, #55004) [Link] (1 responses)

I don't think anyone would have noticed the malicious code even if he did check it into git. In fact, this was easily spotted _because_ the tarball didn't match the git repo.

A backdoor in xz

Posted Mar 30, 2024 7:42 UTC (Sat) by epa (subscriber, #39769) [Link]

I guess the attacker had the choice of putting the code into git, but chose to modify the tarball only, because he thought it would be less likely to be detected that way. A commit in git would certainly be more visible.

A backdoor in xz

Posted Mar 29, 2024 18:00 UTC (Fri) by bluca (subscriber, #118303) [Link] (23 responses)

> After observing a few odd symptoms around liblzma (part of the xz package) on Debian sid installations over the last weeks (logins with ssh taking a lot of CPU, valgrind errors) I figured out the answer

We were pretty much on the brink of disaster, and got saved because someone's login got slowed down enough that they went "mmh hang on a sec". It seems to me we just got very, very lucky here. Will we be so lucky the next time this happens too?

A backdoor in xz

Posted Mar 29, 2024 18:08 UTC (Fri) by AdamW (subscriber, #48457) [Link] (6 responses)

Or were we so lucky the last time it happened?

A backdoor in xz

Posted Mar 29, 2024 18:16 UTC (Fri) by alex (subscriber, #1355) [Link] (5 responses)

It would be hubris to assume this is the first or only attempt to subvert an upstream so far.

A backdoor in xz

Posted Mar 30, 2024 4:26 UTC (Sat) by marcH (subscriber, #57642) [Link] (3 responses)

Also, don't forget to blame the messenger when there is one: https://lwn.net/Articles/853717/

A backdoor in xz

Posted Mar 30, 2024 6:07 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (2 responses)

There's being a messenger, and then there's wasting maintainer time trying to get a paper written about something everyone knows is a problem[1] by failing to get malicious code past them. Which, of course, just contributes to the overall problem of overworked maintainers that is the real root cause of the issue being covered above [2].

So, yeah, excluding people from the community who are more interested in their own academic careers than in genuinely helping - that's not "shooting the messenger".

[1]https://m.youtube.com/watch?v=fu8ZNRDQsi8&t=6771s
[2]https://www.mail-archive.com/xz-devel@tukaani.org/msg0057...

A backdoor in xz

Posted Mar 30, 2024 14:37 UTC (Sat) by marcH (subscriber, #57642) [Link] (1 responses)

There's nothing wrong with excluding the offenders. There's everything wrong with whining and talking so much about them while saying so little about the actual issue.

A backdoor in xz

Posted Mar 30, 2024 14:41 UTC (Sat) by marcH (subscriber, #57642) [Link]

... and yes, maintainers are exhausted but no, that's not the only explanation.

A backdoor in xz

Posted Mar 30, 2024 11:27 UTC (Sat) by jd (subscriber, #26381) [Link]

We know it isn't the first. A group of researchers tried to submit exploit-ridden code to the Linux kernel, for example.

There are also packages that have been modified to not work in certain countries, and one Python package got effectively yeeted by the primary maintainer over politics.

And it's reasonable to assume that we only know a small percentage of cases.

Closed source is unlikely to be better. It would seem to me that quite a number of the exploits that get discovered are very bizarre backdoors.

If we generalise to all malicious code, then the Sony Rootkit is probably the most notorious.

A backdoor in xz

Posted Mar 29, 2024 18:27 UTC (Fri) by andresfreund (subscriber, #69562) [Link] (8 responses)

> We were pretty much on the brink of disaster, and got saved because someone's login got slowed down enough that they went "mmh hang on a sec". It seems to me we just got very, very lucky here. Will we be so lucky the next time this happens too?

I didn't even notice it during logging in with ssh or such. I was doing some micro-benchmarking at the time and was looking to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd. Which showed lots of cpu time in code with perf unable to attribute it to a symbol, with the dso showing as liblzma. Got suspicious. Then recalled that I had seen an odd valgrind complaint in my automated testing of postgres, a few weeks earlier, after some package updates were installed. Really required a lot of coincidences.

A backdoor in xz

Posted Mar 29, 2024 18:37 UTC (Fri) by bluca (subscriber, #118303) [Link]

Ooft. Well done spotting this and chasing it down!

A backdoor in xz

Posted Mar 29, 2024 18:50 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Thank you for your work!

I donated $1000 to Debian for your work. Let's all do something nice for Debian, please?

A backdoor in xz

Posted Mar 29, 2024 23:34 UTC (Fri) by job (guest, #670) [Link]

Real heroes don't wear capes.

Your curiosity saved us from something so much worse. I wish I could thank you better, eternal gratitude must suffice for now.

A backdoor in xz

Posted Mar 29, 2024 23:43 UTC (Fri) by mcatanzaro (subscriber, #93033) [Link]

Thank you so much for averting this before it turned into a security apocalypse. This could have been one of the most severe cybersecurity incidents in history, except you got lucky enough to notice something odd and then were curious enough to look closer.

A backdoor in xz

Posted Mar 30, 2024 2:19 UTC (Sat) by helsleym (guest, #92730) [Link]

Wow! Thank you for pursuing this. So many folks wouldn't notice or would put this on their stack of things to get to later.

A backdoor in xz

Posted Mar 31, 2024 3:58 UTC (Sun) by cozzyd (guest, #110972) [Link]

So thanks to random botnet ssh login attempts, this was discovered? Somehow strangely karmic.

A backdoor in xz

Posted Mar 31, 2024 10:22 UTC (Sun) by xgongiveittoya (guest, #165847) [Link]

You have done a huge service for the entire Linux community. Thank you!

A backdoor in xz

Posted Apr 11, 2024 17:35 UTC (Thu) by martijn (guest, #125289) [Link]

Legend! Remarkable observation.

A backdoor in xz

Posted Mar 29, 2024 19:10 UTC (Fri) by zwenna (guest, #64777) [Link] (6 responses)

> We were pretty much on the brink of disaster, and got saved because someone's login got slowed down enough that they went "mmh hang on a sec".

I do not think that we actually got saved, at least saying so is premature until the payload has been analyzed in more detail. Right now I assume most Debian developers' machines are compromised.

This might very well be a much worse disaster than the 2008 Debian OpenSSL debacle, but time will tell.

A backdoor in xz

Posted Mar 29, 2024 19:28 UTC (Fri) by bluca (subscriber, #118303) [Link] (1 responses)

This was caught before it got into any stable release of any distribution; it's only in development/testing. The only exception is SUSE Tumbleweed, because it's rolling.

Just a few weeks of delay and it would have been part of the new Fedora 40 release and the new Ubuntu LTS 24.04 release.

A backdoor in xz

Posted Mar 30, 2024 19:47 UTC (Sat) by tao (subscriber, #17563) [Link]

I'd wager that most Debian developers use development/testing for their development systems. So "only" getting into development/testing is still quite bad.

A backdoor in xz

Posted Mar 29, 2024 19:34 UTC (Fri) by branden (guest, #7029) [Link] (1 responses)

> This might very well be a much worse disaster than the 2008 Debian OpenSSL debacle, but time will tell.

I wonder if we'll see an encore performance from sneering OpenSSL developer Ben Laurie, derogating the diligence and competence of "vendors" (distributors) once again.

https://web.archive.org/web/20090425084641/https://www.li...

Think how much better the xz episode could be turning out if we'd simply trusted upstream--the real experts.

A backdoor in xz

Posted Mar 30, 2024 14:16 UTC (Sat) by epa (subscriber, #39769) [Link]

You can still turn it into a “vendors suck” narrative since the dependency on xz is not there in OpenSSH, nor in the Portable version, but was patched in by Linux distributions.

Of course there is a reason for that — the upstreams don’t want systemd integration as it doesn’t fit their vision for the project. In some friendlier alternative universe upstream could have said “okay we can add the feature, but linking in the whole libsystemd is too much extra stuff to audit: any chance you can provide a slimmer library?”

A backdoor in xz

Posted Mar 29, 2024 20:50 UTC (Fri) by simon.d (guest, #168021) [Link]

This is exactly why I use dm-verity to verify my rootfs (built with verity-squash-root). I can get compromised temporarily while online, but I only rebuild and sign my image while offline on a fresh reboot. Ok, it would have saved me here, but probably not on a different compromise of a package, when already built into my system. Also secrets decrypted while compromised would also be compromised, but at least I can revert back to a secure system.

A backdoor in xz

Posted Mar 30, 2024 1:07 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link]

I doubt "most Debian developers' machines" have publicly accessible SSH. Given that outbound/unprompted network traffic would be easily detectable, I think the assumption going around that you need to be able to access the system remotely to trigger the backdoor is correct.

A backdoor in xz

Posted Mar 29, 2024 19:26 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (15 responses)

The "script to detect if it's likely that the ssh binary on a system is vulnerable" is some disturbing shell code.

It keeps calling eval on unique (and therefore unset?) variable names. It's moving strings between variable names in the middle, making it harder to read. It's making a whole lot of changes to src/liblzma/Makefile which it doesn't seem to ever use, some of which introduce code that ends in "| $(SHELL)".

If someone showed me the git commits and the detection script and said "one of these contains a trojan, you have ten minutes to decide which"...I'd pick the detection script. Is there a less obfuscated version?

A backdoor in xz

Posted Mar 29, 2024 19:33 UTC (Fri) by rwmj (subscriber, #5474) [Link]

Are you looking at the right script? You seem to be describing the injection script (written by the attacker). The detection script is just a fancy hexdump | grep.

A backdoor in xz

Posted Mar 29, 2024 19:34 UTC (Fri) by willmo (subscriber, #82093) [Link]

I don't think LWN processed the attachments correctly. See https://www.openwall.com/lists/oss-security/2024/03/29/4

A backdoor in xz

Posted Mar 29, 2024 19:39 UTC (Fri) by daroc (editor, #160859) [Link] (11 responses)

Unfortunately, the way that the site code handles attachments on mailing list items is a bit flawed, and currently only displays some of the attachments. It's on my list now to look into. I think you were looking at the backdoored m4 script, which is the first attachment on the message. The actual detection script looks like this:

#! /bin/bash

set -eu

# find path to liblzma used by sshd
path="$(ldd $(which sshd) | grep liblzma | grep -o '/[^ ]*')"

# does it even exist?
if [ "$path" == "" ]
then
	echo probably not vulnerable
	exit
fi

# check for function signature
if hexdump -ve '1/1 "%.2x"' "$path" | grep -q f30f1efa554889f54c89ce5389fb81e7000000804883ec28488954241848894c2410
then
	echo probably vulnerable
else
	echo probably not vulnerable
fi

Which I think you'll agree is much more readable.

A backdoor in xz

Posted Mar 29, 2024 21:45 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (2 responses)

Wow...so today I get to add to my collection of "mail filtering/transformation worst-case outcomes" examples! In the LWN version, the text reads:
== Detecting if installation is vulnerable ==

Vegard Nossum wrote a script to detect if it's likely that the ssh binary on a
system is vulnerable, attached here. Thanks!


Greetings,

Andres Freund
P="-fPIC -DPIC -fno-lto -ffunction-sections -fdata-sections"
C="pic_flag=\" $P\""
O="^pic_flag=\" -fPIC -DPIC\"$"
R="is_arch_extension_supported"
[...]
so I'm thinking "OK, so it's attached here", not "this mailing list archive software dumps random text from a bunch of extremely heterogeneous MIME parts into the message body without any markup indicating boundaries between sections."

Given the sensitivity, can that be fixed on the LWN archive before someone else makes the same mistake?

Attachment rendering

Posted Mar 29, 2024 22:09 UTC (Fri) by corbet (editor, #1) [Link] (1 responses)

Is it better now? That code has always sort of punted on attachments, I've tried to make it just a little bit less hackish.

Attachment rendering

Posted Mar 29, 2024 23:01 UTC (Fri) by zblaxell (subscriber, #26385) [Link]

Much better! The filenames really make all the difference in this case. Even without them, there are now two visible scripts on the page, and it's not hard to see which is the evil one.

Thanks!

A backdoor in xz

Posted Mar 30, 2024 9:00 UTC (Sat) by geuder (subscriber, #62854) [Link] (3 responses)

The script runs ldd on an untrusted binary. That itself can lead to code execution if the binary is suitably manipulated.

A backdoor in xz

Posted Mar 30, 2024 9:07 UTC (Sat) by geuder (subscriber, #62854) [Link] (2 responses)

Oops, I might have commented too fast.

sshd is not compromised (at least not by the issue discussed here...), a library it loads is.

I am not sure whether an attack (executing arbitrary code) using ldd needs to start in the main executable or whether it would also work in a shared library used by that executable.

A backdoor in xz

Posted Mar 30, 2024 14:04 UTC (Sat) by smurf (subscriber, #17840) [Link] (1 responses)

ldd opens each referenced library, but only to read the ELF header (to find recursive dependencies, I assume).

Thus, unless the build process manages to create a library with a corrupted ELF header that exploits a bug in ldd (which this one doesn't seem to do) the detection script is safe to run.

A backdoor in xz

Posted Mar 30, 2024 14:49 UTC (Sat) by geuder (subscriber, #62854) [Link]

Yes. The risk I remembered is described in https://catonmat.net/ldd-arbitrary-code-execution.

It involves using a different loader, but the loader is specified by the main executable. Shared libraries cannot bring in their own one (AFAIK...).

A backdoor in xz

Posted Mar 30, 2024 14:56 UTC (Sat) by fghorow (subscriber, #5229) [Link] (2 responses)

I just ran into a case where hexdump was not available on the machine being tested. The script complained about that, but printed the "probably not vulnerable" result anyway. Rather than trying to correct the script and open up another can of worms, please just use common sense when interpreting the output of this script.

A backdoor in xz

Posted Mar 31, 2024 16:01 UTC (Sun) by vegard (subscriber, #52330) [Link] (1 responses)

Yes, sorry -- it was hacked up in a couple of hours in anticipation of the report going live. The script was tested by 3-4 people in private before it got posted, but it obviously had some flaws. It was also meant for advanced users, in a way (think organizations or system administrators who can adapt it to their systems, not necessarily end users). I felt it was better to keep the script short and readable as opposed to trying to adapt it to every possible configuration, as that would have made it harder to trust (as in: here's yet another shell script doing who-knows-what...).

A backdoor in xz

Posted Mar 31, 2024 16:05 UTC (Sun) by fghorow (subscriber, #5229) [Link]

My comment was made as a "heads up" and it was not intended as criticism of your script.

You absolutely made the right call in keeping it simple, IMHO. Thank you.

A backdoor in xz

Posted Mar 30, 2024 18:20 UTC (Sat) by kreijack (guest, #43513) [Link]

> # find path to liblzma used by sshd
> path="$(ldd $(which sshd) | grep liblzma | grep -o '/[^ ]*')"

> # does it even exist?
> if [ "$path" == "" ]
> then
> echo probably not vulnerable
> exit
> fi
[...]

$(which sshd) returns "" IF not run as root...

In this case the message should be "Cannot find 'sshd'" rather than "probably not vulnerable".

A backdoor in xz

Posted Mar 29, 2024 19:39 UTC (Fri) by kleptog (subscriber, #1183) [Link]

What you're looking at there *is* the injected shell code. I think you see it because of an artifact of the LWN archive.

If you look at the Openwall archive (https://www.openwall.com/lists/oss-security/2024/03/29/4) you'll see the first attachment is the injected code you see above, the second is a binary attachment, and the actual detection script is the third, which doesn't appear in the LWN copy of the email.

A backdoor in xz

Posted Mar 29, 2024 19:42 UTC (Fri) by bkw1a (subscriber, #4101) [Link] (126 responses)

The post from Andres Freund says:
"openssh does not directly use liblzma. However debian and several other distributions patch openssh to support systemd notification, and libsystemd does depend on lzma."

It seems to me that distributions shouldn't be modifying a critical piece of security infrastructure like sshd. Isn't that just asking for trouble?

A backdoor in xz

Posted Mar 29, 2024 20:01 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (80 responses)

It's pretty difficult to get systemd integration right without patching sshd. Upstream are BSD folks and very firmly Don't Care About Systemd; there's no hope of getting that patch upstream.

That said, it's rather tempting to amend that patch to talk to NOTIFY_SOCKET directly rather than by linking against libsystemd, just to reduce exposure to gadgets like this.

A backdoor in xz

Posted Mar 29, 2024 20:10 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (18 responses)

The only problem is that it's a pretty horrible amount of code to have to inline. Look at pid_notify_with_fds_internal ...

Apparently unreleased versions of systemd dlopen liblzma instead, which would have meant it wasn't in sshd's process space.

A backdoor in xz

Posted Mar 29, 2024 20:28 UTC (Fri) by intelfx (subscriber, #130118) [Link] (3 responses)

> The only problem is that it's a pretty horrible amount of code to have to inline. Look at pid_notify_with_fds_internal

I don't think any of that code is needed. OpenSSH as patched only needs sd_listen_fds() and plain sd_notify() which _as used_ can be implemented in about 5-10 lines of C code each.

A backdoor in xz

Posted Mar 29, 2024 20:35 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (2 responses)

sd_notify calls pid_notify_with_fds_internal, though? But if there's a reasonably standard inlined C reimplementation that covers all the necessary API surface, I'd definitely consider it.

A backdoor in xz

Posted Mar 30, 2024 1:12 UTC (Sat) by zdzichu (subscriber, #17118) [Link]

That's the details of a specific implementation. The actual protocol is simple: the environment variable contains a socket path, and you write a short text string to it. Really just a few lines of code.

A backdoor in xz

Posted Mar 30, 2024 6:50 UTC (Sat) by intelfx (subscriber, #130118) [Link]

> But if there's a reasonably standard inlined C reimplementation that covers all the necessary API surface, I'd definitely consider it.

Yep, that's why I tried to emphasize "as used". The implementation you see is shared between several mostly-disjoint users (e. g. it is also used to communicate with hypervisors via vsock) and also implements other features of this ad-hoc protocol (such as fd passing) which are not used in openssh.

The usage in openssh (to signal readiness) is covered by writing a fixed, static text string into an AF_UNIX datagram socket pointed to by the $NOTIFY_SOCKET variable.
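
To make that concrete, here is a minimal sketch of the readiness half, assuming the daemon only ever needs to send the literal "READY=1" string. This is not libsystemd's implementation, just the protocol described above, with errors reduced to silently giving up:

#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical stand-in for sd_notify("READY=1"), for a daemon that only
 * signals readiness and needs nothing else from libsystemd. */
static void notify_ready(void)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    socklen_t salen;
    int fd;

    /* Not running under a notify-aware service manager. */
    if (!path || (path[0] != '/' && path[0] != '@'))
        return;
    if (strlen(path) >= sizeof(sa.sun_path))
        return;

    strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
    if (sa.sun_path[0] == '@')
        sa.sun_path[0] = '\0';   /* abstract socket namespace */
    salen = offsetof(struct sockaddr_un, sun_path) + strlen(path);

    fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return;
    (void) sendto(fd, "READY=1", strlen("READY=1"), 0,
                  (struct sockaddr *) &sa, salen);
    close(fd);
}

The only subtlety is the leading "@", which denotes a socket in the abstract namespace and is translated to a leading NUL byte; everything else is a single sendto().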

A backdoor in xz

Posted Mar 29, 2024 21:05 UTC (Fri) by judas_iscariote (guest, #47386) [Link] (13 responses)

It will still be... selinux requires it. Happy now? Supply chain attacks are not systemd's fault... :-)
It is more like the corporations' fault for not paying people to work on things they profit from.

A backdoor in xz

Posted Mar 30, 2024 11:04 UTC (Sat) by fenncruz (subscriber, #81417) [Link] (12 responses)

I agree it's not systemd's fault, but is there something it (and other software) can do to make this attack harder? Like somehow preventing its symbols from being replaced by a malicious library?

A backdoor in xz

Posted Mar 30, 2024 12:02 UTC (Sat) by bluca (subscriber, #118303) [Link] (7 responses)

In git main we have already replaced all linked dependencies in libsystemd.so (apart from glibc and libcap) with dlopen(), which is triggered only if and when the specific API that needs the external library is called, not automatically.

A backdoor in xz

Posted Mar 30, 2024 14:12 UTC (Sat) by smurf (subscriber, #17840) [Link]

Interesting. I was about to thank you for your proactive response to this incident, but a look at systemd's git reveals that this change was done a month ago, in order to reduce systemd's footprint on startup RAM disks. ;-)

A backdoor in xz

Posted Mar 30, 2024 15:27 UTC (Sat) by dskoll (subscriber, #1630) [Link] (1 responses)

I understand the advantages of the dlopen approach, but it still leaves me feeling uneasy. You might get shared libraries that you don't expect dlopened just by making an innocent API call.

It seems to me that the supervisor notification protocol is likely to be used by many programs, and also quite likely that they might not want anything else from libsystemd. Wouldn't it make sense to put the notification client code in its own shared library that has no external dependencies and won't dlopen anything else ever?

A backdoor in xz

Posted Mar 30, 2024 15:52 UTC (Sat) by zdzichu (subscriber, #17118) [Link]

Funny, it was this way until v209 in 2014. sd-daemon was a collection of functions like sd_notify() and so on; it got merged into libsystemd then.

A backdoor in xz

Posted Mar 30, 2024 18:36 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Sorry, but random dlopen()s are even MORE unacceptable. It also prevents very useful security measures like locking the text of the running executable.

A backdoor in xz

Posted Mar 30, 2024 19:14 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (2 responses)

It doesn't prevent that at all? Unless you use text relocations, .text should only be mapped read only. And .got would have been remapped ro at start if you use -z now -z relro. Dlopen() doesn't change any of that?

A backdoor in xz

Posted Mar 30, 2024 19:41 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

I mean, locking down the complete set of executable pages in a process, so that no new code can get loaded. OpenBSD has mimmutable() that can do that.

> Dlopen() doesn't change any of that?

Indeed it doesn't (right now), but expanding its usage will make it harder to enable something like mseal() later.

A backdoor in xz

Posted Mar 31, 2024 13:13 UTC (Sun) by bluca (subscriber, #118303) [Link]

You can still do that, but then you lose some features. That seems like a perfectly acceptable trade-off to me.

A backdoor in xz

Posted Mar 30, 2024 16:53 UTC (Sat) by judas_iscariote (guest, #47386) [Link] (3 responses)

Yes, you can prevent symbols from being replaced by something else with various compiler and linker flags and possibly environment variables... it will still be a cat & mouse game, because something that has root already has all the power.

A backdoor in xz

Posted Mar 30, 2024 19:05 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (1 responses)

Afaict all the options for doing so were used in this case. The redirection happened just before the got was remapped read only.

I'm somewhat surprised that nobody called for glibc's rtld-audit infrastructure to be removed. That's really what made this attack possible despite relro. As far as I know, it's not used widely.

A backdoor in xz

Posted Mar 31, 2024 13:30 UTC (Sun) by nix (subscriber, #2304) [Link]

Perhaps it should be possible to set some sort of link-time tag to instruct ld.so to disable the LD_AUDIT infrastructure for particular binaries? Not sure that's doable for specific shared libraries, but at least this would let one mark critical system daemons as "hands-off" for this application, so their own libraries can't compromise them like this. It's enough like AT_SECURE or coredump/ptrace prevention that there should probably be one mechanism to turn all this stuff on at the same time... (For userspace stuff latrace and the things that it enables are actually quite useful, but I can't imagine ever running latrace on sshd, and if I did I'm debugging it anyway and would be at the very least foregrounding it and could presumably manually turn auditing back on. For that matter, latrace could be modified to do that to the programs it invokes, since it knows its own use of LD_AUDIT is non-malicious.)

A backdoor in xz

Posted Mar 31, 2024 6:37 UTC (Sun) by epa (subscriber, #39769) [Link]

I think if the symbol-replacing were not allowed, nor arbitrary code execution on *loading* the library, then the attack would be more difficult. The application does not call any functions from xz. An attacker would have to get a backdoor into the library and somehow persuade sshd to call it.

A backdoor in xz

Posted Mar 29, 2024 20:15 UTC (Fri) by bkw1a (subscriber, #4101) [Link] (4 responses)

Every time a patch pulls in a new dependency, it increases our attack surface. That needs to be weighed against the benefit of the patch. For something like sshd, it seems like the openssh developers, who have security as their primary focus, should be the ones we trust to make that decision.

A backdoor in xz

Posted Mar 29, 2024 20:23 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (2 responses)

I mean, look, I defer to the openssh developers on a _lot_ of stuff, but they're not the ones trying to integrate with the rest of our distribution and that does sometimes force some different decisions. The best I can do is document all the deviations as clearly as possible.

A backdoor in xz

Posted Mar 29, 2024 22:02 UTC (Fri) by dilinger (subscriber, #2867) [Link] (1 responses)

Also, what *is* "critical security infrastructure"? Is firefox/chromium critical security infrastructure? Is glibc? libz? libsasl? libselinux? Systemd does a whole lot of critical things on my system; is that critical security infrastructure that we shouldn't be patching?

On a lot of desktops, sshd isn't even installed. Is it critical security infrastructure because it's installed on some servers you consider important? What about the other daemons installed on important servers, like nginx/apache (and often the whole lamp stack)?

If you actually look at attack vectors, you start realizing pretty quickly that A LOT of software could (or should) be considered critical security infrastructure, and it's pretty unrealistic to not have to patch all of those bits of software to work on Debian's many desktop/server environments and hardware architectures. That also assumes that we can trust upstreams to not backdoor their code, which, as this example shows us, we clearly cannot.

A backdoor in xz

Posted Apr 3, 2024 5:44 UTC (Wed) by Lennie (subscriber, #49641) [Link]

The funny part is: any installed software becomes critical security infrastructure if a FOSS developer develops the software on his primary laptop, which holds the SSH keys used for git commit signing and git push.

A backdoor in xz

Posted Mar 29, 2024 23:58 UTC (Fri) by mcatanzaro (subscriber, #93033) [Link]

Sounds good, but in this case I think that's just wrong. You really want systemd to accurately know whether sshd is running or not. If systemd doesn't know, then you don't know, and that's a security disaster.

A backdoor in xz

Posted Mar 29, 2024 23:38 UTC (Fri) by cjwatson (subscriber, #7322) [Link]

I guess I need to amend this since https://bugzilla.mindrot.org/show_bug.cgi?id=2641#c13 happened. If something like that gets in then we'll definitely adopt it in Debian.

A backdoor in xz

Posted Mar 30, 2024 1:11 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (48 responses)

Why does my ssh server need to integrate with my service manager?

A backdoor in xz

Posted Mar 30, 2024 1:40 UTC (Sat) by bluca (subscriber, #118303) [Link] (47 responses)

Because the service manager needs to know when the ssh server is ready

A backdoor in xz

Posted Mar 30, 2024 5:30 UTC (Sat) by wtarreau (subscriber, #51152) [Link] (11 responses)

Why? Your response sounds more like "wants to know".

A backdoor in xz

Posted Mar 30, 2024 5:48 UTC (Sat) by rra (subscriber, #99804) [Link] (10 responses)

So that the system administrator who just restarted the ssh server knows it didn't actually start and doesn't log out before fixing it.
So that other services that depend on the ssh server being started know when to start.
So that when you ask the service manager what services failed, you'll know that the ssh server failed.
So that you have an actual service manager, not a bunch of YOLO shell scripts with no error handling.

A backdoor in xz

Posted Mar 30, 2024 8:12 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (9 responses)

If it didn't actually start, then it shouldn't have forked off and backgrounded itself, which is how services notified the service manager that they had successfully started for literal decades before systemd came along.

A backdoor in xz

Posted Mar 30, 2024 8:23 UTC (Sat) by mb (subscriber, #50428) [Link] (4 responses)

Yep, and it was broken all the time because everybody did it differently and slightly wrong, until systemd came along.
But let's not distract from the discussion: systemd is *not* why this backdoor was possible. It could have been any other library. It could even have been any other server application. It's not restricted to sshd.

The real problem is that patches that have not been understood/reviewed have been applied.
This is a social problem. Not a technical one.

A backdoor in xz

Posted Mar 30, 2024 12:46 UTC (Sat) by stef70 (guest, #14813) [Link] (1 responses)

Indeed. We need to wait for the full analysis of the backdoor to be sure that no tool other than sshd was targeted.

On my Debian system, liblzma.so is linked in several programs and libraries. A lot are unrelated to systemd: grub, insmod, lvm, reboot, gimp, imagemagick, runlevel, ...

All of them are potential targets for that xz backdoor. For now, we have to wait for the full analysis. I am pretty optimistic that sshd was the main target, because installing another backdoor on the system or calling "home" would significantly increase the probability of detection.

A backdoor in xz

Posted Mar 30, 2024 23:33 UTC (Sat) by brooksmoses (guest, #88422) [Link]

There is code in the exploit that would look for additional files in the "test" directory that matched specific byte patterns, and then extract a payload from them and execute it. There are currently no files matching those patterns -- so it certainly looks like this bit was designed as a capability to target additional programs simply by adding additional "test" files to the git repository.

[Reference: https://github.com/Midar/xz-backdoor-documentation/wiki#s... as of the time of this comment.]

A backdoor in xz

Posted Mar 31, 2024 1:25 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

> Yep, and it was broken all the time because everybody did it differently and slightly wrong, until systemd came along.

Ah, okay. And how exactly do you believe that one method of notification is any more reliable at this than any other? They all rely on the software developer picking a good time to say "started".

> But let's not distract from the discussion: systemd ist *not* why this backdoor was possible

It absolutely is.

> It could have been any other library

But it wasn't. "Don't worry about our vulnerabilities, other people have vulnerabilities too!" "Don't worry about our bad design, other people have bad design too!"

A backdoor in xz

Posted Mar 31, 2024 9:22 UTC (Sun) by smurf (subscriber, #17840) [Link]

> They all rely on the software developer picking a good time to say "started".

They all rely on picking a good time that happens to *work*.

There are plenty of situations where, once you're *really* started, it's no longer possible to signal "OK I'm alive now" by double-forking.

Writing a PID file has its own class of race conditions, the handling of which I can guarantee most users of that method get fatally wrong.

And so on.

> "Don't worry about our vulnerabilities, other people have vulnerabilities too!" "Don't worry about our bad design, other people have bad design too!"

Don't blame the messenger. If linking to a library you don't strictly need *in your particular situation* is a "vulnerability" or "bad design" I can guarantee that 90+% of programs out there suffer from it.

A backdoor in xz

Posted Mar 30, 2024 9:53 UTC (Sat) by motk (subscriber, #51120) [Link]

Yeah, and it sucked. It sucked 35 years ago. It still sucks.

This whole thing has nothing to do with service management, and everything to do with large corporations relying on volunteers writing critical software apparently just for something to do.

A backdoor in xz

Posted Mar 30, 2024 16:58 UTC (Sat) by rra (subscriber, #99804) [Link] (2 responses)

> If it didn't actually start, then it shouldn't have forked off and backgrounded itself, which is how services notified the service manager that they had successfully started for literal decades before systemd came along.

I have run UNIX systems throughout those literal decades that you are talking about, and your faith in this half-assed, failure-prone mechanism is badly misplaced. I cannot count the number of ways I have seen this fail: the process does not actually start listening to the network until after the fork, the process starts listening before the fork but isn't really ready to accept connections because there is setup that has to be done after the fork, the process forks but doesn't fork twice and thus isn't properly reparented, the process didn't write a PID file and now you have no idea which process is actually running the service, the process did write a PID file and wrote the wrong PID to that file, you end up with multiple backgrounded copies of the same service running and interfering weirdly with each other... the list goes on.

We figured out that this was a bad way to run services by at least the early 2000s, when support for a foreground model with none of this self-daemonization nonsense badly copied into every service became widely available (and as someone who was managing UNIX systems all through that period, that was a delightful revelation). But you do not want to assume that the service is ready simply because the process has started. You need some mechanism for signaling that the service really has fully started, has allocated all of its resources, and is listening to network connections (if that is its job). Otherwise, you risk starting services that depend on it too soon.

Even upstart (the alternative preferred by some of the folks who disliked systemd) had a mechanism for doing this. (It was worse than systemd's, at least in my opinion.)

A backdoor in xz

Posted Mar 31, 2024 1:25 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

So have I, and YOUR faith in systemd's half-assed, failure-prone mechanism is badly misplaced.

Please stop here

Posted Mar 31, 2024 1:45 UTC (Sun) by corbet (editor, #1) [Link]

We have managed to keep this conversation relatively free of systemd bashing, which is really not relevant to the discussion. Please don't do any more of it here.

A backdoor in xz

Posted Mar 30, 2024 7:09 UTC (Sat) by epa (subscriber, #39769) [Link] (34 responses)

Fair enough. But why does that functionality need to pull in xz support? The ssh daemon does not itself do xz compression in order to integrate with systemd.

If the answer is “because it links as a C library and you get the transitive dependencies of everything”, that’s something to improve.

A backdoor in xz

Posted Mar 30, 2024 7:32 UTC (Sat) by mb (subscriber, #50428) [Link] (1 responses)

>because it links as a C library and you get the transitive dependencies of everything

So, statically link with LTO?

A backdoor in xz

Posted Mar 31, 2024 14:23 UTC (Sun) by dskoll (subscriber, #1630) [Link]

No, static linking isn't needed. Just split the large libsystemd into smaller libraries where each smaller library contains a set of closely-related APIs and minimal other dependencies. There's no reason to pull code in to do log compression if all you need is code for the sd_notify protocol.

A backdoor in xz

Posted Mar 30, 2024 7:50 UTC (Sat) by cjwatson (subscriber, #7322) [Link] (31 responses)

Indeed, and apparently unreleased versions of systemd already trim down the linkage of libsystemd so that liblzma won't be in the process space unless it's actually needed.

A backdoor in xz

Posted Mar 30, 2024 10:24 UTC (Sat) by job (guest, #670) [Link] (30 responses)

Doesn't that obscure what is happening, which risks making an already bad situation even worse? The situation with a backdoored library would still be there, just harder to diagnose.

A backdoor in xz

Posted Mar 30, 2024 12:07 UTC (Sat) by bluca (subscriber, #118303) [Link] (25 responses)

The impact is reduced, because the dlopen only happens if and when the API using the library is called by the program linking to libsystemd, rather than by default. So in this case it would not have happened, because sshd does not read compressed journal files, which is the reason the compression libraries are linked into libsystemd.

Dependency chain of a full-feature build of libsystemd from main (plus a PR under review):

build/libsystemd.so.0 (interpreter => None)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

We want to remove the need for libcap too, but that's a bit more complex.
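
For illustration, the pattern looks roughly like this (the symbol and function names here are made up for the sketch, not systemd's actual internals):

#include <dlfcn.h>
#include <stddef.h>

/* Illustrative "dlopen on first use" wrapper; the decompression entry
 * point is a placeholder name, not a real liblzma symbol. */
typedef int (*decompress_fn)(const void *in, size_t inlen,
                             void *out, size_t outlen);
static decompress_fn do_decompress;

static int ensure_liblzma(void)
{
    static void *handle;

    if (handle)
        return 0;

    /* liblzma only enters the address space here, on first use. */
    handle = dlopen("liblzma.so.5", RTLD_NOW | RTLD_LOCAL);
    if (!handle)
        return -1;

    do_decompress = (decompress_fn) dlsym(handle, "some_decompress_entrypoint");
    return do_decompress ? 0 : -1;
}

int journal_read_compressed(const void *in, size_t inlen,
                            void *out, size_t outlen)
{
    /* A caller like sshd that never reads compressed journal data never
     * reaches the dlopen() above, so the library is never mapped. */
    if (ensure_liblzma() < 0)
        return -1;
    return do_decompress(in, inlen, out, outlen);
}

The trade-off is that a missing or broken library only shows up at the first call rather than at link time.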

A backdoor in xz

Posted Mar 30, 2024 18:23 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (24 responses)

This is an EXTREMELY bad move from systemd. A dlopen() is a much more worrying signal of exploitation, because it's so rarely used. And libsystemd will make it normal.

It also won't close off all avenues of attack. A malicious library can patch the code, ptrace() its process, modify the environment, etc.

A backdoor in xz

Posted Mar 30, 2024 19:12 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (16 responses)

There already are dlopens in things like sshd, via e.g. PAM.

A backdoor in xz

Posted Mar 30, 2024 19:36 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

Yeah, and I also forgot about the horror of nsswitch.

Still, we should start cutting back on this kind of nonsense.

A backdoor in xz

Posted Mar 31, 2024 13:46 UTC (Sun) by nix (subscriber, #2304) [Link] (13 responses)

So to you, dlopen is a signal of exploitation and should be avoided because it's so rare, until it is pointed out that it's not rare and is already used in a wide variety of processes, whereupon you switch to calling unclear things 'this kind of nonsense', cite nsswitch (which is not relevant, given that PAM is at issue here), and suggest, what? Removing PAM and nsswitch?

That's going to work really well given how many sites use both to fold in new hostname lookup mechanisms, new user lookup mechanisms, and new and fairly complex authentication patterns on the fly.

Anyway, dropping nsswitch and PAM wouldn't even really help, despite being immensely disruptive. dlopen does have its problems[1] and it is reasonable to prefer to avoid it when possible, but it is not rare even in the absence of nsswitch and PAM. Try adding reporting to glibc to see how often it's invoked on real running systems. (It's a *lot*. Even syslog daemons make extensive use of it these days, so you can't even say "perhaps daemons running as root can't dlopen".)

You cannot use 'this uses dlopen' as a signal of suspiciousness, or of anything really, any more than 'this is dynamically linked' is such a signal. "This has IFUNC resolvers that redirect symbols in other libraries" is definitely an actual sign of badness that I've never heard of anything legitimate doing, and I'm wondering if glibc could detect and block that somehow without too much cost (it would at least involve stack frame walks, but the resolver has to mess with the stack frame anyway...)

[1] now that prelink is dead, mostly that you can't use ldd to statically determine what the shared library dep tree is and what things might be potentially impacted by ABI changes, which *is* actually problematic on real systems

A backdoor in xz

Posted Mar 31, 2024 16:12 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (12 responses)

> So to you, dlopen is a signal of exploitation and should be avoided because it's so rare, until it is pointed out that it's not rare and is already used in a wide variety of processes, whereupon you switch to calling unclear things 'this kind of nonsense', cite nsswitch (which is not relevant, given that PAM is at issue here), and suggest, what? Removing PAM and nsswitch?

Yeah, exactly. Remove dlopen() calls by refactoring the relevant systems. For example, musl libc does not have nsswitch (and has a built-in NSCD). PAM is already optional.

A backdoor in xz

Posted Mar 31, 2024 17:02 UTC (Sun) by nix (subscriber, #2304) [Link] (11 responses)

So... that this wouldn't actually help solve this problem is not important to you, then? (You clipped that out of my original reply without comment.)

That's a sign of someone on a hobby-horse if I ever heard of one.

(As someone who needs PAM to even log on -- on account of wanting to use YubiKey OTP to do so -- and who uses nsswitch for a variety of homebrewed lookups, I would obviously not be willing to drop either.)

A backdoor in xz

Posted Apr 1, 2024 12:19 UTC (Mon) by foom (subscriber, #14868) [Link] (10 responses)

Nsswitch has an obvious replacement for dlopen: sockets. They're already used in many interesting scenarios, e.g. host lookup is via DNS to localhost, user database often comes from libnss_ldapd or sssd — both of which simply implement a private socket protocol in their nsswitch library to talk to their corresponding service on localhost.

Then of course there's nscd, as already mentioned: a socket protocol for nsswitch lookups already implemented by glibc and musl. Someone could implement a different nscd server-side which doesn't use dlopen — without even modifying glibc. Yet, as far as I know, nobody actually has done so.

On the PAM side there's no similarly easy replacement, though one could investigate OpenBSD's BSD Auth system, which is extensible via spawning subprocesses to handle auth tasks.

In any case, that nobody seems to actually be working on any of this probably shows just how unimportant avoiding dlopen is for most people...

A backdoor in xz

Posted Apr 1, 2024 16:48 UTC (Mon) by nix (subscriber, #2304) [Link] (9 responses)

Avoiding the need to dlopen in statically linked binaries, while not losing nsswitch for such binaries, actually *does* matter to upstream (it would simplify ld.so a whole hell of a lot). So switching to a socket-based protocol is definitely on the cards.

The problem, as ever, is doing that compatibly -- but I suppose if glibc itself provided the 'nss server' that loaded existing nss modules and did everything else nsswitch did, and glibc called into it using the sort of thing you describe, this sort of thing might be practical: it would probably make nscd less of a horror show, too. With a lot of work (how many nss modules depend on being in the same address space as the running process, for starters? I bet it's not zero. And I bet this would slaughter performance for simpler cases, so maybe nss_files still needs to be built in. And so forth...)

A backdoor in xz

Posted Apr 1, 2024 18:04 UTC (Mon) by foom (subscriber, #14868) [Link] (1 responses)

An "nss server" is literally what nscd _already is_!

If you run the nscd service, then glibc sends nss lookups to nscd over a socket, instead of running them inside other binaries.

Nscd comes with a caching layer (unsurprisingly given its name), but you can mostly disable that if you only want the nss-server functionality.

A backdoor in xz

Posted Apr 2, 2024 17:10 UTC (Tue) by nix (subscriber, #2304) [Link]

Oh, of course it is. I am clearly missing the obvious right now :(

A backdoor in xz

Posted Apr 1, 2024 18:23 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> Avoiding the need to dlopen in statically linked binaries, while not losing nsswitch for such binaries, actually *does* matter to upstream (it would simplify ld.so a whole hell of a lot). So switching to a socket-based protocol is definitely on the cards.

glibc is the worst library in existence, so no wonder.

On the other hand, musl libc simply uses the nscd protocol to provide the NSS functionality and even allows wrapping legacy NSS modules: https://github.com/pikhq/musl-nscd

Additionally, with musl I can _already_ get a fully static system with zero dlopen()s or dynamic libraries. There are even several experimental distros that are fully statically linked. E.g.: https://framagit.org/Ypnose/solyste

A backdoor in xz

Posted Apr 2, 2024 17:12 UTC (Tue) by nix (subscriber, #2304) [Link] (5 responses)

> glibc is the worst library in existence, so no wonder.

At this point I'm wondering if you're just being intentionally unpleasant. glibc navigates a frankly horrifying pile of tradeoffs and does the job fairly well given that. If it was "the worst library in existence" it would not be *remotely* so widely used, nor work as well as it does.

A backdoor in xz

Posted Apr 2, 2024 22:54 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

I've been holding this opinion about glibc for decades now. I understand the difficulty of developing glibc, and it excuses at least some warts. But then we have musl which is so much nicer, while being standards-compliant.

I believe we should take at least _some_ of that experience and apply it to the rest of the system. Being static-friendly and not dlopen()-ing stuff is definitely a part of that.

BTW, does dlopen() in libsystemd preclude its static linking?

A backdoor in xz

Posted Apr 3, 2024 11:17 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

> BTW, does dlopen() in libsystemd preclude its static linking?

In the future in glibc, yes. In all other libcs I'm aware of, yes, even now. (Or, rather, you can *try* to call it in statically-linked binaries, but the call will always fail.)

This is of course one of many reasons why just statically linking everything is not the panacea some seem to think -- plugins really *are* a thing and sometimes loadable shared code in the same address space is a convenient way to implement them... there's not a chance you'll ever get KDE to work statically linked, for instance.

A backdoor in xz

Posted Apr 3, 2024 16:30 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> This is of course one of many reasons why just statically linking everything is not the panacea some seem to think -- plugins really *are* a thing

Plugins are a thing that has no business being in the foundational parts of the runtime. And it's not like we don't have a real-world example of a system without them, Alpine Linux exists. And it's significantly nicer to work with than the glibc-based systems.

A backdoor in xz

Posted Apr 4, 2024 12:49 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

> Plugins are a thing that has no business being in the foundational parts of the runtime.

I am not convinced, and since as usual you didn't bother to give any reasons, relying instead on pure assertion, I'm not sure why you think this not-an-argument would ever convince anyone who didn't already agree with you.

Why on earth would you consider name lookup or authentication, both things that have had numerous wildly divergent implementations over time and which obviously have different site-by-site requirements, hence the *existence* of pluggable systems to implement them, to be things that "have no business" existing, based on the pure assertion that they are "in the foundational parts of the runtime"? People are *using* nss and PAM's extensibility, you know. They're not just there to annoy you. This is not a moribund module system with a half-dozen stale modules that have hardly changed in the last twenty years. People are plugging other things into that pluggability. (Not that this attack even *relied* on that pluggability, or NSS, or PAM, so why you think ripping them out will help here is quite beyond me.)

For that matter, what on earth even is a "foundational part of the runtime"? Is it the toolchain? Surely that counts if anything does! Better rip out LTO from GCC and clang then, since both rely on linker plugins that run the entire compiler! (Also, how many linker plugins are there? I can hardly name any but LTO. That's gotta be moribund, rip it out!) Is it the kernel? Better rip out kernel modules then, in-tree or not, since if they're not dynamically loaded plugins, nothing is... is it glibc? Surely not, since you can replace it with any other libc you like and keep the kernel and most userspace the same after a recompile: it could hardly be foundational! So I guess NSS can stay. Not sure about PAM, the idea came from Solaris and you have in the past expressed a liking for that sort of thing so maybe that's good now too?

That's the problem with arguing by pure assertion: since you give no reasons, define none of your terms, and provide no grounds to agree with you, there's no reason to accept your premises: and even if I do, it's easy to argue in the exact opposite direction since the premises are so vague, which makes your argument nothing more than a statement of personal preferences and an assertion that of course *your* personal preferences are more important than anyone else's.

Is the real definition "something Cyberax asserts without argument or rationale is foundational"? Or just "Cyberax is right"?

A backdoor in xz

Posted Apr 4, 2024 17:24 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

I thought that the reasons for NOT doing plugins are obvious. They add a huge amount of complexity, preclude useful mitigations (mseal/mimmutable), and make the system harder to analyze. You can't statically determine the dependency closure anymore.

Plugins inherently face a complicated environment that they don't control and should not perturb too much. And a crashed plugin will take down the entire application. This was reasonable 30 years ago, but it's not anymore. These days, we actually have a good architectural pattern for this: split modules into a separate daemon that is activated by systemd as needed.

> People are *using* nss and PAM's extensibility, you know.

NSS is actually hardly used these days, NIS/NIS+ have mostly died out. The only major surviving service is LDAP (usually via SSSD). It can simply be incorporated into the glibc (it's 43kb), or it can be split into a daemon that talks to glibc via the NSCD protocol.

If we're talking about PAM in particular, then it's nothing but a stack of bad design decisions. In case of SSH, they can be replaced by ephemeral SSH certificates for most of the scenarios (e.g. a shared machine in a university or for management access to the production cluster on AWS EC2).

These two items will make most non-interactive systems completely dlopen()-free.

A backdoor in xz

Posted Apr 1, 2024 14:41 UTC (Mon) by job (guest, #670) [Link]

In retrospect, I think most people would agree that the design of PAM was a mistake. It was hugely controversial at its time and many fought against its inclusion in distributions.

In the end it was included because it made possible some use cases where no one else stepped up to make a practical alternative.

I don't think that is something we want to emulate. It is certainly possible to satisfy the necessary use cases without resorting to dlopen().

A backdoor in xz

Posted Mar 31, 2024 12:12 UTC (Sun) by bluca (subscriber, #118303) [Link] (6 responses)

The main reason this is done and will happen is to reduce mandatory dependencies. If the Linux ELF format had better support for optional dependencies that are loaded only when needed, there wouldn't be any need to do dlopen() manually. I believe OSX's shared object format implements this. But we are where we are, and hence that's the only mechanism we've got.

A backdoor in xz

Posted Mar 31, 2024 13:49 UTC (Sun) by nix (subscriber, #2304) [Link]

Hmm. That's interesting! This is kind of a DT_NEEDED which kicks in (and loads dependent libs, runs constructors etc) only when the first symbol in it is called, kind of like lazy binding but doing a lot more than just a symbol resolution?

That's tricky to implement (because doing things in the resolver is *always* a bit tricky) but I can't immediately think of any reason why it's *impossible*. It would need a new dynamic tag of course, DT_LAZY_NEEDED? DT_NEEDED_OPTIONAL?

You couldn't use the simpleminded approach above for everything (good luck making this work for things like data symbols where the GOT is needed before the PLT or in general anywhere you couldn't have used lazy binding before, or where you need the shared library's ELF constructors to run early, or where TLS inadequacies would prevent dlopen from working happily -- and it has the same security implications as using lazy binding) but it should work in a fairly large proportion of cases.

A backdoor in xz

Posted Mar 31, 2024 16:13 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> The main reason this is done and will happen is to reduce mandatory dependencies

No, it's not. It's done to _paper_ over dependencies, making them harder to discover statically and creating wonderful race conditions if mimmutable() is used at an inopportune moment. It's an all-around bad decision.

A backdoor in xz

Posted Mar 31, 2024 17:06 UTC (Sun) by nix (subscriber, #2304) [Link] (3 responses)

Since mimmutable() does not exist on Linux, making changes in Linux-only software like systemd to allow for it seems deeply bizarre, particularly when those changes *reduce* security (like, say, increasing the set of always-loaded libraries to include some which have just been seen to launch attacks when loaded, rather than loading as many as possible of them only as needed).

What next? Shall we make changes to allow for Windows's per-libc malloc(), or for Linux's not-at-all-planned upcoming transition to Mach-O?

A backdoor in xz

Posted Mar 31, 2024 18:54 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> Since mimmutable() does not exist on Linux

This is subject to change: https://lwn.net/Articles/958438/

> particularly when those changes *reduce* security

They don't. libsystemd will _still_ depend on xz, it just will be hidden from cursory analysis.

> What next? Shall we make changes to allow for Windows's per-libc malloc(),

That's actually a pretty good idea, that will make several classes of vulnerabilities more difficult to exploit.

> or for Linux's not-at-all-planned upcoming transition to Mach-O?

I'd take PE: https://blog.hiler.eu/win32-the-only-stable-abi/

A backdoor in xz

Posted Mar 31, 2024 19:36 UTC (Sun) by nix (subscriber, #2304) [Link] (1 responses)

> They don't. libsystemd will _still_ depend on xz, it just will be hidden from cursory analysis.

I honestly wonder if you're even reading this thread. This attack depended on liblzma being loaded into sshd's memory because it was loaded by virtue of DT_NEEDED: after this commit, it would not be loaded at all, because libsystemd would only have loaded it if compressed journal reading was attempted, which sshd never attempts.

So it *would* in fact solve the problem.

But I'm tired of arguing with a brick wall with prejudged opinions, I think. Good night.

A backdoor in xz

Posted Mar 31, 2024 20:16 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> This attack

Reread your words. THIS attack. As in, this _particular_ one. Sure, having the library dlopen()-ed prevents it. I can think of several ways I can backdoor liblzma to work around it.

Making the system usable with mimmutable/mseal would prevent whole categories of exploits. And promoting the dlopen() craze will make this kind of mitigation impossible.

And yeah, I absolutely hate the braindead design of nsswitch, PAM, and now libsystemd.

A backdoor in xz

Posted Mar 30, 2024 17:04 UTC (Sat) by rra (subscriber, #99804) [Link] (3 responses)

This specific exploit I believe relied on being loaded into the process namespace early so that it could set up IFUNCs. I am very far from an expert in how this works, but if I'm understanding this correctly, it would be too late to do this during dlopen (if the library were even dlopened; the primary mitigation is that sshd would have never dlopened liblzma at all with this new systemd design).

A backdoor in xz

Posted Mar 30, 2024 17:18 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

IFUNCs are invoked upon dlopen() and can alas do the same sort of evil thing then that they do here (though they're not supposed to), but of course in this case libsystemd would never have done the dlopen() so the IFUNC would never have had a chance to execute.

IFUNCs are not really the villain here. It is perfectly possible for liblzma to have done the same sort of evil using only perfectly normal symbol interposition, dlsym(..., RTLD_NEXT) and ELF constructors.
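
For reference, that ordinary mechanism looks roughly like this -- a deliberately harmless sketch that wraps write(); it's the same machinery legitimate LD_PRELOAD tools rely on, not the backdoor's actual code:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

static ssize_t (*next_write)(int, const void *, size_t);

/* ELF constructor: runs automatically when the shared object is mapped,
 * before main() of the host program. */
__attribute__((constructor))
static void setup(void)
{
    next_write = (ssize_t (*)(int, const void *, size_t))
        dlsym(RTLD_NEXT, "write");
}

/* Interposes on write(); a wrapper could observe or modify the data here. */
ssize_t write(int fd, const void *buf, size_t count)
{
    if (!next_write)
        return -1;
    return next_write(fd, buf, count);
}

Build it with something like cc -shared -fPIC interpose.c -o interpose.so and run a target with LD_PRELOAD=./interpose.so; the constructor runs as soon as the object is mapped.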

A backdoor in xz

Posted Mar 30, 2024 19:08 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (1 responses)

It would have been harder and noisier to do what the backdoor did during dlopen, even if it were called. By that time sshd's .got would have been read only, so redirection would have required remapping.

A backdoor in xz

Posted Mar 30, 2024 23:21 UTC (Sat) by nix (subscriber, #2304) [Link]

I dunno, some parts would have been easier -- after dlopen() you can at least trust that libc functions etc can be freely called, which is somewhat risky with IFUNCs in libraries loaded via DT_NEEDED. (But more likely they'd just have hunted something else down and attacked that instead. They only have to be lucky once...)

A backdoor in xz

Posted Mar 30, 2024 6:16 UTC (Sat) by mchehab (subscriber, #41156) [Link] (5 responses)

> It's pretty difficult to get systemd integration right without patching sshd. Upstream are BSD folks and very firmly Don't Care About Systemd; there's no hope of getting that patch upstream.

Why would systemd possibly require any integration with sshd? Originally, it started as a replacement for init, meant to make system init faster. See https://0pointer.de/blog/projects/systemd.html:

> For a fast and efficient boot-up two things are crucial:
>
> - To start less.
> - And to start more in parallel.

In practice, system init is now a lot heavier and takes a lot more time to start a system than what it used to be with sysV init.

It also is now not only a PID 1 replacement, but it does lots of integration and interaction with almost everything needed for a system to run, including audit trails/logs.

With that, it became a component that can be compromised indirectly via changes to dozens (or hundreds?) of different components that are not directly related to systemd itself. That opened a window, like what just happened, where malicious code introduced into xz is capable of compromising systems that carry systemd integration OOT patches.

IMO, systemd should return to its roots and stop requiring interactions with other packages unrelated to PID 1's task.

A backdoor in xz

Posted Mar 30, 2024 6:55 UTC (Sat) by intelfx (subscriber, #130118) [Link] (4 responses)

> Why would systemd possibly require any integration with sshd?

To signal (and receive) the readiness state of the daemon in question. Not more, not less.

> IMO, systemd should return to its roots and stop requiring interactions with other packages unrelated to PID 1's task.

I'd say that "reliably determining whether the supervised process has successfully started up" (i. e. loaded and parsed its configuration, bound all the necessary sockets, did not encounter any other failures) is very much within the definition of the PID 1's task.

A backdoor in xz

Posted Mar 30, 2024 22:57 UTC (Sat) by mchehab (subscriber, #41156) [Link] (1 responses)

> > Why would systemd possibly require any integration with sshd?

> To signal (and receive) the readiness state of the daemon in question. Not more, not less.

System V init never needed that, as there are simple generic solutions to monitor that. Basically, when a process forks a child process and that child dies, the parent is notified. This is well-defined POSIX behavior.

> > IMO, systemd should return to its roots and stop requiring interactions with other packages unrelated to PID 1's task.
>
> I'd say that "reliably determining whether the supervised process has successfully started up" (i. e. loaded and parsed its configuration, bound all the necessary sockets, did not encounter any other failures) is very much within the definition of the PID 1's task.

It shall be up to the sshd process - and to all other system daemons - to die if it fails to parse its configuration and/or bind the necessary sockets. The task of PID 1 is to monitor whether the process is dying too fast and, in such cases, to take some action.

There's absolutely no need to modify system daemons, implementing non-POSIX out-of-tree hacks just for PID 1 to be aware that a process is up and running.

A backdoor in xz

Posted Mar 31, 2024 2:07 UTC (Sun) by intelfx (subscriber, #130118) [Link]

> System V init never needed that

Yes, and it sucked.

> This a well-defined POSIX-defined behavior

The fact that it is well-defined or POSIX-defined does not automatically mean that it's _good_. I hate to break it to you, but POSIX is not a pinnacle of system design.

> It shall be up to sshd process - and to all other system daemons - to die if it failed to parse configuration and/or bind necessary sockets

Setting up a proper readiness notification by double-forking is approximately tenfold more complicated and requires exponentially more moving parts than the sd_notify mechanism.

In fact, many daemons (including openssh) do not complete their initialization until after the fork, so the only correct implementation of the interface you describe entails the immediate child _waiting_ for the grandchild to finish its setup, and only then exiting. Which means that there has to be a temporary pipe or socket between the child and the grandchild.

So now we are choosing between a socket notification mechanism implemented _once_ in a well-audited, well-maintained project (systemd) and **the same socket notification mechanism** plus a bunch of historical nonsense implemented _all over again_ in each daemon.

I trust the choice is obvious.

A backdoor in xz

Posted Apr 4, 2024 15:34 UTC (Thu) by koh (subscriber, #101482) [Link] (1 responses)

> To signal (and receive) the readiness state of the daemon in question. Not more, not less.

Why would liblzma be needed for that?

A backdoor in xz

Posted Apr 4, 2024 15:44 UTC (Thu) by cjwatson (subscriber, #7322) [Link]

A backdoor in xz

Posted Mar 29, 2024 20:03 UTC (Fri) by mussell (subscriber, #170320) [Link] (14 responses)

It's necessary because upstream doesn't really consider Linux a first-class citizen. OpenSSH is primarily developed for OpenBSD and the version that runs on the vast majority of systems is the OpenBSD version with a portability layer.

Genuinely surprised that there's no project to replace OpenSSH in a memory safe language that is designed around the OS that everyone actually uses.

A backdoor in xz

Posted Mar 29, 2024 20:09 UTC (Fri) by diegor (subscriber, #1967) [Link] (12 responses)

> Genuinely surprised that there's no project to replace OpenSSH in a memory safe language that is designed around the OS that everyone actually uses.

Windows? Not trolling, but just trying to make a point...

A backdoor in xz

Posted Mar 29, 2024 20:30 UTC (Fri) by intelfx (subscriber, #130118) [Link] (5 responses)

>> Genuinely surprised that there's no project to replace OpenSSH in a memory safe language that is designed around the OS that everyone actually uses.
>
> Windows? Not trolling, but just trying to make a point...

Perhaps we could amend that to "<...> around the OS that everyone actually uses OpenSSH on"?

A backdoor in xz

Posted Mar 29, 2024 22:43 UTC (Fri) by magfr (subscriber, #16052) [Link] (4 responses)

Windows.
OpenSSH is part of Windows 10+
You can finally open up cmd and type ssh user@system and the right thing happens.

A backdoor in xz

Posted Mar 29, 2024 23:59 UTC (Fri) by skissane (subscriber, #38675) [Link]

Most Windows users never use OpenSSH.
Vast majority of Windows installs have the OpenSSH server disabled.

And a lot of Windows users who actually do use an SSH client aren't using the bundled OpenSSH client – they are using PuTTY, or Cygwin/MSYS2 OpenSSH, or WSL OpenSSH, or one of a dozen other open source and proprietary Windows SSH clients.

I really doubt use of Windows bundled OpenSSH is greater than OpenSSH use on Linux (which includes WSL)

A backdoor in xz

Posted Mar 30, 2024 3:51 UTC (Sat) by ibukanov (subscriber, #3942) [Link]

Like 3 years ago I lost a few hours after trying Windows-bundled ssh. During git clone or pull it sometimes stopped working.

A backdoor in xz

Posted Mar 30, 2024 7:04 UTC (Sat) by intelfx (subscriber, #130118) [Link]

> Windows.
> OpenSSH is part of Windows 10+
> You can finally open up cmd and type ssh user@system and the right thing happens.

That's not the openssh _daemon_. And it's not the OS everyone *uses* openssh on.

A backdoor in xz

Posted Mar 30, 2024 7:47 UTC (Sat) by jem (subscriber, #24231) [Link]

Last time I checked, OpenSSH on Windows was a joke. Microsoft is doing a half-hearted job with the Windows port. It typically lags a few versions behind, and they don't even bother to write their own documentation, but instead refer to the man pages of the upstream version. Microsoft uses the same version numbering, for example calling their version OpenSSH 8.2, even if they leave out features at will. For example, the last Windows version I checked completely lacked support for PKCS11 (smart cards). The -I option was not recognized.

Also, if you wanted to use ssh agent, you had to install the SSH server, because ssh agent was bundled with the server package, not the client package, showing a complete lack of understanding of what the role of ssh agent is.

A backdoor in xz

Posted Mar 30, 2024 8:14 UTC (Sat) by niner (subscriber, #26151) [Link] (5 responses)

There are a lot more Linux boxes than Windows boxes. It's just that a lot of them are virtual and there are more non-desktop than desktop ones.

A backdoor in xz

Posted Mar 30, 2024 9:39 UTC (Sat) by geuder (subscriber, #62854) [Link] (4 responses)

Really? I mean Linux boxes having a stable, public IPv4 address and exposing sshd. Not counting Android and other embedded stuff.

I have no statistics whatsoever at hand. On the one hand it sounds unbelievable that you need more servers than people to serve. On the other hand computing has become such a waste of resources that I wouldn't be too surprised if you were correct.

A backdoor in xz

Posted Mar 30, 2024 9:52 UTC (Sat) by niner (subscriber, #26151) [Link] (1 responses)

Why not count embedded stuff? From Wifi routers to TVs to security cameras to light bulbs, they are running Linux, and compromising them can give you a foothold in a network.
Then of course there are millions and millions of systems comprising the cloud.

A backdoor in xz

Posted Mar 30, 2024 12:55 UTC (Sat) by geuder (subscriber, #62854) [Link]

In general yes. But I thought here we were discussing the concrete attack to get a backdoor into sshd.

I don't think a lot of those systems listen to the internet using sshd.

Of course with the hundreds of commits by the maintainer account in question it's not impossible that sshd is only the first attack vector found and there are also others.

A backdoor in xz

Posted Mar 30, 2024 13:24 UTC (Sat) by pawel44 (guest, #162008) [Link] (1 responses)

You need Linux servers to scan the web for Windows viruses. Furthermore, if we count Android the answer is clear.

A backdoor in xz

Posted Mar 30, 2024 14:39 UTC (Sat) by smurf (subscriber, #17840) [Link]

On the other hand, stock Android doesn't run an ssh server.

A backdoor in xz

Posted Mar 30, 2024 14:37 UTC (Sat) by TRS-80 (guest, #1804) [Link]

https://github.com/mkj/sunset from the author of dropbear

A backdoor in xz

Posted Mar 29, 2024 20:10 UTC (Fri) by dullfire (subscriber, #111432) [Link]

> It seems to me that distributions shouldn't be modifying a critical piece of security infrastructure like sshd. Isn't that just asking for trouble?

Yeah, my reaction as well. sshd is one of the few has-to-be-privileged processes with an exposed attack surface even on hardened systems.

A backdoor in xz

Posted Mar 29, 2024 21:02 UTC (Fri) by judas_iscariote (guest, #47386) [Link] (3 responses)

I want you to go back to your comment after trying to get anything past the OpenSSH upstream maintainers. They literally don't care about Linux features or requests at all.

A backdoor in xz

Posted Mar 29, 2024 23:49 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (2 responses)

To be fair, I have got Linux-specific patches into openssh-portable a number of times. They're just very conservative - not necessarily a bad thing!

A backdoor in xz

Posted Mar 31, 2024 10:48 UTC (Sun) by ssl (guest, #98177) [Link] (1 responses)

OpenSSH ≠ openssh-portable

A backdoor in xz

Posted Mar 31, 2024 11:11 UTC (Sun) by cjwatson (subscriber, #7322) [Link]

I'm aware, but since the immediate upstream for most distributions is openssh-portable and that's who we report bugs to, I thought it likely that the other people talking about OpenSSH in this thread actually meant openssh-portable.

A backdoor in xz

Posted Mar 29, 2024 21:08 UTC (Fri) by atnot (subscriber, #124910) [Link] (1 responses)

I think that's kind of missing the point.

XZ was clearly just a means to an end here. They almost definitely ran "ldd" on the ssh binary on a Debian system two years ago and scanned the list of libraries for whatever upstream was least maintained and would be easiest to compromise. If it wasn't XZ, it would have been the next thing on the list.

A backdoor in xz

Posted Mar 29, 2024 21:52 UTC (Fri) by sjj (subscriber, #2020) [Link]

I think you’re right. This was most likely an attempt to get a longer-term vulnerability into sshd/systemd/selinux/etc. It's a multi-pronged attack, including brigading the xz maintainer to add another maintainer.

A backdoor in xz

Posted Mar 30, 2024 1:37 UTC (Sat) by bluca (subscriber, #118303) [Link] (22 responses)

> It seems to me that distributions shouldn't be modifying a critical piece of security infrastructure like sshd. Isn't that just asking for trouble?

No, it isn't; you are missing the wood for the trees. It's normal for distributions to patch core components - pick any distro and look at their kernel, gcc or glibc packages and you'll find tons of patches. The issue here is that a combination of archaic release practices (a make-dist tarball generated who-knows-where-by-whom) and a very sophisticated attack almost caused a disaster, essentially because of a lack of supply chain security.

A backdoor in xz

Posted Mar 30, 2024 11:07 UTC (Sat) by MarcB (subscriber, #101804) [Link] (21 responses)

The question is why OpenSSH is linking against libsystemd instead of just implementing the needed subset of the notify API. The sd_notify interface is simple and stable.

But for Systemd in general, what is XZ actually needed for? If it is just for the journal, you might want to consider dropping it. xz was only part-time-maintained by a single person before the malicious co-maintainer joined. And LZMA does not add much (anything?) over ZSTD.

A backdoor in xz

Posted Mar 30, 2024 11:52 UTC (Sat) by chris_se (subscriber, #99706) [Link] (9 responses)

> And LZMA does not add match (anything?) over ZSTD.

I don't remember all the details, but ~ 8 months or so ago I experimented with replacing xz with zstd for fs images at $dayjob - and while zstd was a LOT faster, even at max zstd level the files were still ~ 20% larger than a plain xz -7. (xz -9 was too slow to be practical) Take the exact number with a grain of salt because it's from memory, but the difference was significant.

In our case the significantly higher compression ratio was worth it.

Can't speak about systemd, but I would definitely not make any absolute statements that there are NO advantages to xz/lzma.

A backdoor in xz

Posted Mar 30, 2024 19:25 UTC (Sat) by MarcB (subscriber, #101804) [Link] (8 responses)

Did you use the "ultra" settings for zstd or any of the advanced options? Maybe there are some binary data scenarios where LZMA still wins, but in all tests I did, zstd achieved better compression. But this always was mostly textual data.

A backdoor in xz

Posted Mar 30, 2024 21:19 UTC (Sat) by mbunkus (subscriber, #87248) [Link] (4 responses)

Really? I always think of zstd as worse but orders of magnitude faster than xz, and worse but in-different-universes-kind-of-faster than bzip2 (all at default settings).

After reading another comment in this thread about using zstd for systemd's journal, I did a short test with a 1.6 GB journal export file (journalctl -o export …). The results were roughly:

| Type         | Size | Time  |
|--------------|------|-------|
| uncompressed | 1.6G | —     |
| gzip         | 78M  | 6.9s  |
| bzip2        | 58M  | 1m34s |
| zstd         | 62M  | 0.9s  |
| zstd -9      | 51M  | 3.9s  |
| xz           | 43M  | 4.5s  |

With the exception of zstd -9 all other compressors used their default settings.

(As stated in journalctl's man page, the "export" format is mostly text with a small amount of binary data for structure)

I'd be interested in situations where zstd compresses better than xz. Do you have some concrete numbers?

A backdoor in xz

Posted Mar 31, 2024 0:29 UTC (Sun) by MarcB (subscriber, #101804) [Link] (3 responses)

At work, it was a large mail archive; essentially write-only - if we ever have to read it, something went wrong (legally speaking :-)

For linux-6.8.2.tar (1.4G), I get 137MiB for xz -9 and 133MiB for zstd --ultra -22.

A backdoor in xz

Posted Mar 31, 2024 12:15 UTC (Sun) by mbunkus (subscriber, #87248) [Link] (2 responses)

Ooooh I hadn't been aware zstd has compression levels higher than 9. Good to know.

I did a couple more tests with this knowledge; here's the updated table:

| Type             | Size |     Time |
|------------------|-----:|---------:|
| uncompressed     | 1.6G |        — |
| gzip             |  78M |     6.9s |
| bzip2            |  58M |  1m34.1s |
| zstd             |  62M |     0.9s |
| zstd -9          |  51M |     3.9s |
| zstd -19         |  44M |  3m38.6s |
| zstd --ultra -22 |  41M | 21m43.1s |
| xz               |  43M |     4.5s |
| xz -9            |  43M |    15.6s |

So yes, you can get zstd down below xz, at least with content that is mostly text, but now the duration comparison completely flips: xz looks good at around 5s & zstd is so far out of this world that it isn't funny anymore.

Note, though, that xz is multi-threaded & zstd and all the others don't seem to be: zstd only used a single core even on --ultra -22, whereas xz -9 used eight of my 32 cores. That being said, "zstd -19" uses 14.6 times the amount of time and "zstd --ultra -22" an unbelievable 83.5 times, making it still slower per core than "xz -9".

Does multi-core processing matter? Let's take build pipelines, such as a build server for a distribution like Debian, as an example. If they want to achieve high utilization of their resources, they have to run stuff in parallel. This means that they can either assign a single core to each build VM & run a lot of build VMs in parallel, or they can assign multiple cores to each build VM & run fewer of them. In the latter case, a compression step that can only make use of a single core & takes many times as long as another compressor with a similar compression ratio yields rather low utilization.

Don't get me wrong; I really like zstd & the tradeoffs it makes. I use it as my default compressor in most day-to-day use cases for its impressive speed, especially interactively. But when file size is a concern (e.g. a lot of countries out there where internet traffic is mostly mobile & therefore both slow & expensive at the same time), xz pretty much always wins, no matter how you look at it. It's really no big surprise for me it has gained such wide-spread usage in the OSS world.

A backdoor in xz

Posted Mar 31, 2024 17:46 UTC (Sun) by andresfreund (subscriber, #69562) [Link] (1 responses)

> Note, though, that xz is multi-threaded & zstd and all the others don't seem to be: zstd only used a single core even on --ultra -22, whereas xz -9 used eight of my 32 cores.

zstd -T0 will do the same.

A backdoor in xz

Posted Mar 31, 2024 18:12 UTC (Sun) by mbunkus (subscriber, #87248) [Link]

Indeed; I somehow totally missed that when glancing over zstd's options earlier today. Thanks for pointing that out.

Interestingly it does worse than "xz -T0" does wrt. how many cores it can effectively use. On my 32-core system with the same 1.6 GB input file "zstd --ultra -22 -T0" starts out using four cores but drops down to & stays at three cores after a handful of seconds. Therefore processing still takes 7m38s. Using a file or STDIN as input makes no difference. I guess zstd simply cannot segment the source as much as xz does.

Now "xz -T0" (which is the default in recent xz versions) also only uses eight cores on the same machine. Then again even with "-9" it is worlds faster both per core & in total than "zstd --ultra -22".

Then again, I'm really not trying to argue that xz is better than zstd, even though I probably sound like it. I just tried to answer the question why the OSS community has adopted xz as widely as it has, simply to satisfy my own curiosity. Also it's good to know the strengths & weaknesses of the various tools at our disposal.

A backdoor in xz

Posted Mar 31, 2024 6:32 UTC (Sun) by chris_se (subscriber, #99706) [Link] (2 responses)

> Did you use the "ultra" settings for zstd or any of the advanced options?

I don't remember. I just redid the same checks again for a single file I had laying around (I did the previous checks against multiple variants), and I got this:

| Method           | Time (m:s) | RAM during compress | Size      |
|------------------|------------|---------------------|-----------|
| xz               | 2:12       |   95 MiB            | 135 MiB   |
| xz -7            | 2:22       |  187 MiB            | 134 MiB   |
| xz -9            | 2:34       |  675 MiB            | 115 MiB   |
| zstd             | 0:02       |   52 MiB            | 182 MiB   |
| zstd -19         | 2:49       |  250 MiB            | 147 MiB   |
| zstd --ultra -22 | 4:22       | 1328 MiB            | 121 MiB   |

(All done on a single CPU core, Intel Core i7-8700K. Debian 12 stable.)

My payload is basically a tar file of a minimized Debian 12 rootfs, plus some additional internal software -- nothing special. (Orig size: 554 MiB)

To summarize my test: even at --ultra -22 zstd is worse in all aspects compared to xz -9.

A backdoor in xz

Posted Mar 31, 2024 16:06 UTC (Sun) by stefanor (subscriber, #32895) [Link] (1 responses)

zstd trades compression-time resources for cheaper decompression.

The main promise of zstd over the other options is faster decompression, so I think it would only be fair to include that in the comparison.

A backdoor in xz

Posted Apr 1, 2024 19:33 UTC (Mon) by chris_se (subscriber, #99706) [Link]

> The main promise of zstd over the other options is faster decompression, so I think it would only be fair to include that in the comparison.

Sure, one could do that, and zstd is probably going to be faster when it comes to decompression. But my original point was not to bash zstd - I was replying to the statement that zstd is always better than xz and that there's no reason to use xz at all. My second response where I posted my measurements was a little hyperbolic to underline that point.

Personally, I do quite like zstd - and if you look at my table, using the standard compression algorithm, you can reduce a filesystem image of 554 MiB down to 182 MiB (~32% of the original size) within just 2 seconds, which is a lot faster than what many other tools can do. (~60 times faster than xz in its default settings.) I do think zstd is an excellent algorithm to use as a default when no further constraints have been applied, because the tradeoffs it has chosen are very sensible for many applications.

The only point I'm trying to drive home is that if you have certain constraints - such as that the compressed size is to be as small as reasonably possible - then zstd might not be the algorithm you want to use in all cases (probably depending on what kind of data you want to compress), and that rhetoric such as "always use zstd, xz is obsolete" is not helpful. And while the broader public now knows a lot more about the past hardships of xz maintenance, hindsight is always 20/20, and I don't think the problems there were immediately obvious to most people just using xz themselves. I think that after-the-fact statements such as "people should not have used xz anymore anyway" are extremely unhelpful - not only because it's easy to say so after the fact, but also because I do think xz has some advantages in some situations and will remain a good choice when constraints require it.

A backdoor in xz

Posted Mar 30, 2024 12:11 UTC (Sat) by bluca (subscriber, #118303) [Link] (10 responses)

It's needed to compress/decompress journal files and coredumps. The former is also done via a public sd_journal* set of APIs, hence it is in libsystemd.so. In git main we switched to dlopen'ing these libs only when needed - i.e., when the sd_journal API is called _and_ it encounters a file compressed with the corresponding lib.

This will be the dependency tree of a fully-featured build of libsystemd in the next release:

build/libsystemd.so.0 (interpreter => None)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

Compression libs and libgcrypt will all be dlopened on demand, if needed.
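
For illustration, here is a minimal sketch of the dlopen-on-demand pattern described above - not systemd's actual code, just the general technique, assuming liblzma's public lzma_version_string() symbol (build with -ldl):

/* Sketch only: load liblzma lazily, the first time xz support is needed,
 * instead of linking against it at build time. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
        /* The library is only mapped into the process when actually needed. */
        void *handle = dlopen("liblzma.so.5", RTLD_NOW | RTLD_LOCAL);
        if (!handle) {
                fprintf(stderr, "xz support unavailable: %s\n", dlerror());
                return 1;
        }

        /* Look up the one entry point we need; real code would cache such
         * function pointers after the first successful lookup. */
        const char *(*version)(void) =
                (const char *(*)(void)) dlsym(handle, "lzma_version_string");
        if (version)
                printf("loaded liblzma %s on demand\n", version());

        dlclose(handle);
        return 0;
}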

A backdoor in xz

Posted Mar 30, 2024 13:58 UTC (Sat) by pbonzini (subscriber, #60935) [Link] (5 responses)

I wonder however if simple and common functionality like notification and file descriptor retrieval belongs in the same public-facing library as reading the journal and the coredumps. Perhaps they should be moved out of libsystemd and into a two-file (.h and .c) copylib?

A backdoor in xz

Posted Mar 31, 2024 12:17 UTC (Sun) by bluca (subscriber, #118303) [Link] (4 responses)

It used to be, but it was merged because it's just an unnecessary pain for developers to have to know multiple extremely similar libraries and to reason about which one to use and link to, etc.

The manager <-> service protocol is trivial, so the solution is to just reimplement it if that's all you need. I'll check whether we have some MIT-0 copy-paste ready examples, and if not add it to the documentation.
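
For readers wondering just how trivial: the readiness half of the protocol amounts to sending a datagram containing "READY=1" to the AF_UNIX socket named in $NOTIFY_SOCKET (a leading '@' meaning the abstract namespace). A minimal sketch of that documented interface - not systemd's own sd_notify() implementation, and with error handling kept to a minimum:

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Send one state string ("READY=1", "RELOADING=1", ...) to the manager. */
static int notify(const char *state)
{
        const char *path = getenv("NOTIFY_SOCKET");
        if (!path || (path[0] != '/' && path[0] != '@'))
                return 0;                /* not running under a notify-aware manager */

        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        size_t len = strlen(path);
        if (len >= sizeof(addr.sun_path))
                return -1;
        memcpy(addr.sun_path, path, len);
        if (addr.sun_path[0] == '@')
                addr.sun_path[0] = '\0'; /* abstract socket namespace */

        int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
                return -1;
        ssize_t n = sendto(fd, state, strlen(state), 0,
                           (struct sockaddr *)&addr,
                           offsetof(struct sockaddr_un, sun_path) + len);
        close(fd);
        return n < 0 ? -1 : 1;
}

int main(void)
{
        /* ... do the service's initialization here ... */
        notify("READY=1");
        /* ... main loop ... */
        return 0;
}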

A backdoor in xz

Posted Apr 1, 2024 5:31 UTC (Mon) by mchapman (subscriber, #66589) [Link] (3 responses)

systemd used to provide a reference implementation (+ header). Perhaps something like this could be brought back?

A backdoor in xz

Posted Apr 1, 2024 11:02 UTC (Mon) by bluca (subscriber, #118303) [Link] (2 responses)

There will be an MIT-0 (so it can be copy/pasted with impunity) self-contained example in the documentation where the protocol is defined

A backdoor in xz

Posted Apr 2, 2024 17:16 UTC (Tue) by bluca (subscriber, #118303) [Link]

A backdoor in xz

Posted Apr 2, 2024 20:40 UTC (Tue) by himi (subscriber, #340) [Link]

A similar reference implementation in a few other common languages would be nice, too - with systemd it's gotten so easy to write system daemons in things like python that a copyable reference implementation would be quite helpful. It's simple enough that no one's bothered writing a library, but fiddly enough to do properly that rolling your own isn't always the best option.

A backdoor in xz

Posted Mar 30, 2024 17:21 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

Journal files and coredumps really are a case where zstd would be a better choice than lzma: it's nearly as good compression-wise but compression is *much* faster, and both journals and coredumps are compressed much more often than they're decompressed. (For things like .xz distro artifacts this is the other way around, so spending loads of time for slightly better compression is often a good idea -- or would be if xz wasn't also much slower and more memory-hungry than zstd at decompressing!)

A backdoor in xz

Posted Mar 31, 2024 12:19 UTC (Sun) by bluca (subscriber, #118303) [Link] (1 responses)

xz, gz and zstd are all supported, with a compile-time option to choose which one to use.

A backdoor in xz

Posted Mar 31, 2024 13:54 UTC (Sun) by nix (subscriber, #2304) [Link]

In hindsight, of course they are! I should have checked. (I'm even using zstd on my own system, but of course I forgot. I, uh, blame the clocks changing. ... what do you mean they only changed after I made that comment?)

A backdoor in xz

Posted Mar 30, 2024 21:49 UTC (Sat) by MarcB (subscriber, #101804) [Link]

This looks like a major improvement; at the very least, the spill-over into other processes' address spaces will be prevented.

Let's hope distributions follow up on this and reduce the set of essential packages.

A backdoor in xz

Posted Mar 29, 2024 21:04 UTC (Fri) by wsy (subscriber, #121706) [Link] (5 responses)

My worst nightmare has happened. Given the complexity of modern systems, how can I sleep again?

A backdoor in xz

Posted Mar 29, 2024 21:26 UTC (Fri) by mb (subscriber, #50428) [Link] (3 responses)

> how can I sleep again

Become a gardener, lumberjack or hermit.

A backdoor in xz

Posted Mar 29, 2024 21:48 UTC (Fri) by joey (guest, #328) [Link] (1 responses)

I am at least 2 out of 3 and am still gonna lose sleep over this.

A backdoor in xz

Posted Mar 30, 2024 5:12 UTC (Sat) by marcH (subscriber, #57642) [Link]

There should be a button to nominate LWN's "comment of the week".

A backdoor in xz

Posted Mar 30, 2024 22:16 UTC (Sat) by detiste (subscriber, #96117) [Link]

You can't eat your cake and have it too... (not my words)

A backdoor in xz

Posted Mar 29, 2024 21:47 UTC (Fri) by atai (subscriber, #10977) [Link]

Set an idle timer. Idle for five minutes and suspend

A backdoor in xz

Posted Mar 30, 2024 0:19 UTC (Sat) by Trelane (subscriber, #56877) [Link]

A backdoor in xz

Posted Mar 30, 2024 0:25 UTC (Sat) by pabs (subscriber, #43278) [Link]

Another update, sounds like libarchive might have separate vulnerabilities also caused by the attacker:

https://boehs.org/node/everything-i-know-about-the-xz-bac...

A backdoor in xz

Posted Mar 30, 2024 0:29 UTC (Sat) by pabs (subscriber, #43278) [Link]

A backdoor in xz

Posted Mar 30, 2024 0:30 UTC (Sat) by pabs (subscriber, #43278) [Link]

A backdoor in xz

Posted Mar 30, 2024 0:32 UTC (Sat) by pabs (subscriber, #43278) [Link]

A Gist containing a FAQ about the issue:

https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee...

A backdoor in xz

Posted Mar 30, 2024 1:01 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (43 responses)

It's amazing (and miraculous) how quickly this was found. If the performance impact hadn't been as bad, how long would it have flown under the radar?

Yet more reason to de-bloat instead of en-bloat with crap like systemd. xz-utils shouldn't have any relation to sshd and doesn't have any relation to upstream openssh; it shouldn't be necessary just to tell a service manager "hey I'm running!", and telling a service manager "hey I'm running" shouldn't be necessary in the first place...

Just because you can do something doesn't mean you should and this is a perfect example of that.

A backdoor in xz

Posted Mar 30, 2024 1:44 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

> Yet more reason to de-bloat instead of en-bloat with crap like systemd.

xz is also pulled in by selinux, or some PAM modules (which SSH also uses).

Arguably, both need to go, and SSH needs to be rewritten in a safe language, with authentication handled by something like systemd instead of random dlopen()-ed modules.

A backdoor in xz

Posted Mar 30, 2024 8:18 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (8 responses)

Gross, selinux; pretty sure UsePAM is also a patch (and it can at least be disabled in the config). The question though is not "what pulls it in" but rather "what pulls it in without adding value" because that's how you get lists of 100s of deps, any one of which is vulnerable to an attack like this.

A backdoor in xz

Posted Mar 30, 2024 10:54 UTC (Sat) by khim (subscriber, #9252) [Link] (1 responses)

> The question though is not "what pulls it in" but rather "what pulls it in without adding value"

Each patch adds value to someone, or it wouldn't have existed. Sshd without PAM would be 100% useless to me because all machines that I use ssh with use authentication not supported by stock Debian.

Similarly someone who needs to pass certain certification needs selinux and so on.

That's the flip side of the story which made open source available in the first place: we have millions of users, and even if 0.01% of them are developers, that's enough to produce software for free.

Remove all that “crap” and suddenly there are not enough developers to drive that thing forward because there are not enough users.

There is no easy solution for that problem, unfortunately.

A backdoor in xz

Posted Mar 31, 2024 1:27 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link]

There's a difference between adding value to 1 person and adding value to everyone who uses some software, for example.

A backdoor in xz

Posted Mar 30, 2024 19:32 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

PAM still has value, it's still very useful for auditing and custom authentication in special environments.

These days, PAM can be mostly replaced by ephemeral SSH certificates for authentication. But it's still useful for auditing.

A backdoor in xz

Posted Mar 30, 2024 21:25 UTC (Sat) by apoelstra (subscriber, #75205) [Link] (3 responses)

I use pam_u2f extensively on my personal computers to use a Yubikey to authenticate my login and screenlocker. This usecase can't be replaced by ephemeral SSH certs because the goal is to talk to a physical U2F key which only speaks U2F.

A backdoor in xz

Posted Mar 31, 2024 0:47 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Is it for interactive logins or for SSH? It's definitely still needed for interactive logins, but they are also much less troublesome. But I don't think SSH needs them.

A backdoor in xz

Posted Mar 31, 2024 16:54 UTC (Sun) by apoelstra (subscriber, #75205) [Link] (1 responses)

Ah, yes, only for interactive logins. For SSH I use GnuPG's ssh-agent emulation support, whose mechanism I don't really understand.

A backdoor in xz

Posted Mar 31, 2024 18:51 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

ssh-agent (or its emulation) is basically just the public key authentication.

PAM was useful for custom authentication, such as LDAP-based auth or something similar. These days a fairly typical workflow is to use some kind of a daemon/utility on the developer's machine to get a temporary SSH certificate, and then just use this certificate to log in using the SSH.

A backdoor in xz

Posted Mar 31, 2024 1:27 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link]

Leftpad still has value, it's very useful for padding the left side of a string when you're too lazy to write 2 lines of code.

A backdoor in xz

Posted Mar 30, 2024 2:22 UTC (Sat) by dvdeug (subscriber, #10998) [Link] (9 responses)

So, do you run the kernel, which is 140MB of compressed source (and 100MB of binary)? That's more than Linus's hard drive could hold when he started the project, and 140 times as large as version 1.0. Are you trying to debloat, or are you trying to score points against systemd?

A backdoor in xz

Posted Mar 30, 2024 8:16 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (5 responses)

I have 35MB used in /boot, which includes 3 kernels among other things.

A backdoor in xz

Posted Mar 30, 2024 8:24 UTC (Sat) by niner (subscriber, #26151) [Link] (3 responses)

What about kernel modules? They are usually not in /boot.

A backdoor in xz

Posted Mar 30, 2024 21:49 UTC (Sat) by dmoulding (subscriber, #95171) [Link]

As just a random data point, my kernel has all functionality I need built into it. I don't even enable loadable module support. The compressed bzImage in /boot is 12M. The uncompressed vmlinux is 41M. This is for a desktop with everything that entails (DRM, nouveau, bluetooth, USB, camera/video, audio, etc.)

A backdoor in xz

Posted Mar 31, 2024 1:29 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

/lib/modules $ du -hs `uname -r`
64M 6.6.13-gentoo

I'm not sure if you're aware of `make menuconfig`, but unlike systemd, you actually CAN effectively turn off parts of the kernel that you don't need.

A backdoor in xz

Posted Mar 31, 2024 1:31 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link]

Oh and here's the actual modules BTW:

700K 6.6.13-gentoo/misc/vboxdrv.ko
48K 6.6.13-gentoo/misc/vboxnetflt.ko
20K 6.6.13-gentoo/misc/vboxnetadp.ko
59M 6.6.13-gentoo/video/nvidia.ko
8.0K 6.6.13-gentoo/video/nvidia-peermem.ko
1.7M 6.6.13-gentoo/video/nvidia-modeset.ko
16K 6.6.13-gentoo/video/nvidia-drm.ko
2.5M 6.6.13-gentoo/video/nvidia-uvm.ko

A backdoor in xz

Posted Mar 30, 2024 15:10 UTC (Sat) by dvdeug (subscriber, #10998) [Link]

To carry on with niner's point, /boot/vmlinuz-6.7.9-amd64 may be nine megabytes, but /lib/modules/6.7.9-amd64/ is over a hundred. You could read through the kernels of Unix v6 (the Lions book), or xv6, or Minix. Even twenty years ago, the kernel was so big that the SCO mess turned up a copy of Unix malloc buried in kernel that no one had noticed, even though it should have been replaced with standard kernel functions. Should we use a tighter (less functional) kernel that's actually readable? I don't want to give up a lot of the features I use, but there's certainly no one with the headroom to completely understand 140 MB of source code.

A backdoor in xz

Posted Mar 30, 2024 13:24 UTC (Sat) by pawel44 (guest, #162008) [Link] (2 responses)

The kernel doesn't pull in third-party dependencies.

A backdoor in xz

Posted Mar 30, 2024 14:38 UTC (Sat) by smurf (subscriber, #17840) [Link] (1 responses)

Well, not when building it. Running it is another matter, as it pulls in a heap of pre-built binaries (firmware) with poorly-documented provenance.

A backdoor in xz

Posted Mar 30, 2024 15:05 UTC (Sat) by marcH (subscriber, #57642) [Link]

> Running it is another matter, as it pulls in a heap of pre-built binaries (firmware) with poorly-documented provenance.

Even worse: it does not even _log_ what it loaded! I usually carry this hack:

--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -562,7 +562,7 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
                size = rc;
                rc = 0;
 
-               dev_dbg(device, "Loading firmware from %s\n", path);
+               dev_warn(device, "XXXX Loading firmware from %s\n", path); 
                if (decompress) {
                        dev_dbg(device, "f/w decompressing %s\n",
                                fw_priv->fw_name);
@@ -924,6 +924,10 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
                fw_log_firmware_info(fw, name, device);
        }
 
+       dev_warn(device, "XXXX request-firmware name=%s, ret=%d\n",   name, ret);

        *firmware_p = fw;
        return ret;
 }

A backdoor in xz

Posted Mar 30, 2024 2:56 UTC (Sat) by himi (subscriber, #340) [Link] (22 responses)

> Yet more reason to de-bloat instead of en-bloat with crap like systemd. xz-utils shouldn't have any relation to sshd, doesn't have any relation to upstream openssh, shouldn't be necessary to tell a service manager "hey I'm running!", telling a service manager "hey I'm running" shouldn't be necessary...

Services telling the service manager "hey, I'm running!" makes it much /much/ easier to have robust systems. I've run into all sorts of problems getting multiple dependent services working together reliably because of timing issues during startup - an early service happens to take a second longer to come up than expected, a dependent service comes up at just the right time that it doesn't get a connection refused and instead has to wait for a timeout, and anything that happens afterwards is just broken (without manual intervention). Having that first service /explicitly/ tell the service manager "I'm ready to accept connections" avoids that kind of thing /reliably/, without needing to throw in random sleeps that avoid 99.9% of the problems but make everything take five times as long as it should, and which still don't address that 0.1% tail.

Sure, I could implement that coordination in the services . . . at least for the ones I've written. Otherwise I'd need to what, wrap dependent services in something that /does/ handle the coordination? And what about cleanly shutting down or restarting - obviously I'd need the wrapper to handle that, too. And I'd need to wrap /everything/ somehow so that starting and stopping the whole stack in the correct order would work, and handle errors in the startup process sensibly, and and and . . .

And guess what - handling dependencies between services sensibly is pretty much the bare minimum for any reasonable service manager. Supporting that in some kind of "lets avoid systemd at all costs", "I'm not a service manager, just a thin coordination wrapper, no, really" bit of code is at best fiddly and difficult to do well, and at worst ends up being a reimplementation of a significant chunk of systemd. And even if you do all that, on a *nix system you'll ultimately end up having to delegate at least /some/ things to pid 1, particularly if you want a robust and reliable system (even if it's just tidying up zombie processes).

There's a minimum level of core complexity in any functional system - for a system based on a modern Linux kernel and the current standard Linux userspace, that minimum level of core complexity is close enough to what systemd provides that desperately trying to avoid all of systemd's "bloat" just means you're making things both less functional /and/ less robust. You /can/ make that kind of trade-off work (Alpine does a decent job of it, since they're targeting fairly constrained use cases), but pretending that there's no trade-off happening is just delusional.

But it's also irrelevant in this case, because lots of things other than systemd pull in liblzma - including pam, which was a critical component of a functional sshd on Linux /long/ before systemd became the default. The minimum level of core complexity back when we were all relying on SysV init was still plenty broad enough for a sufficiently motivated and well resourced attacker to find exploitable weaknesses, and there were far fewer options available to harden systems against that kind of attack.

A backdoor in xz

Posted Mar 30, 2024 8:19 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (11 responses)

Til they tell the service manager "I'm running!" just before failing. I've had that happen several times, in fact. The only *actual* solution for that problem is monitoring. Notify-by-socket is precisely equivalent to notify-by-fork in terms of reliability.

A backdoor in xz

Posted Mar 30, 2024 9:57 UTC (Sat) by cesarb (subscriber, #6266) [Link] (4 responses)

Another solution is the watchdog: the service has to tell the service manager "I'm still running!" periodically, otherwise it's treated as failed. Then the main effect of the initial "I'm running!" is that the timeout before it is given by TimeoutStartSec, and the timeout after it is given by WatchdogSec, which can be shorter (allowing for services which are slow to start, like heavy Java-based servers).
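
As a rough illustration of how little the service side needs for this handshake, here is a hedged sketch using libsystemd's sd_notify(3) and sd_watchdog_enabled(3); the work inside the loop is a placeholder, and you'd build against libsystemd (e.g. via pkg-config):

#include <stdint.h>
#include <systemd/sd-daemon.h>
#include <unistd.h>

int main(void)
{
        uint64_t usec = 0;
        /* Returns > 0 (and fills in usec) when WatchdogSec= is set for the unit. */
        int watchdog = sd_watchdog_enabled(0, &usec);

        /* ... slow initialization goes here, covered by TimeoutStartSec= ... */
        sd_notify(0, "READY=1");

        for (;;) {
                /* ... one iteration of real work ... */
                if (watchdog > 0)
                        sd_notify(0, "WATCHDOG=1"); /* must arrive within WatchdogSec= */
                /* Ping at half the configured interval, as the man page suggests. */
                usleep(watchdog > 0 ? (useconds_t)(usec / 2) : 1000000);
        }
}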

A backdoor in xz

Posted Mar 31, 2024 1:33 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (3 responses)

Til it tells the service manager "I'm still running" but never calls accept again. Notify-by-socket is precisely equivalent to notify-by-fork in terms of reliability.

The solution you are looking for is *monitoring*.

A backdoor in xz

Posted Mar 31, 2024 1:45 UTC (Sun) by intelfx (subscriber, #130118) [Link] (2 responses)

> Notify-by-socket is precisely equivalent to notify-by-fork in terms of reliability.

It might be "equivalent" in an information-theoretical sense (everything that can be achieved with one, is also achievable with the other), but it's absolutely not equivalent in _practical reliability_.

Setting up a proper "notifying" double-fork (which, I remind you, means that the immediate child has to wait for the grandchild to initialize and only then exit, because in most cases the initialization must be completed in the grandchild) is tenfold more _complicated_ and _easier to get wrong_ than simply writing a line into a pre-existing socket that the supervisor has prepared for you.

Even more: all known cases of proper notifying double-fork implementations involve creating a temporary pipe or socket between the child and the grandchild, precisely for the reasons described above. As such, we are choosing between a notify-by-socket implemented _once_ and a notify-by-socket implemented _over and over again_ in each daemon. The choice must be obvious, unless you specifically have an irrational axe to grind against systemd.

A backdoor in xz

Posted Mar 31, 2024 6:22 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

> As such, we are choosing between a notify-by-socket implemented _once_ and a notify-by-socket implemented _over and over again_ in each daemon. The choice must be obvious,

Indeed it must be, considering that we are discussing the result of everyone sharing a single implementation of it.

A backdoor in xz

Posted Mar 31, 2024 12:25 UTC (Sun) by bluca (subscriber, #118303) [Link]

No, most of us are discussing the result of a multi-year-long sophisticated social engineering attack that preyed on underfunded and overworked unpaid maintainers to inject a complex backdoor. Yes, a handful of people are missing the wood for the trees because they are unable or unwilling to run a simple command to check the attack surface gained by backdooring xz:

$ apt-cache rdepends liblzma5 | wc -l
354

If it hadn't been libsystemd in the middle of the dependency chain, it would have been something else. The exploit was primed and ready to add more backdoors for other arbitrary workflows, with pre-prepared and unused "test file" signatures whose targets we'll now never know.

A backdoor in xz

Posted Mar 30, 2024 14:29 UTC (Sat) by smurf (subscriber, #17840) [Link] (2 responses)

One does not exclude the other. In fact many of my monitoring scripts actively check that the service in question hasn't legitimately been shut down before complaining (too) loudly.

You cannot do a "has legitimately been shut down" check without systemd. (Well, OK, of course I could use or write some other code that does this job, but why would I want to replace one mostly-coherent, widely-used presumed-safe software package with five less-widely-used and poorly-integrated ones? sysV init scripts are of no help here)

A backdoor in xz

Posted Mar 31, 2024 1:33 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

Yes you can? OpenRC has done it for longer than systemd has existed ffs

A backdoor in xz

Posted Apr 1, 2024 14:06 UTC (Mon) by farnz (subscriber, #17727) [Link]

I used OpenRC before I used systemd; it does not, as far as I can find (and even today) offer a proper lifecycle check; in particular, the only queries you can ask it are "is this service running", "is this service known and shut down", or "has this service crashed", whereas systemd adds "is this service running but in the process of shutting down" and "is this service running but in the process of restarting" to that list, which is essential for automated remediation of faults - you know that if a service is restarting, a fault is OK, while if a service is shutting down, you should determine if that shutdown is expected, or if you need to alert a human.

A backdoor in xz

Posted Mar 30, 2024 14:48 UTC (Sat) by dvdeug (subscriber, #10998) [Link] (2 responses)

> Til they tell the service manager "I'm running!" just before failing.

Which is a bug; they should complete all checks that make them fail before reporting success. Yes, bugs are a reality.

> The only *actual* solution for that problem is monitoring. Notify-by-socket is precisely equivalent to notify-by-fork in terms of reliability.

No, there exist many cases where a fork happens and then the program fails before it would have notified the service manager that it was successfully running. By the same logic, monitoring is precisely equivalent to notify-by-fork in terms of reliability; monitoring programs can fail to notice a service no longer working as well, except that they add false positives and can report that a system has failed when it's been properly shut down or had a temporary glitch, as from system overload.

A backdoor in xz

Posted Mar 31, 2024 1:34 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

> Which is a bug; they should complete all checks that make them fail before reporting success. Yes, bugs are a reality.

Indeed. But *IT IS THE SAME BUG WHETHER YOU'RE USING SYSTEMD'S NOTIFICATIONS OR FORKING*

I don't understand why I have to explain that so many times only to hear the EXACT SAME (inane) ARGUMENT again.

A backdoor in xz

Posted Apr 1, 2024 11:29 UTC (Mon) by HenrikH (subscriber, #31152) [Link]

For that particular daemon, yes, but fewer daemons have this bug than have an unreliable double fork (which is 100% of the double-fork cases).

A backdoor in xz

Posted Mar 30, 2024 23:34 UTC (Sat) by mchehab (subscriber, #41156) [Link] (9 responses)

> Services telling the service manager "hey, I'm running!" makes it much /much/ easier to have robust systems.

System V init systems are typically a lot more robust than systemd ones, as:

- the order in which services start is fixed; no risk of starting a process too early;
- jobs are started in sequence. Things like Network were started before network daemons like sshd, apache, etc;
- no parallel jobs during init/shutdown time;
- critical jobs that should never be stopped could also be added to /etc/inittab. They were respawned if something bad happened and the process ended up dying.

Also, the "hey, I'm running" task is really simple: if a process has problems, it shall die. PID 1 shall detect it and take the appropriate action when this happens. Modifying the daemon's source, especially with out-of-tree patches, sounds like a very bad idea.

So, while systemd offers lots of flexibility, in terms of a system's robustness simpler usually means more stable and more reliable. I have yet to see a systemd-based system more reliable than a SysV one. At best, it might be equivalent in terms of robustness.

A backdoor in xz

Posted Mar 31, 2024 1:14 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> - jobs are started in sequence. Things like Network were started before network daemons like sshd, apache, etc;

What about wireless or VPNs?

> Also, the "hey, I'm running" task is really simple: if a process has problems, it shall die.

A Java server that I have for telephony takes 2 minutes to start up. How would you detect that?

There's also a problem with double-forking. The only process that can detect the death of a double-forked server is PID 1, and in classic SysV all it did was to reap the PID. Ditto for inittab - it can't detect the death of double-forked processes.

A backdoor in xz

Posted Mar 31, 2024 1:38 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (2 responses)

> What about wireless or VPNs?

What about it? Both work fine for me, I have 3 VPNs and occasionally wireless (tethering via my phone).

> A Java server that I have for telephony takes 2 minutes to start up. How would you detect that?

Well, for one thing, I wouldn't use Java, and I wouldn't use a server that takes 2 minutes to start up. Other than that, there are plenty of solutions for this that you could easily implement in sysvinit (you can run whatever you want whenever you want, after all, it's just a shell script); OpenRC actually handles it natively (and has since before systemd existed).

> The only process that can detect the death of a double-forked server is PID 1

That's not true (PR_SET_CHILD_SUBREAPER).

> in classic SysV all it did was to reap the PID

How is that an argument for systemd?

> Ditto for inittab - it can't detect the death of double-forked processes

Huh? inittab is a config file.
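
(For what it's worth, the PR_SET_CHILD_SUBREAPER mechanism mentioned above is small enough to show in a sketch. Illustration only, not a complete supervisor, and it assumes Linux 3.4 or later:)

#include <stdio.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        /* Mark this (non-PID-1) process as a subreaper: orphaned descendants
         * reparent to it instead of to init, so it can observe their exit. */
        if (prctl(PR_SET_CHILD_SUBREAPER, 1) < 0) {
                perror("prctl");
                return 1;
        }

        pid_t child = fork();
        if (child == 0) {
                /* Classic daemonisation: the intermediate child exits and the
                 * grandchild lives on; here it reparents to the subreaper. */
                if (fork() == 0) {
                        sleep(1);
                        _exit(42);
                }
                _exit(0);
        }

        int status;
        waitpid(child, &status, 0);   /* reap the intermediate child */
        pid_t got = wait(&status);    /* later, reap the reparented grandchild */
        printf("grandchild %d exited with status %d\n",
               (int)got, WEXITSTATUS(status));
        return 0;
}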

A backdoor in xz

Posted Mar 31, 2024 3:51 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> What about it? Both work fine for me, I have 3 VPNs and occasionally wireless (tethering via my phone).

Now try to make sure that the daemon does not come up until at least one network interface is up. Or until the VPN connection is established.

> That's not true (PR_SET_CHILD_SUBREAPER).

That's true in classic SysV. The subreaper was introduced only in Linux 3.4

> Huh? inittab is a config file.

If you're talking about SysV "simplicity", then you should at least learn it. Classic inittab supports respawning processes on death (action=respawn).

A backdoor in xz

Posted Mar 31, 2024 6:18 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link]

> Now try to make sure that the daemon does not come up until at least one network interface is up. Or until the VPN connection is established.

Done and done.

> That's true in classic SysV. The subreaper was introduced only in Linux 3.4

... okay?

> If you're talking about SysV "simplicity", then you should at least learn it. Classic inittab supports respawning processes on death (action=respawn).

I know it, thanks. That's not a separate program. That's part of init. Controlled by the configuration file, inittab. If you're going to act like you know something better than someone else, then you should at least learn it.

A backdoor in xz

Posted Mar 31, 2024 1:58 UTC (Sun) by himi (subscriber, #340) [Link] (4 responses)

If SysV init systems are actually more robust than systemd ones, it's because they're much simpler - your description captures that simplicity really neatly, in fact.

Though I don't actually believe that claim - I've had plenty of SysV init based systems that were massive pains in the neck to deal with, while the majority of the systemd based systems I've managed have been fairly benign and well behaved, despite doing a whole lot more. Also, the issues I have to deal with generally have nothing to do with systemd, and when they /are/ issues with the unit configuration they've been much easier to resolve than similar issues with init scripts ever were.

Oh, and writing a service targeting a SysV init environment is /far/ more of a pain than writing the same thing targeting a systemd environment - in fact, it's ridiculous how easy it is to target systemd with a basic service. No daemonising, no futzing around with logging targets, no pid files, even the kind of basic intra-service coordination you're pooh-poohing is ridiculously simple. And no mess of spaghetti shell code init script! Most of the time the unit file for a basic service is five or ten lines, and they're simple and declarative with no complex logic - infinitely easier to write and debug. Even when you need to set up complex interdependencies it's generally simple to configure and relatively easy to debug - it can be fiddly, but that's mostly inherent to the problem, rather than an artefact of systemd's implementation.

SysV init worked okay in its day, though it was always a bit of a pain. But it's long /long/ past the time when people should have recognised how much of an improvement systemd is, in pretty much every way. Eulogising the past is all well and good, but actual current performance matters a whole lot more in the real world than sepia-tinted memories of past glories.

Here too

Posted Mar 31, 2024 2:05 UTC (Sun) by corbet (editor, #1) [Link]

Fighting the old systemd wars yet again is not going to help us address this kind of attack. Please, let's not do that.

A backdoor in xz

Posted Mar 31, 2024 3:13 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (2 responses)

Indeed, simpler is better. Simpler is easier to secure, simpler is easier to develop, simpler is easier to maintain (i.e. run a system with it), simpler is easier to modify, simpler is easier to troubleshoot, simpler is easier to understand.

A backdoor in xz

Posted Mar 31, 2024 13:00 UTC (Sun) by pizza (subscriber, #46) [Link]

> Indeed, simpler is better.

You can have your remberberries, but the rest of us have to deal with the real world.

...The real world isn't simple, hasn't ever been, and has only trended towards higher complexity.

A backdoor in xz

Posted Apr 1, 2024 14:08 UTC (Mon) by farnz (subscriber, #17727) [Link]

FWIW, by porting a system from SysV init to systemd, I was able to close several hard-to-reproduce bugs, since systemd's extra complexity allowed me to remove a ton of complexity from the various scripts that started components of the system, and at least some of that complexity turned out to have race conditions in it that systemd did not have.

There is a huge advantage to one implementation of something done well replacing (in my case) 5 different implementations, all with different bugs.

A backdoor in xz

Posted Mar 30, 2024 2:10 UTC (Sat) by helsleym (guest, #92730) [Link] (11 responses)

Seems like release builds, at least, ought to be isolated from binary blobs whenever possible. The build system of nearly any project could probably delete *all* test infrastructure (including testcase data), documentation directories (could contain binary "images"), etc. before starting the (release) build. Committing binary data to source repositories should also be seen as deeply taboo. Plenty of projects have been pretty lax about that or even chided others for being wary of binary blobs. The fact is: it's not "source" after all and it makes finding this kind of stuff even less likely.

A backdoor in xz

Posted Mar 30, 2024 4:01 UTC (Sat) by himi (subscriber, #340) [Link] (10 responses)

The difficulty with doing that is the fact that you /want/ to be able to run the test suite as a standard part of the package build process, because you want to be able to say with confidence that the code you're wrapping up in that package works as intended when built as part of your distribution. And you /really/ want to be able to run the test suite using exactly the same source distribution that you're building your packages from, in the same system context the packages will be installed into, because otherwise you (and your users) have no guarantees about the validity of those tests. A properly done test suite will include test cases that exercise any security-critical code paths; if you're not testing the same code with the same set of libraries and environment as you expect them to be used in, then those security-critical code paths may not be tested meaningfully.

I'm not sure how practical it would be to completely avoid having binary blobs of test data in the repository, particularly for something like a compression library - the only real alternative would be to have code that could /generate/ the test data from some kind of human-readable format. I'm sure it would be /possible/ - most test cases would require well-formed or very selectively malformed examples of a well-defined set of binary data structures, which you'd specify in your human-readable format and then "compile" to the actual binary test data mechanically. How practical that would actually be I don't know . . . but I'm almost certain that the effort required to change existing test suites built around binary blobs over to using generated data would be prohibitive, particularly for projects like xz, which appears to have been vulnerable to this attack precisely because of the lack of developer capacity.

Sanitising the build environment somehow while keeping the full original source distribution available would probably be a saner option - maybe do the build in a sanitised context, snapshot the results, and then re-extract the source along with anything required for testing? But then you run into the question of how to sanitise the build environment, which would probably be a giant pain, and decidedly distribution-specific . . . Perhaps you could throw compute resources at the problem and just run builds with more and more files deleted until it failed to produce an identical binary, at which point you have your baseline for that particular package? CPU time on build farms is free, isn't it?

Longer term the developer community probably needs to come up with a set of best practice guidelines about how to mitigate these kinds of issues, and tooling to make those best practices nice and easy to implement; then all we'll need to do is find the developer capacity to move existing codebases over to those best practices . . .

A backdoor in xz

Posted Mar 30, 2024 14:35 UTC (Sat) by smurf (subscriber, #17840) [Link] (9 responses)

> I'm not sure how practical it would be to completely avoid having binary blobs of test data in the repository

Nobody requested that. The problem here is a binary blob that (a) is executed as part of the test (b) isn't even checked in (c) is able to modify the built artefacts.

All three of those are red flags. Preventing any one of them would have disabled this attack. In fact, checking out the git repo (with its immutable commit ID and all that) instead of downloading a tarball which is only loosely (too loosely as it turns out) associated with the release in question could, and should really, have prevented this from affecting Debian (and probably others).

A backdoor in xz

Posted Mar 30, 2024 20:10 UTC (Sat) by mjg59 (subscriber, #23239) [Link] (1 responses)

This is inaccurate. The backdoor code is checked in, and isn't able to modify the artifacts in itself. What isn't checked in is a backdoored m4 macro that injects code into configure, which in turn extracts further code from the test files and injects those into the build. It's kind of expected that configure is able to influence the build process, and compressor test files are inherently going to be binaries, so really the only red flag was the difference between the git repo and the tarball - a difference that's largely expected, given how autotools works.

A backdoor in xz

Posted Mar 30, 2024 23:56 UTC (Sat) by chrisoofar (guest, #170494) [Link]

There is a lot of valid conversation around the viability of the autoconf build stack that should not be discounted, and it's time to put autoconf to bed just like programmers left JCL job cards behind.

And if we're going to insist on a build tool with its own scripting language, then for the love of God and all that is holy pick or construct something that enforces readability with an iron fist.

A backdoor in xz

Posted Mar 31, 2024 1:00 UTC (Sun) by himi (subscriber, #340) [Link] (6 responses)

We want to avoid binary blobs in the source repository because we want the entire repository to be human readable, and therefore at least notionally subject to human review - it's not ridiculous to consider the practicality of the idea. Particularly for cases like this where it's a binary file format which obviously needs binary data as input for at least some of its test cases. It's definitely /possible/, and it may be desirable in the abstract, but how practical it'd be, how much value it'd add, and whether it'd be worth the disruption are the questions that need answering. Certainly it would mitigate against this particular attack, and raises the bar for any attack that wanted to hide a payload in plain sight, but it'd need careful consideration before being recommended as a matter of course.

There are two reasonably simple mitigations that /could/ be done (and probably should be done) to mitigate against this kind of attack: building from reproducible and traceable source (i.e. a git checkout or a git archive with enough metadata that the contents can be verified against a trusted git repo); and doing the actual build in a sanitised environment with no access to anything not required for the build. The obvious people to implement these mitigations are the distros, but they should probably be standard practise for anyone who's creating binary distributables.

'git archive' already does most of what we'd want, but it'd need to add metadata to make the archive contents more readily verifiable, and probably a bunch of smaller tweaks to make it reliable and reproducible for this use case; it probably also needs a properly standardised output format, to avoid issues like we saw with github's changing "release archives". Git would also need to complete the move away from vulnerable digest algorithms, though in practise the "vulnerable" digests are probably still a sufficiently high barrier against almost all attackers. Using a repo checkout directly /isn't/ a good idea - unless you set it up carefully you'd be exposing everything in the repo to your build; this would also make any hash collision attack against the digest algorithm far more valuable to an attacker. An archive mostly avoids those issues, as well as being more easily redistributable.

A sanitised package build environment is something that would mostly be implemented by the distros rather than the developers. It /should/ be pretty simple to implement, even if you want to be able to run the test suite using the newly built binaries: make an initial copy of the source tree, sanitise it, do the build, then retrieve the build artefacts; make another clean copy of the source tree, drop the build artefacts back in, then run the test suite; assuming it passes, package up the build artefacts and you're done. Again, it mitigates against a replay of this attack, and raises the bar a /lot/ for future attacks that want to use the build system to inject an obfuscated payload that's been added to the source distribution, but it does so in a way that can be applied to just about all package builds without needing to change the entire world.

The biggest challenge is how to sanitise the build environment - the best approach would have the developers change the organisation of the source tree to make it easier to sanitise; without that, it'd be a long hard slog to examine each and every package to determine the bare minimum that was required for a reproducible build. I imagine a lot of the grunt work could be automated, but the results would still need to be validated by a human, and re-validated each time the source tree changed.

More broadly, though, we need to adjust our threat models to include potentially malicious maintainers. The two mitigations I've discussed here could address some of that - visibility and verifiability of source distribution, and reducing the attack surface presented by the build environment. There's going to be lots of other threats that come to light as we come to grips with this new threat model, though - we need to be careful not to focus too closely on the details of this particular attack.

A backdoor in xz

Posted Mar 31, 2024 14:06 UTC (Sun) by nix (subscriber, #2304) [Link] (5 responses)

> We want to avoid binary blobs in the source repository because we want the entire repository to be human readable, and therefore at least notionally subject to human review - it's not ridiculous to consider the practicality of the idea. Particularly for cases like this where it's a binary file format which obviously needs binary data as input for at least some of its test cases. It's definitely /possible/, and it may be desirable in the abstract, but how practical it'd be, how much value it'd add, and whether it'd be worth the disruption are the questions that need answering. Certainly it would mitigate against this particular attack, and raises the bar for any attack that wanted to hide a payload in plain sight, but it'd need careful consideration before being recommended as a matter of course.

I'd say that in the specific case of xz, it should be generating corrupted test files etc using *xz itself*, and then buggering them using existing tooling. If it had done that routinely it would be obvious that a giant binary lump claiming to be a corrupted xz file was up to no good.

The hard part is testing *non*-corrupted files. You can't reliably construct *those* with the built xz, because if the built xz is buggy it's going to generate a messed-up file rather than the non-corrupted file that is desired (and you presumably want to detect cases where the compressor and decompressor are buggered-up in the same way). Not sure how to fix that: you can't use the xz already on the system because it might be similarly buggered-up...

A backdoor in xz

Posted Mar 31, 2024 19:05 UTC (Sun) by cesarb (subscriber, #6266) [Link] (4 responses)

In my opinion, it's the opposite: non-corrupted files could be described using a custom human-readable assembly-like language which represents the primitives allowed in the compression format (something like "LITERAL 03 AA BB CC TREE ..."), while corrupted files could be crazy files found by users which happened to cause older versions of the software to misbehave, and which were added as a regression test.

> You can't reliably construct *those* with the built xz, because if the built xz is buggy it's going to generate a messed-up file [...] and you presumably want to detect cases where the compressor and decompressor are buggered-up in the same way

That is easy to avoid, you just need to add an "expected hash" to check that the compressor generated the correct data for the decompressor to test. But that doesn't help when you want to test that the decompressor remains compatible with data generated by *older* versions of the compressor (which the current code no longer generates), or even alternative compressor implementations. There's a lot of flexibility in compressor output, so it's not unusual for it to change.
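
As a toy illustration of that "assembly-like" idea - the directive name here is invented for the example, not taken from any real tool - a sketch that turns a reviewable text spec into binary test data could be as small as this; the expected-hash check described above would then be applied to the generated file by the test harness:

#include <stdio.h>
#include <string.h>

/* Read lines like "LITERAL 03 aa bb cc" on stdin and emit the raw bytes on
 * stdout; anything a reviewer can't read is rejected. A real tool would
 * mirror the container/stream primitives of the format under test. */
int main(void)
{
        char line[1024];

        while (fgets(line, sizeof(line), stdin)) {
                char *tok = strtok(line, " \t\n");
                if (!tok || tok[0] == '#')
                        continue;                 /* blank line or comment */
                if (strcmp(tok, "LITERAL") != 0) {
                        fprintf(stderr, "unknown directive: %s\n", tok);
                        return 1;
                }
                while ((tok = strtok(NULL, " \t\n"))) {
                        unsigned int byte;
                        if (sscanf(tok, "%2x", &byte) != 1) {
                                fprintf(stderr, "bad byte: %s\n", tok);
                                return 1;
                        }
                        putchar((int)byte);
                }
        }
        return 0;
}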

A backdoor in xz

Posted Mar 31, 2024 19:41 UTC (Sun) by nix (subscriber, #2304) [Link]

Yeah, solving this is hard.

In a similar area, for a long time we tried to make the libctf and ld.ctf testsuites in binutils work the same way the DWARF ones mostly do: describe a binary CTF dict via a .S file and just assemble it. But a mixture of there being no cross-arch-portable way to represent arbitrary binary data in .S files (.long, .byte4 etc, are all arch-dependent) and a desire to always test the latest CTF format except in a few error cases meant that nearly all of the tests end up compiling a .c file with -gctf just to get its CTF, then running the tests on that. It turned out to be much nicer to work with that, and much easier to see what the tests were actually *doing* as well: .S files are a pretty opaque way to describe a test input! (For corrupted CTF tests, the same problem arises, though at only a few dozen bytes each they're hardly likely to be malicious. There's just no room.)

A backdoor in xz

Posted Apr 1, 2024 6:52 UTC (Mon) by himi (subscriber, #340) [Link] (2 responses)

> In my opinion, it's the opposite: non-corrupted files could be described using a custom human-readable assembly-like language which represents the primitives allowed in the compression format (something like "LITERAL 03 AA BB CC TREE ..."), while corrupted files could be crazy files found by users which happened to cause older versions of the software to misbehave, and which were added as a regression test.

This is mostly what I was thinking of - have a human-readable language designed to specify the binary formats, and use that to generate the test data. It would work for both good and bad test data, though - rather than simply copying "known bad" test files for testing, you'd pull them apart and write a spec for the test case that generated the same kind of bad data. Doing that would actually specify the badness in great detail, far more so than simply having a binary test file that happens to break things; it could also help drive highly targeted fuzzing whenever particular failures were identified.

The biggest problem I see is that for something like a compression format the difference in complexity between a specification of the binary format and its possible contents and an actual implementation of the compression algorithm probably isn't all that big, and the specification is certainly going to be subject to as many bugs as the code it's supposed to be testing. This is why I'd hesitate to suggest trying to make this sort of thing into any kind of generalised "best practice" - there are probably going to be too many cases where the payoff just isn't worth the effort required.

That said, a compression algorithm and its on-disk format are probably just about the worst case scenario - I expect a lot of software that reads or writes a binary format would have much less difficulty creating this kind of test data spec. And in a more generalised sense the idea of generating test data rather than simply embedding it in the repository is something that's probably worth encouraging, along with tooling that would support it . . .

A backdoor in xz

Posted Apr 1, 2024 12:48 UTC (Mon) by pizza (subscriber, #46) [Link] (1 responses)

> This is mostly what I was thinking of - have a human-readable language designed to specify the binary formats, and use that to generate the test data.

So... you have this human-readable language generate a malicious file that contains the payload for an exploit. What have you gained here?

The problem is that the binary data is "hostile", not "how the binary data was generated".

A backdoor in xz

Posted Apr 2, 2024 3:16 UTC (Tue) by himi (subscriber, #340) [Link]

If we can't simply trust that the contents of the repository aren't malicious, we need to be able to independently verify that they're not malicious. The point of a human-readable specification of the binary data is to try and make it possible for a human to read it and say "the binary version of this spec won't be malicious". Is that possible? In the general case I don't think so, but I'm pretty sure there are going to be a good number of cases where it /is/ possible, for someone with the requisite knowledge of the format. I'm suggesting that where it's possible, it might be worth trying to do, to mitigate against at least some of the risks posed by malicious binary blobs in source repositories.

As I've said, even if this idea might work in some cases I suspect it wouldn't be viable for xz or similar, simply because of the nature of compression algorithms. But if you can make "has undefined binary files as part of the test data set" into a code smell that gets people to take a closer look then you're raising the bar for sneaking a malicious payload into a repository.

A backdoor in xz

Posted Mar 30, 2024 3:58 UTC (Sat) by pabs (subscriber, #43278) [Link]

A backdoor in xz

Posted Mar 30, 2024 4:01 UTC (Sat) by pabs (subscriber, #43278) [Link]

A backdoor in xz

Posted Mar 30, 2024 7:06 UTC (Sat) by mb (subscriber, #50428) [Link] (16 responses)

All maintainers should now keep in mind that after fixing this vulnerability in xz it is not over.

This is just the start of it.

We all need to be aware that this person, or any other person, will start an equivalent attack on any other project, soon.
Or has already started it...

We maintainers need to
- carefully review submissions.
- reject any submission that we don't fully understand. It's the job of the submitter to make it understandable! It's the job of the submitter to get a patch up to the project's quality standards.
- publicly ask for help, if we are unable to fully maintain a project that people depend on. In the meantime do not apply non-trivial patches. It's not a shame if we are - for whatever reason - temporarily or permanently unable to maintain a project. Users just need to know about that.
- never discuss things in private that belong in public discussions.

A backdoor in xz

Posted Mar 30, 2024 13:03 UTC (Sat) by pizza (subscriber, #46) [Link] (1 responses)

> - publicly ask for help, if we are unable to fully maintain a project that people depend on

Except, as demonstrated, you can't trust those that are stepping up to supposedly "help" you. You have to vet them first, which brings us full circle.

A backdoor in xz

Posted Mar 30, 2024 17:36 UTC (Sat) by mb (subscriber, #50428) [Link]

I wrote "publicly" for a reason.

A backdoor in xz

Posted Mar 30, 2024 14:50 UTC (Sat) by cpanceac (guest, #80967) [Link] (12 responses)

In the meantime I have a problem with big companies using open source components without significantly investing in them (if at all). These components get into all kinds of commercial products used by people around the world. I believe it would make a big difference if they put their money into supporting these projects instead of assuming that everything is fine.

A backdoor in xz

Posted Mar 30, 2024 17:17 UTC (Sat) by rra (subscriber, #99804) [Link] (11 responses)

Corporations are legally sociopathic and have no structural incentive to do any such thing. Problems like this that result from this gap can be socialized in ways that they still don't have to pay for. Other people will always clean up their messes for them.

We have a social mechanism to require private organizations to support infrastructure that everyone needs, in order to avoid free rider problems like this. It's called taxes. Everyone hates that answer, because it requires making decisions collectively as a society (well, to be more accurate, multiple societies) and holding people responsible for participating in and funding the society that they are part of, but no one has a better alternative. Begging corporations to be kind, benevolent benefactors isn't going to work. Appealing to self-interest when it's quite clear they can socialize their losses and make more money by being aggressively selfish also isn't going to work.

An interesting model worth considering is the way countries like the UK compensate the authors of books based on how much circulation they receive in libraries.

A backdoor in xz

Posted Mar 30, 2024 21:55 UTC (Sat) by kleptog (subscriber, #1183) [Link] (10 responses)

> We have a social mechanism to require private organizations to support infrastructure that everyone needs, in order to avoid free rider problems like this. It's called taxes.

There's also a third way: regulations. The food and drink we consume is safe not because companies are nice or because we pay for it all via taxes. It's safe because we have regulations and enforce them. Stuff like electricity grids, water/gas networks and telecommunication grids are these days typically built on a user-pays principle: those who use the service pay for the infrastructure. In the past these were originally built by (local) governments with taxes, but that's just not very efficient. Using regulations to enforce service levels while letting the user pay for them can work much better (note: this is not the same as privatisation). (Caveat: requires people making sensible regulations.)

I'm not convinced getting open-source infrastructure paid for via taxes is a viable model. Taxes are a very blunt instrument because they completely divorce the people who use the product from the people who pay for it. This is fine for things where we expect social solidarity like social security. I don't think it makes sense for open-source software.

This is the moment where we need to point out to all those big companies: can you imagine if this had gone undetected and ended up in one of your released products? What would that have cost you and how much is it worth now to prevent that?

A backdoor in xz

Posted Mar 30, 2024 23:07 UTC (Sat) by ejr (subscriber, #51652) [Link] (5 responses)

Enforcement is funded by taxes.

A backdoor in xz

Posted Mar 31, 2024 22:07 UTC (Sun) by Wol (subscriber, #4433) [Link] (4 responses)

> Enforcement is funded by taxes.

Why? What you *should* do is provide a guaranteed level of funding for your enforcement agency that enables it to function, monitor the market, and carry out a low level of enforcement. That enforcement enables the agency to recover costs, which means they can expand enforcement in line with expanding abuse/breaches. The result again *should* be a commercial decision that breaking the rules is not wise - either you're a lone offender who will get targeted, or the market is generally offending and the fines/costs will enable the enforcement agency to rapidly expand ...

Cheers,
Wol

A backdoor in xz

Posted Apr 1, 2024 17:49 UTC (Mon) by apoelstra (subscriber, #75205) [Link] (3 responses)

>That enforcement enables the agency to recover costs

In the US at least, I don't believe any agency works this way. Instead any fines or other enforcement-related payments go to the government's general fund. Money is fungible so in principle this could make an agency cost-neutral, but it has no effect on the agency's budget so they aren't incentivized to try.

If any agency *were* incentivized to levy fines, because their operating budget had to come out of the fines, this would be a perverse incentive for them to just levy fines willy-nilly. Much like the "speed traps" operated by local police agencies near the end of the month.

A backdoor in xz

Posted Apr 1, 2024 21:49 UTC (Mon) by kleptog (subscriber, #1183) [Link] (1 responses)

> >That enforcement enables the agency to recover costs

> In the US at least, I don't believe any agency works this way.

It surely varies by jurisdiction, but regulatory agencies here in the Netherlands don't live off fines. They'd die if that were the case. To give some examples of how it works:

- NVWA (think food safety) charges per food inspection certificate issued, time spent auditing a business, etc for example.

- AFM (like the SEC) basically has a budget, which is divided by a formula over all the banks, insurance companies, etc within the Netherlands.

The principle is straightforward: regulatory authorities are paid for by the businesses they are regulating. The health agency is funded by the hospitals, GPs and pharmaceutical companies within their jurisdiction. If a sector complains the regulatory agency is too expensive, then politicians can simply argue that the sector should get its act together so that there's less enforcement work required.

It doesn't work for everything. Stuff like GDPR enforcement, it's not clear who should pay for that. But for a lot of regulatory agencies it does work reasonably well.

A backdoor in xz

Posted Apr 2, 2024 9:10 UTC (Tue) by farnz (subscriber, #17727) [Link]

The general model for things where it's not clear who should pay is for the regulator to be funded from general taxation, and for fines to go back into the general pot; it is understood that the regulator is not expected to attempt to pay its own costs via fines, but that it is expected to fine everyone who breaches the regulations.

A backdoor in xz

Posted Apr 2, 2024 18:26 UTC (Tue) by Wol (subscriber, #4433) [Link]

> If any agency *were* incentivized to levy fines, because their operating budget had to come out of the fines, this would be a perverse incentive for them to just levy fines willy-nilly. Much like the "speed traps" operated by local police agencies near the end of the month

What you *want* to achieve, is for the person paying to want to pay the minimum possible, but for them to have two (at least) different ways of minimising the cost.

My preferred example is with things like insurance companies. Why shouldn't the police have a "burglary investigation department" paid for by the insurance companies? You then hopefully get a "steady state" where the police catch enough burglars to keep the crime rate down, but barring outright fraud the system isn't going to get out of hand.

Unfortunately, capitalism tends to sabotage such neat systems. Another example is the mess we have of utilities - it makes sense for the infrastructure to be owned by the customers, but all too often it's treated as a profit centre by suppliers :-( As a result you get the horror stories we hear from America of people locked into cable monopolies, or stuck with dial-up speeds. In a first world state !?!?

(I won't say we're much better - in theory we're a lot better off, but it still fails horribly ...)

Cheers,
Wol

A backdoor in xz

Posted Mar 31, 2024 3:47 UTC (Sun) by rra (subscriber, #99804) [Link] (3 responses)

I agree that working out the right way to do this would be hard. Nothing about this problem is easy; if it were easy, we would have done it already. But the cracks are showing in how we're doing this now. (If only this weren't true about dozens of other things about our modern world, several of which are significantly more important than free software.)

But, that said...

> Taxes are a very blunt instrument because they completely divorce the people who use the product from the people who pay for it.

This is exactly why I find the library model interesting: there's a feedback loop. Corporate products, services, and infrastructure that use free software vote with their choices. We figure out some way to count those choices (I know, I know, complexity of the software should be a factor, how to do this is very inobvious, insert vigorous hand-waving here), and an appropriate percentage of the revenues of those companies go to the maintainers of that software. If companies stop using their software, they stop getting money. If more companies use their software, they get more money.

I personally don't like that everything in society is denominated by money (this, ironically, is part of why I write free software; I like being motivated by community rather than money), but if I want free software developers to slow down, take a breath, be more methodical, and be able to take the time to do things properly, well, most of those things require money in some way. (Not *only* money, of course.) I think we need to find ways to derisk going part-time, or taking a year between jobs to work on free software, or making a living writing free software infrastructure, if we want to get ahead of our growing maintenance crisis.

A backdoor in xz

Posted Apr 1, 2024 15:30 UTC (Mon) by kleptog (subscriber, #1183) [Link]

> This is exactly why I find the library model interesting: there's a feedback loop. Corporate products, services, and infrastructure that use free software vote with their choices.

Ok, so that's a different model. That kind of thing exists as levies for other things. Like the "thuiskopieheffing" (home copy levy), which is basically an extra charge on writable CDs/DVDs and other media that is distributed to copyright holders as compensation for the fact that people copy stuff for their own use. Or the charges on appliances that pay for the collection and recycling at end-of-life.

You could, in theory, add a 1% levy on all digital products/services and then, via that hand-waving you were referring to, distribute it to the developers/maintainers of open-source. The justification being that all digital products/services depend on open-source anyway, this is a way to finance it. I don't think this idea is completely ridiculous; if someone could actually work out the details it could actually happen.

The details however matter. Because it's not just a money problem. Even if tomorrow there was a fund available to pay for all the maintenance of open-source software, the social structures don't exist to make it happen. Are there enough people who actually want to do the required work, even if they were paid? How do we ensure the work is actually done? Figuring out which projects is the easy part. Can we trust the people who actually do it?

The financing of maintenance of open-source software is a long-standing problem and simply throwing money at it isn't going to solve it. You first need to figure out *how*, then you can discuss where to get the money from. I think the CRA is a step in the development of the business models that will improve the funding situation in the future but I don't think we yet know how this will work out.

A backdoor in xz

Posted Apr 2, 2024 16:12 UTC (Tue) by GNUtoo (guest, #61279) [Link] (1 responses)

> This is exactly why I find the library model interesting: there's a feedback loop. Corporate products, services, and infrastructure that use free software vote with their choices. We figure out some way to count those choices (I know, I know, complexity of the software should be a factor, how to do this is very inobvious, insert vigorous hand-waving here), and an appropriate percentage of the revenues of those companies go to the maintainers of that software. If companies stop using their software, they stop getting money. If more companies use their software, they get more money.

The issue here is the side effects. For instance, what would prevent companies from writing extremely widely used software with a poor security track record and then trying to get money to fix things after the fact, when the design is bad, or when even more fundamental things than the design are bad (use cases that are impossible to secure, etc.)?

And if it somehow works, it could also produce very secure software that conflicts with freedom or other things we care about (like inclusiveness, keeping old hardware working, etc.).

A slightly better approach would be to look at the NLnet approach and somehow adapt it for improving security maintenance.

Micro-grants for short periods of time are probably not ideal to fund long-term maintenance, so that could probably be adapted/changed, along with the metrics used to decide when not to pay (it's probably easier to check whether a specific task is done than to assess the usefulness of maintenance tasks), but the fact that highly competent people decide what to fund and not to fund, and have a strategic vision for FLOSS, is probably something that we need.

This could avoid the most problematic perverse incentives, and the cost here would probably be the subjectivity of the people that decide what to fund or not to fund, and here having diverse people could help but probably won't fix everything.

But at least it would be better than the other models mentioned here before.

A backdoor in xz

Posted Apr 3, 2024 13:17 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> The issue here is the side effects. For instance what would prevent companies from writing extremely used software with poor security track record and try to get money to fix things after the fact when the design is bad, or that even bigger foundations than the design is bad (use cases impossible to secure, etc).

First, to me, would be the interesting question of how a "poor security track record" ended up "extremely used" with the regulation threat looming. Besides that…if it is FOSS, who says the company gets the contract? Even if not, there could be some source escrow (something I would love to see for "critical" software). Either way, a bidding process can help with prices. Allow the "core maintainer" entity to usurp the lowest bid with, say, 10% overhead if they want to do it themselves and retain "power"; that can help curb gouging at least. Also allow bids to create a compatible replacement. Not that procurement doesn't have collusion, greased hands, and other situations, but it is at least something familiar.

A backdoor in xz

Posted Mar 31, 2024 0:17 UTC (Sun) by rgmoore (✭ supporter ✭, #75) [Link]

I largely agree with your list of suggestions, but it's not clear that they would have helped in this case. This wasn't a drive-by contribution by someone the maintainer didn't know. It sounds as if the maintainer was trying to follow your third principle by getting help. He found someone who had been contributing to the project for a couple of years before trying to slip in a backdoor. That's long enough that an overwhelmed maintainer very well could have turned over the keys to the new guy already.

This is a really serious problem that can't be solved with a few rules. There are too many projects whose value to an attacker is huge compared to their resources. Someone who really wants to subvert them can hide their true intentions for long enough to win the maintainer's trust. There's no simple way to avoid that kind of patient attack, especially for a project that barely has the resources to keep maintaining its software.

A backdoor in xz

Posted Mar 30, 2024 12:47 UTC (Sat) by lv7 (guest, #170474) [Link]

A backdoor in xz

Posted Mar 30, 2024 12:48 UTC (Sat) by lv7 (guest, #170474) [Link]

Lasse Collin added a notification on the Tukaani page: https://tukaani.org/xz-backdoor/

A backdoor in xz

Posted Mar 30, 2024 13:20 UTC (Sat) by nettings (subscriber, #429) [Link]

Hmmm. I had the compromised packages on an OpenSUSE Tumbleweed machine whose sshd was fortunately not exposed to the internet.
Stupid me hadn't even checked; the first thing I did was log onto a machine that did offer incoming sshd to the outside world.
The weird thing was, I got asked to accept that host's key, which is absolutely not plausible, because I literally log into that machine several times a week from this host. This is when I stopped the login process, killed all user processes and checked from a local text console whether my local host was affected; it turns out it was.

In my hurry to lock stuff down I did not write the offered key down anywhere (alas), and now I cannot obtain a compromised xz easily to reproduce... but in case someone feels bored, I'd like to know if you're seeing this, too.

A backdoor in xz

Posted Mar 30, 2024 14:39 UTC (Sat) by draco (subscriber, #1792) [Link]

IMHO, if you hate systemd and blame it for this problem without taking the time to see if anyone else has already debunked your argument, you're unwittingly helping "Jia Tan."

I very much doubt it was a primary objective, but I'm sure it is just icing on the cake for the people responsible for this backdoor that they can count on all the people who hate systemd to waste everyone's time with their predictable knee-jerk reactions.

I've already lost count of the number of times I've seen the exact same argument both here in various subthreads of *this* conversation and on various mailing lists, etc.

Thankfully it can be quickly and thoroughly debunked, but it's still a distraction and a waste of time & effort that only serves to slow our response to this attack.

If you're one of those people, I recommend some soul searching. It's human nature to want to score easy points, but blind hate doesn't help anyone and actually gets in the way of what I thought was the point of open source — to make the world a better place.

More hygienic linking

Posted Mar 30, 2024 16:04 UTC (Sat) by epa (subscriber, #39769) [Link] (5 responses)

> The backdoor initially intercepts execution by replacing the ifunc resolvers crc32_resolve(), crc64_resolve() with different code

So… how does it get to do that? When linking in a library, can it by default not only add its own symbols, but also override symbols from elsewhere?

I think the default should be “safe linking” where each .so file can only define its own symbols and not override those from elsewhere. If it needs to override, that could be specified explicitly on the linker command line (and if necessary encoded in the resulting executable, if it’s dynamically linked). Even if you’re not concerned about security, a general respect for “what can go wrong will go wrong” and for minimizing accidental strange behaviour would suggest not allowing overrides without explicit say-so.

I dare say getting to there from here might be awkward, with the same symbol defined multiple times but nobody has had to care about it until now.
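For anyone who hasn't run into it: the default behaviour being questioned here is ELF symbol interposition, where whichever object comes earlier in the lookup order wins when it defines a name, even for calls made inside the library that "owns" the symbol. A minimal sketch (file names and build commands are illustrative, and assume default visibility without -Bsymbolic):

/* interpose.c - sketch of default ELF symbol interposition.
 *
 * Hypothetical build:
 *   cc -fPIC -shared -DREAL   -o libreal.so   interpose.c
 *   cc -fPIC -shared -DSHADOW -o libshadow.so interpose.c
 *   cc -DMAIN -o demo interpose.c -Wl,--no-as-needed -L. -lshadow -lreal -Wl,-rpath,.
 *
 * libshadow comes before libreal in the lookup order, so its greet()
 * wins -- even for the call made from inside libreal itself. */
#include <stdio.h>

#if defined(REAL)
void greet(void) { puts("greet() from libreal"); }
void call_greet(void) { greet(); }      /* this call gets interposed too */
#elif defined(SHADOW)
void greet(void) { puts("greet() from libshadow"); }
#elif defined(MAIN)
void call_greet(void);
int main(void) { call_greet(); return 0; }   /* prints "from libshadow" */
#endif

Today the closest opt-outs are things like linking libreal with -Bsymbolic or giving its symbols protected visibility, which protect a library's own internal calls from being interposed; there is no default that stops another library from shadowing the name in the first place.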

More hygienic linking

Posted Mar 30, 2024 16:48 UTC (Sat) by smurf (subscriber, #17840) [Link] (1 responses)

When you load a library, its .init section gets executed. Likewise .fini when the library is unloaded / program ends regularly.

(Disclaimer: the actual details are a bit more hairy than that.)

The code in that section is typically used to set up variables and global objects and whatnot, but it can and in this case indeed does do whatever it damn well pleases. Restricting that capability is … difficult.
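For the curious, the usual high-level way C code lands in those sections is a constructor/destructor attribute; a trivial sketch (nothing to do with the actual payload, just the mechanism):

/* ctor.c - sketch: code that runs when a shared object is loaded
 * (before main(), or before dlopen() returns) and when it is unloaded.
 * Hypothetical build: cc -fPIC -shared -o libctor.so ctor.c */
#include <stdio.h>

__attribute__((constructor))
static void on_load(void)
{
    /* Placed in .init_array (or the older .init/.ctors machinery) so the
     * dynamic loader calls it at load time, with no call needed from the
     * application -- which is what makes it hard to restrict. */
    fprintf(stderr, "libctor: loaded\n");
}

__attribute__((destructor))
static void on_unload(void)
{
    fprintf(stderr, "libctor: unloaded\n");
}

Any program linked against, or dlopen()ing, such a library runs the constructor before its own main() gets a chance to decide anything.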

More hygienic linking

Posted Mar 30, 2024 17:02 UTC (Sat) by epa (subscriber, #39769) [Link]

Right. Libraries would have to be written in a restricted style where there isn’t global data to initialize, and where any setup work that is needed must be done by an explicit call from the application code. Which I suggest is all good practice anyway. But likely to meet pushback from those who don’t see a need to change.

More hygienic linking

Posted Mar 31, 2024 3:53 UTC (Sun) by danobi (subscriber, #102249) [Link] (2 responses)

I was wondering that as well and tried to get ifunc to behave like that. Couldn't get it to work. But then I read https://gynvael.coldwind.pl/?lang=en&id=782 and saw this bit:

((head -c +$N > /dev/null 2>&1) && head -c +$W) > liblzma_la-crc64-fast.o

Looks like the binary payload just replaces one of the ifunc implementations at build time. Both head invocations read from the same stdin, so the first one discards the first $N bytes and the second one copies the following $W bytes into the object file; in other words, it carves the malicious object out of a larger blob. At least that's my guess.

More hygienic linking

Posted Apr 1, 2024 11:24 UTC (Mon) by excors (subscriber, #95769) [Link] (1 responses)

I think there's a bit more than that. The injected.txt script in Andres Freund's email includes:

V='[...]extern int _get_cpuid(int, void*, void*, void*, void*, void*);\nstatic inline bool _is_arch_extension_supported(void) { int success = 1; uint32_t r[4]; success = _get_cpuid(1, &r[0], &r[1], &r[2], &r[3], [...]'
[...]
sed "/return is_arch_extension_supported()/ c\return _is_arch_extension_supported()" $top_srcdir/src/liblzma/check/crc64_fast.c

In the upstream code, is_arch_extension_supported calls __get_cpuid (provided by GCC). The backdoored build script modifies crc64_fast.c so it calls _get_cpuid (one underscore) instead. _get_cpuid is defined in liblzma_la-crc64-fast.o, which is a binary hidden in the test files.

If I understand correctly, the loader executes the ifunc resolver function (crc64_resolve) when liblzma is loaded. crc64_resolve calls [_]is_arch_extension_supported to determine if it can use the optimised version with the CLMUL instruction. But now that actually calls _get_cpuid which initialises the runtime backdoor.

So, there's nothing funny happening with the linker and there's no symbol overriding. The functions are "replaced" using sed before compiling.

(I guess they could have simply had the build script insert an __attribute__((constructor)) function, but anyone debugging or profiling a liblzma-using application might have been surprised to see liblzma functions getting executed at startup. Using ifuncs makes it less suspicious - liblzma has a good reason to run crc64_resolve and ...get_cpuid, clearly visible in the upstream code (modulo some underscores), so they probably hoped anyone investigating would stop at that point.)
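For readers who haven't met ifuncs, the legitimate pattern being piggy-backed on looks roughly like this (a simplified sketch of the glibc/GCC mechanism, not the actual liblzma source; the function names are stand-ins):

/* ifunc_demo.c - sketch of the GNU ifunc mechanism liblzma uses for its
 * CRC code.  The resolver runs while the dynamic loader processes
 * IRELATIVE relocations, i.e. at load time, and decides once which
 * implementation the exported symbol will point at. */
#include <stdint.h>
#include <stddef.h>
#include <cpuid.h>                      /* __get_cpuid(), x86 GCC/clang */

static uint64_t crc64_generic(const uint8_t *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;                           /* stand-in for the portable code */
}

static uint64_t crc64_clmul(const uint8_t *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;                           /* stand-in for the CLMUL version */
}

/* The resolver.  Because it already runs at load time and already has a
 * good reason to call a cpuid helper, swapping that helper for a
 * lookalike (__get_cpuid -> _get_cpuid) gave the backdoor an execution
 * hook that looks almost like normal code. */
static uint64_t (*resolve_crc64(void))(const uint8_t *, size_t)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 1)))
        return crc64_clmul;             /* ECX bit 1: PCLMULQDQ */

    return crc64_generic;
}

uint64_t crc64(const uint8_t *buf, size_t len)
    __attribute__((ifunc("resolve_crc64")));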

More hygienic linking

Posted Apr 1, 2024 15:28 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

> (I guess they could have simply had the build script insert an __attribute__((constructor)) function, but anyone debugging or profiling a liblzma-using application might have been surprised to see liblzma functions getting executed at startup. Using ifuncs makes it less suspicious - liblzma has a good reason to run crc64_resolve and ...get_cpuid, clearly visible in the upstream code (modulo some underscores), so they probably hoped anyone investigating would stop at that point.)

I *think* init functions get executed a bit later, when the .got sections for all the libraries have already been remapped read-only. So they couldn't have pulled the trick of redirecting a function despite -z now -z relro being used.

A backdoor in xz

Posted Mar 30, 2024 19:19 UTC (Sat) by gdamjan (subscriber, #33634) [Link]

OpenWrt official Project Statement
https://forum.openwrt.org/t/project-statement-about-xz-5-...

tl;dr; it's fine, but still

A backdoor in xz

Posted Mar 30, 2024 19:57 UTC (Sat) by marcin (subscriber, #159076) [Link]

For Fedora 40 and Rawhide users (at least): remember to also regenerate your initramfs (with dracut -f) if the following command shows "liblzma.so.5.6.0":
cd /boot; for i in initramfs-*; do echo $i:; lsinitrd $i | grep liblzma; done

Obtaining a copy of the bad library

Posted Mar 30, 2024 23:08 UTC (Sat) by tarvin (guest, #4412) [Link] (2 responses)

Does someone know how to get hold of a liblzma.so.5 which has the backdoor code? I'd like to have some so-called antimalware products scan it and see if they detect anything bad with it. (My prejudice is that they will not flag it as bad, but maybe I'm wrong.)

Obtaining a copy of the bad library

Posted Mar 31, 2024 6:41 UTC (Sun) by wsy (subscriber, #121706) [Link] (1 responses)

Obtaining a copy of the bad library

Posted Mar 31, 2024 11:26 UTC (Sun) by tarvin (guest, #4412) [Link]

Thanks. At the time of writing at VirusTotal, just three out of 76 so-called malware detection products flag an issue with the bad shared object file. Those three products are AliCloud, DrWeb, and Rising. None of the major antimalware tool vendors detect it.

This highlights the absurdity of some corporations' policies requiring that antivirus software (which has itself suffered from security holes many times) be installed even on Linux systems.

Building a backdoored Kernel - Attack vector 2?

Posted Mar 31, 2024 16:32 UTC (Sun) by ma4ris8 (subscriber, #170509) [Link] (3 responses)

My sympathy to Lasse Collin. I think that he didn't have any idea what was happening.
I tried to understand, what "Jia Tan" tried to do.

What follows is only a hypothesis; it might contain some truth, but how much remains to be seen:

It could be that the LKML patch's intent was to install a backdoor into the Linux kernel via the "xz" command injecting the payload during the kernel build process.

I compared LKML patch, and the first related Git commits since 2022.
I also read the "oss-security" email history of this issue.

When looking at commits which mention Jia Tan at https://git.rootprojects.org/root/xz
first logic change with weird wording is

Jia Tan <jiat0218@gmail.com> 2022-09-21 14:28:53
3d5a99ca "liblzma: Fix copying of check type statistics in lzma_index_cat()."
Thoughts: "Last stream handling": Perhaps after the full stream, there is the backdoor payload stream
still waiting.

Jia Tan <jiat0218@gmail.com> 2022-10-05 11:41:38
fae37ad2affd8 "Tests: Fix compilation error when threading support has been disabled."
+#ifndef MYTHREAD_ENABLED
+ assert_skip("Threading support disabed");
+#else

Thus Jia Tan disables the memlimit test if the code is running single-threaded, which makes it easier to allocate memory for an additional payload.

1fc6e7dd1fa 2022-11-07
"liblzma: Include cached memory in reported memusage in threaded decoder. Thanks to Jia Tan."
This code caches memory statistics. That could be useful to hide additional memory use
from payload management within xz.

"oss-security" mentioned that running "$XZ" so that it prints environment variables:
that could inject some code in the LKML patch, that could affect Kernel build.
+eval "$($XZ --robot --version)" || exit

LKML patch has advice:
"+ xz --threads=1 --check=crc32 --lzma2=dict=512KiB inputfile"
Here the untested (tests skipped) code paths would possibly run, and thus an additional payload could be added from within the "xz" binary via unseen code paths. If this recommended command is executed, then the backdoor could be installed into the kernel during the build process (if such a feature had been activated in some version of the backdoored xz).

Building a backdoored Kernel - Attack vector 2?

Posted Mar 31, 2024 17:16 UTC (Sun) by nix (subscriber, #2304) [Link] (2 responses)

> 1fc6e7dd1fa 2022-11-07 "liblzma: Include cached memory in reported memusage in threaded decoder. Thanks to Jia Tan." This code caches memory statistics. That could be useful to hide additional memory use from payload management within xz.
The commit with that SHA-1 ID is
xz: Avoid a compiler warning in progress_speed() in message.c.
(The commit you mean is 5e2450c75c.)

Your description is wrong, as far as I can see: it does the opposite, *including* cached usage in the memory limit (which previously excluded it, so memory could exceed the limit arbitrarily much). There is no attempt to cache anything new.

Further, this commit is not in the upstream kernel at all, nor in anything that has been submitted to the kernel as far as I can see (unsurprisingly, given that the kernel's xz decoder is *much* simpler than liblzma's and isn't even from the same project, but rather from XZ Embedded, which has barely been touched for years: its last non-Itanium-removal commit predates Jia Tan's involvement, thank goodness).

Building a backdoored Kernel - Attack vector 2?

Posted Mar 31, 2024 19:21 UTC (Sun) by ma4ris8 (subscriber, #170509) [Link] (1 responses)

You are right: the hash was the wrong one, for a commit in the xz source code, outside of the kernel source code.
Sorry for the mistake.

Sometimes I have ideas, which prove to be false.
Maybe I'm just trying to think too much, with too little information, what the backdoor code creator attempted to do.

It could be that the kernel changes in the merge request were not enough to enable the backdoor, and only the sshd side was completed. The target kernel build process could also be something specific (for example a particular distribution's build environment, which would use the "xz" RPM rather than the kernel's own decompressor).

It is interesting that between 5.6.0 and 5.6.1 there were "ifunc" changes; the libarchive change was also related to "ifunc":

https://gist.github.com/martenson/398bdb7a928069cf67606c9...
"We're reasonably sure the following things need to be true for your system to be vulnerable:
You need to be running a distro that uses glibc (for IFUNC)
etc.
It may activate in other scenarios too, possibly even unrelated to ssh.
We don't know what the payload is intended to do. We are investigating."

Building a backdoored Kernel - Attack vector 2?

Posted Mar 31, 2024 19:46 UTC (Sun) by nix (subscriber, #2304) [Link]

Yeah, the IFUNC mechanism was abused to force different resolution for symbols in libcrypto (!) as used by openssl. It may be possible to spot and block this abuse, since it seems to me that no legitimate program would ever want to do what the exploit does, but let's not fool ourselves -- if this wasn't present, the exploit would just have done something else. By the time you have hostile code executing in the same address space as sshd before privsep has kicked in, you've lost, IFUNC or no IFUNC.

A backdoor in xz

Posted Apr 1, 2024 0:35 UTC (Mon) by Heretic_Blacksheep (subscriber, #169992) [Link]

Further analysis posted to the mailing list after this short blurb was released to LWN suggests this isn't an authentication bypass backdoor. It's in reality an RCE backdoor triggered by sending a particular security certificate to the compromised system's sshd.

A resulting bug report was sent to OpenSSH to close the potentially unwanted authentication vector that makes the RCE possible. https://bugzilla.mindrot.org/show_bug.cgi?id=3675

It seems as if OpenSSH will accept a security cert regardless of whether a CA is configured or not.

Otherwise, it seems this backdoor in xz was still limited to certain kinds of Linux environments and was (accidentally) caught before it could do widespread damage. Where FOSS project security goes from here will largely be up to the individuals and individual organizations that manage projects. Hopefully, it will lead to greater scrutiny of individual projects' governance standards (like how thoroughly they have been reviewing commits) before inclusion into others (like Linux distro repositories).

A backdoor in xz

Posted Apr 1, 2024 1:39 UTC (Mon) by sergey.senozhatsky (subscriber, #91933) [Link]

This actor is also a co-maintainer ("XZ EMBEDDED") of the Linux kernel xz implementation. And they have some contribution record to the Linux kernel.

libarchive "bsdtar" tar extraction exploit

Posted Apr 1, 2024 6:43 UTC (Mon) by ma4ris8 (subscriber, #170509) [Link]

JFrog shows "bsdtar" proof of concept of the "libarchive" modification
at https://jfrog.com/blog/xz-backdoor-attack-cve-2024-3094-a...

JFrog states:
"In 2021, JiaT75 submitted a pull request to the libarchive repository with the title ‘Added error text to warning when untaring with bsdtar’ which seemed legitimate at first glance. "

A backdoor in xz

Posted Apr 1, 2024 18:56 UTC (Mon) by ccchips (subscriber, #3222) [Link]

Automatically auditing tarballs?

Posted Apr 2, 2024 12:01 UTC (Tue) by GNUtoo (guest, #61279) [Link] (1 responses)

Some of the source code backdooring attempts I know of (the Linux backdoor from November 2003, ProFTPD, and now xz) that don't look like mistakes (they are not the kind of memory safety bugs that are easy to make accidentally) were not done in the version control system but instead directly on tarballs.

Using the version control system directly in packages is probably not the answer though as this is not always a drop-in replacement for tarballs.

For instance many git repositories don't sign commits while tarballs are often signed, and even when they are not, the checksum of a tarball is something relatively simple that gives very strong guarantees against hosters subverting the tarball and still having it pass the check.

As I understand it, git also has issues: for instance, enabling fsckObjects is not always practical, as many projects (including Linux) require skiplists. And if I understood right, without that the hoster of the repository can somewhat subvert the commits in a way that still passes all the integrity checks.

Guix has tooling to create tarballs from git (see "Supporting long-term reproducibility" in https://guix.gnu.org/en/blog/2022/gnu-guix-1.4.0-released/); I'm unsure whether it's also used to somewhat automatically audit tarballs, but it can at least be reused to detect whether a given git commit corresponds to a given tarball, as the functionality to do that is there.

If this is also used for security somehow, to catch cases where the tarball was equal to the git release + some metadata and started having extra code, and/or if we add such functionality to most distributions, we could at least catch regressions and new issues and make backdooring (some) tarballs more difficult.

As for cases where the tarball differs, maybe a similar approach could work too: warnings about the extra files, or a diff, could be reported to the packagers, enabling the packagers to fix that by either ignoring the warning or by removing the differences somehow (rm -f configure [...] && autoreconf -vfi && [...]).

This probably does not work well for all distributions, as some have very simplistic build systems[1] that don't allow modifying the way all autotools builds or git fetches are done in a single place, but that's not necessarily a big issue, as doing it in the other distributions still increases the probability of catching issues, even if it is skipped for the ones with simplistic build systems.

[1] Like every approach, they have advantages and disadvantages. One of the advantages of simplistic build systems is that making packages with them is way easier, and so they enable more people to contribute and/or to be empowered to package what they need, modify packages, etc. The disadvantage is that adding an option to most packages, or reusing the package information globally, is time consuming.

Automatically auditing tarballs?

Posted Apr 3, 2024 7:58 UTC (Wed) by smurf (subscriber, #17840) [Link]

> many git repositories don't sign commits

You don't need to sign commits. You need to sign tags. Tagging a new version doesn't happen automatically (usually); it's the maintainer's job, not the repo's.

A backdoor in xz

Posted Apr 5, 2024 8:55 UTC (Fri) by dvandeun (guest, #24273) [Link]

The general public is being informed. It's nice that The Economist has a rather clueful article on this in its Leaders section.


Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds