Bcachefs Goes to "Externally Maintained"

Posted4 months agoActive4 months ago

ksec

227 points

383 comments

lwn.netTech DiscussionstoryHigh profile

informativeneutral

Debate

20/100

LinuxBcachefsFilesystem Maintenance

Key topics

Linux

Bcachefs

Filesystem Maintenance

The Linux kernel's bcachefs file system has been relegated to "Externally Maintained" status, sparking debate about the implications for users and developers. Commenters weighed in on the potential consequences, with some lamenting the loss of a kernel-maintained file system and others pointing out that DKMS (Dynamic Kernel Module Support) isn't the only way to distribute out-of-tree modules. A consensus emerged that the change might be more about managing the mercurial creator of bcachefs, Kent, and his sometimes prickly interactions with other kernel developers, rather than any technical issues with the file system itself. As one commenter noted, Kent's history of "drama" often involves clashing with rigid workflows and environments, suggesting that his approach may be at the heart of the issue.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

Peak period

123

Day 1

Avg / period

22.9

Comment distribution160 data points

Loading chart...

Based on 160 loaded comments

Key moments

01Story posted
Aug 30, 2025 at 9:07 AM EDT
4 months ago
Step 01
02First comment
Aug 30, 2025 at 1:08 PM EDT
4h after posting
Step 02
03Peak activity
123 comments in Day 1
Hottest window of the conversation
Step 03
04Latest activity
Sep 13, 2025 at 2:01 AM EDT
4 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (383 comments)

Showing 160 comments of 383

Volundr

4 months ago

3 replies

Damn. I was enjoying not having to deal with the fun of ZFS and DKMS, but it seems like now bcachefs will be in the same boat, either dealing with DKMS and occasional breakage or sticking with the kernel version that slowly gets more and more out of date.

mustache_kimono

4 months ago

2 replies

> Damn. I was enjoying not having to deal with the fun of ZFS and DKMS, but it seems like now bcachefs will be in the same boat, either dealing with DKMS and occasional breakage or sticking with the kernel version that slowly gets more and more out of date.

Your distro could very easily include bcachefs if it wishes? Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild, that very particular problem doesn't exist re: bcachefs?

The problem with bcachefs is the problem with btrfs. It mostly still doesn't work to solve the problems ZFS already solves.

kstrauser

4 months ago

1 reply

> Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild

I can think of non-religious reasons to want to avoid legal fights with Oracle.

mustache_kimono

4 months ago

>> I can think of non-religious reasons to want to avoid legal fights with Oracle.

Oh, certainly, but that's really not the problem posed right now. Oracle's own lawyers have said they see no problem with the combination, and Ubuntu has shipped the Linux + ZFS combination for years without a lawsuit.[0][1]

In the 1990s, Microsoft/SCO, like you, would also fear monger about open source and lawsuits, and we in the OSS community mostly called this "FUD" (fear, uncertainty, and doubt). Whereas now, almost 10 years into this experiment, we know more about the ZFS + Linux combination, and what Oracle will do about it, than most other open questions in OSS, and the answer is that some in OSS community have chosen to instead participate in the same kind of FUD because of a very online, very uninformed internecine licensing debate.

[0]: https://www.youtube.com/watch?v=PFMPjt_RgXA&t=2260s [1]: https://ubuntu.com/blog/zfs-is-the-fs-for-containers-in-ubun...

jchw

4 months ago

> Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild,

I think the Linux Kernel just doesn't want to be potentially in violation of Oracle's copyrights. That really doesn't seem that unreasonable to me, even if it feels pointless to you.

WD-42

4 months ago

2 replies

The article says that bcachefs is not being removed from the mainline kernel. This looks like mostly a workaround for Linus and other kernel devs to not have to deal with Kent directly.

ffsm8

4 months ago

2 replies

The three listed options in the OP thread were

* Another kernel Dev takes over management and they tread it as a fork (highly unlikely according to their estimate)

* Kent hires someone to upstream the changes for him and Kent stops complaining wrt when it's getting merged

* Bcachefs gets no maintenance and will likely be removed in the next major release

I do not know him personally, but most interactions I've read online by him sounded grounded and not particularly offensive, so I'm abstaining from making any kind of judgement on it.

But while I have no stake in this, Drama really does seem to follow Kent around for one reason or another. And it's never his fault if you take him by his public statements - which I want to repeat: he sounds very grounded and not offensive to me whatsoever.

BoredPositron

4 months ago

1 reply

Grounded? Not offensive?

https://lore.kernel.org/lkml/CAHk-=wiLE9BkSiq8F-mFW5NOtPzYrt...

https://lore.kernel.org/all/citv2v6f33hoidq75xd2spaqxf7nl5wb...

ffsm8

4 months ago

1 reply

The first one is by Linus? And his replies (at least the ones I read) are - to me- less aggressive then the rest of the mails in that chain

The second has one offensive remark:

> Get your head examined. And get the fuck out of here with this shit.

which I thought he admitted was out of line and - said sorry for. Or do I misremember? I admit once again, I'm still completely uninvolved and merely saw it play out on the internet.

yencabulator

4 months ago

If you read his replies downthread from that, Kent seems to be going through a lot of effort to not apologize, in any form, and prefers talking about how other people were mean to him.

I had high hopes for bcachefs. sigh

sarlalian

4 months ago

If you look at all the places where Kent has had drama, the common element is him and environments that have pretty rigid workflows. The common thread seems to be him not respecting workflows and processes that those places have, that inconvenience his goals. So, he ignores the workflows and processes of those places, and creates a constant state of friction and papercuts for those who he needs to accomplish his goals. They eventually get fed up, and either say no, not working with you anymore, or no, you’re not welcome to contribute here anymore.

He’s not super offensive, but he will tell a Debian package maintainer that their process sucks, and the should change it and they are being stupid by following that process. Overall, he seems a bit entitled, and unwilling to compromise with others. It’s not just Kent though, the areas that seem to be the most problematic for him, are when it’s an unstoppable force (Kent), and an immovable wall (Linux / Debian).

Working in the Linux kernel is well known for its frustrations and the personal conflict that it creates, to the point that there are almost no linux kernel devs/maintainers that aren’t paid to do the work. You can see a similar set of events happen with Rust4Linux people, Asahi linux project and their R4L drivers, etc.

tux3

4 months ago

1 reply

It's complicated, no one really knows what "externally maintained" entails at the moment. Linus is not exactly poised to pull directly from Kent, and there is no solution lined-up at the moment.

Both Linus and Kent drive a hard bargain, and it's not as simple as finding someone else to blindly forward bcachefs patches. At the first sign of conflict, the poor person in the middle would have no power, no way to make anyone back down, and we'd be back to square one.

It's in limbo, and there is still time, but if left to bitrot it will be removed eventually.

immibis

4 months ago

1 reply

That person would be accountable to Linus, but not to Kent.

tux3

4 months ago

1 reply

Unfortunately, there's also nothing they can do if Kent says no. Say there's a disagreement on a patch that touches something outside fs/bcachefs, that person can't exactly write their own patches incorporating the feedback. They're not going to fork and maintain their own patches. They'd be stuck between a rock and a hard place, and that gets us back to a deadlock.

The issue is that I have never seen Kent back down a single time. Kent will explain in details why the rules are bullshit and don't apply in this particular case, every single time, without any room for compromise.

If the only problem was when to send patches, that would be one thing. But disagreements over patches aren't just a timing problem that can be routed around.

koverstreet

4 months ago

2 replies

The key thing here is I've never challenged Linus's authority on patches outside fs/bcachefs/; I've quietly respun pull requests for that, on more than one occasion.

The point of contention here was a patch within fs/bcachefs/, which was repair code to make sure users didn't lose data.

If we can't have clear boundaries and delineations of responsibility, there really is no future for bcachefs in the kernel; my core mission is a rock solid commitment to reliability and robustness, including being responsive to issues users hit, and we've seen repeatedly that the kernel process does not share those priorities.

tux3

4 months ago

1 reply

You may be right, but I think looking at it from a lens of who has authority and can impose their decision is still illustrating the point I'm trying to make.

To some extent drawing clear boundaries is good as a last resort when people cannot agree, but it can't be the main way to resolve disagreements. Thinking in terms of who owns what and has the final say is not the same as trying to understand the requirements from the other side to find a solution that works for everyone.

I don't think the right answer is to blindly follow whatever Linus or other people say. I don't mean you should automatically back down without technical reasons, because authority says so. But I notice I can't remember an email where concessions where made, or attemps to find a middle grounds by understanding the other side. Maybe someone can find counterexamples.

But this idea of using ownership to decide who has more authority and can impose their vision, that can't be the only way to collaborate. It really is uncompromising.

koverstreet

4 months ago

2 replies

> To some extent drawing clear boundaries is good as a last resort when people cannot agree, but it can't be the main way to resolve disagreements. Thinking in terms of who owns what and has the final say is not the same as trying to understand the requirements from the other side to find a solution that works for everyone.

Agreed 100%. In an ideal world, we'd be sitting down together, figuring out what our shared priorities are, and working from there.

Unfortunately, that hasn't been possible, and I have no idea what Linus's priorities except that they definitely aren't a bulletproof filesystem and safeguarding user data; his response to journal_rewind demonstrated that quite definitively.

So that's where we're at, and given the history with other local filesystems I think I have good reason not to concede. I don't want to see bcachefs run off the rails, but given all the times I've talked about process and the way I'm doing things I think that's exactly what would happen if I started conceding on these points. It's my life's work, after all.

You'd think bcachefs's track record (e.g. bug tracker, syzbot) and the response it gets from users would be enough, but apparently not, sadly. But given the way the kernel burns people out and outright ejects them, not too surprising.

magicalhippo

4 months ago

1 reply

> Unfortunately, that hasn't been possible, and I have no idea what Linus's priorities except that they definitely aren't a bulletproof filesystem and safeguarding user data

Remarks like this come across as extremely patronizing, as you completely ignore what the other party says and instead project your own conclusions about the other persons motives and beliefs.

> his response to journal_rewind demonstrated that quite definitively

No, no it did not in any shape way or form do that. You had multiple other perfectly valid options to help the affected users besides getting that code shipped in the kernel there and then. Getting it shipped in the kernel was merely a convenience.

If bcachefs was established and stable it would be a different matter. But it's an experimental file system. Per definition data loss is to be expected, even if recovery is preferable.

koverstreet

4 months ago

No, bcachefs-tools wasn't an option because the right way to do this kind of repair is to first do a dry run test repair and mount, so you can verify with your eyes that everything is back as it should be.

If we had the fuse driver done that would have worked, though. Still not completely ideal because we're at the mercy of distros to make sure they're getting -tools updates out in a timely manner, they're not always as consistent with that as the kernel. Most are good, though).

Just making it available in a git repo was not an option because lots of bcachefs users are getting it from their distro kernel and have never built a kernel before (yes, I've had to help users with building kernels for the first time; it's slow and we always look for other options), and even if you know how, if your primary machine is offline the last thing you want to have to do is build a custom rescue image with a custom kernel.

And there was really nothing special about this than any other bugfix, besides needing to use a new option (which is also something that occasionally happens with hotfixes).

Bugs are just a fact of life, every filesystem has bugs and occasionally has to get hotfixes out quickly. It's just not remotely feasible or sane to be coming up with our own parallel release process for hotfixes.

tux3

4 months ago

Fair enough. As someone who has lost filesystems to bugs and files to corrupted blocks, I definitely appreciate the work you've done on repair and reliability.

I think there's room to have your cake and eat it too, but I certainly can't blame you for caring about quality, that much is sure.

tuna74

4 months ago

1 reply

Linus T is responsible for everything in Linux, it is his project and he is the maintainer. He can do everything he wants in his branch and people just have to accept it. If you want to be responsible you have to fork Linux.

koverstreet

4 months ago

1 reply

Let's examine this, shall we?

Has he ever even been involved with a bcachefs bug? No, aside from arguing against shipping bugfixes.

Has he contributed in any way, besides merging code? No...

Has he set rules or guidelines that benefited bcachefs reliability? No, but he has shouted down talk about automated testing.

I think you're confusing power with responsibility.

immibis

4 months ago

1 reply

You're still doing that thing where you assume everyone else is you. Linus's job (some of which he delegates) is to take the contributions from ALL the hundreds of maintainers, and bundle it into a unified coherent whole. He is not only responsible for bcachefs reliability. In the train analogy I already used, he is the train driver, he is responsible for getting everyone who is on the train to their destination, but he is not responsible for ensuring that you're on the train. It's your responsibility to ensure that you're on the train when it departs.

You are one of those maintainers (not any more). Your code can be taken into the bundle (not any more), but on the bundle's schedule, not yours. You have consistently failed to understand that the train doesn't wait for you - if you are late, you get on the next one. If you don't want to get on the next one, then don't be late. Normal people, after missing a train once or twice, would adjust their schedule accordingly so they won't miss it next time. But your exclusive, repeated reaction has been to yell at the train driver and the station master, which is why you've been kicked out of the station.

Have you ever ridden a train, by the way? Were you on time? (Deutsche Bahn doesn't count because they're not on time)

koverstreet

4 months ago

We're not talking about situations where the bcachefs changes could plausibly affect the rest of the kernel, and I am well known even within the kernel community for being on top of potential issues and responsive on bugs.

NewJazz

4 months ago

FWIW DKMS is not the only way to distribute out of tree modules.

https://github.com/chimera-linux/ckms

LeoPanthera

4 months ago

1 reply

It's orphaned in Debian as well, but I'm not sure what significant advantages it has over btrfs, which is very stable these days.

betaby

4 months ago

5 replies

btrfs was unusable in multi disk setup for kernels 6.1 and older. Didn't try since then. How's stable btrs today in such setups?

Also see https://www.phoronix.com/news/Josef-Bacik-Leaves-Meta

LeoPanthera

4 months ago

5 replies

It's sort of frustrating that this constantly comes up. It's true that btrfs does have issues with RAID-5 and RAID-6 configurations, but this is frequently used (not necessarily by you) as some kind of gotcha as to why you shouldn't use it at all. That's insane. I promise that disk spanning issues won't affect your use of it on your tiny ThinkPad SSD.

It's important to note that striping and mirroring works just fine. It's only the 5/6 modes that are unstable: https://btrfs.readthedocs.io/en/stable/Status.html#block-gro...

rendaw

4 months ago

1 reply

> on your tiny ThinkPad SSD

Ad hominem. My thinkpad ssd is massive.

LeoPanthera

4 months ago

Good news, it will work just fine on that too.

AaronFriel

4 months ago

1 reply

Respectfully to the maintainers:

How can this be a stable filesystem if parity is unstable and risks data loss?

How has this been allowed to happen?

It just seems so profoundly unserious to me.

wtallis

4 months ago

2 replies

Does the whole filesystem need to be marked as unstable if it has a single experimental feature? Is any other filesystem held to that standard?

AaronFriel

4 months ago

1 reply

Parity support in multi-disk arrays is older than I am, it's a fairly standard feature. btrfs doesn't support this without data loss risks after 17 years of development.

wtallis

4 months ago

1 reply

If you're not interested in a multi-disk storage system that doesn't have (stable, non-experimental) parity modes, that's a valid personal preference but not at all a justification for the position that the rest of the features cannot be stable and that the project as a whole cannot be taken seriously by anyone.

AaronFriel

4 months ago

Is that what I said?

nextaccountic

4 months ago

Maybe this specific feature should be marked as unstable and default to disabled on most kernel builds unless you add something like btrfs.experimental=1 to the kernel line or something

betaby

4 months ago

1 reply

But RAID-6 is the closest approximation to raid-z2 from ZFS! And raid-z2 is stable for a decade+. Indeed btrfs works just fine on my laptop. My point is that Linux lacks ZFS-like fs for large multi disc setups.

NewJazz

4 months ago

1 reply

Seriously for the people who take filesystems seriously and have strong preferences... Multi disk might be important.

wtallis

4 months ago

1 reply

BTRFS does have stable, usable multi-disk support. The RAID 0, 1, and 10 modes are fine. I've been using BTRFS RAID1 for over a decade and across numerous disk failures. It's by far the best solution for building a durable array on my home server stuffed full of a random assortment of disks—ZFS will never have the flexibility to be useful with mismatched capacities like this. It's only the parity RAID modes that BTRFS lacks, and that's a real disadvantage but is hardly the whole story.

Filligree

4 months ago

1 reply

That’s nice and all, but I have five disks in my server. I want the 6 mode.

In practice RAIDZ2 works great.

wtallis

4 months ago

1 reply

In the case of five disks of the same capacity, RAID6 or RAIDZ2 only gets you 20% more capacity than btrfs RAID1. That's not exactly a huge disparity, usually not enough to be a show-stopper on its own. There are plenty of scenarios where the features ZFS has which btrfs lacks are more important than the features that btrfs has which ZFS lacks. My point is simply that btrfs RAID1 has its uses and shouldn't be dismissed out of hand.

Trixter

4 months ago

This is missing one of the finer points of redundancy: With 4 disks, losing two of them could take out a RAID-10 (both halves of a stripe component), but losing two cannot take out a RAID-6.

The fact that btrfs isn't stable after 15+ years for parity setups is, IMO, unreasonable.

__turbobrew__

4 months ago

2 replies

How can I know what configurations of btrfs lose my data?

I also have had to deal with thousands of nodes kernel panicing due to a btrfs bug in linux kernel 6.8 (stable ubuntu release).

ffsm8

4 months ago

1 reply

I thought the usual recommendation was to use mdadm to build the disk pool and then use btrfs on top of that - but that might be out of date. I haven't used it in a while

necheffa

4 months ago

This is very much a big compromise where you decide for yourself that storage capacity and maybe throughput are more important than anything else.

The md metadata is not adequately protected. Btrfs checksums can tell you when a file has gone bad but not self-heal. And I'm sure there are going to be caching/perf benefits left on the table not having btrfs manage all the block storage itself.

mook

4 months ago

1 reply

I thought most distros have basically disabled the footgun modes at this point; that is, using the configuration that would lose data means you'd need to work hard to get there (at which point you should have been able to see all the warnings about data loss).

__turbobrew__

4 months ago

See the part of my comment where the btrfs kernel driver paniced on Ubuntu 24 stable kernel.

We are using a fairly simple config, but under certain heavy load patterns the kernel would panic: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...

I hear people say all the time how btrfs is stable now and people are just complaining about issues when btrfs is new, but please explain to me how the bug I linked is OK in a stable version of the most popular linux distro?

risho

4 months ago

1 reply

as it turns out raid 5 and 6 being broken is kind of a big deal for people. its also far from ideal that the filesystem has random landmines that you can accidentally step on if you don't happen to read hacker news every day.

jorams

4 months ago

FWIW: RAID 5 and 6 having problems is not a random hole you'll accidentally stumble into.

The man page for mkfs.btrfs says:

> Warning: RAID5/6 has known problems and should not be used in production.

When you actually tell it to use raid5 or raid6, mkfs.btrfs will also print a large warning:

> WARNING: RAID5/6 support has known problems is strongly discouraged to be used besides testing or evaluation.

cmurf

4 months ago

1 reply

Absurd to claim it’s unusable without any qualification whatsoever.

Single, dup, raid0, raid1, raid10 have been usable and stable for a decade or more.

bigstrat2003

4 months ago

2 replies

I lost my BTRFS RAID-1 array a year or two ago when one of my drives went offline. Just poof, data gone and I had to rebuild. I am not saying that it happens all the time, but I wouldn't say it's completely bulletproof either.

cmurf

4 months ago

There's 100s of possible explanations but, without any evidence at all, you've selected one. It's simply not a compelling story.

Disaster recovery isn't obvious on any setup I've worked with. I have to RTFM to understand each system's idiosyncrasies.

The idea that some filesystems have no bugs is absurd. The idea that filesystems can mitigate all bugs in drive firmware or elsewhere in the storage stack is also absurd.

My anecdota: hundreds of intentional sabotaging of Btrfs while writing, single drive and raid1 configurations, and physically disconnecting a drive. Not one time have I encountered an inconsistent filesystem or data loss once it was on stable media. Not one. It always mounted without needing a filesystem check. This is on consumer hardware.

There's always some data loss in the single drive case no matter the filesystem. Some of the data or fs metadata isn't yet persistent. Raid1 helps with that, because so long as the hardware problem that affected the 1st drive is isolated, the data is written to stable media.

Of course, I'm no more a scientific sample than you are. And also my tests are rudimentary compared to the many thousands of synthetic tests fstests performs on Linux filesystems, both generic and fs specific, every cycle. But it is a real world test, and suggests no per se problem that inevitably means data loss as you describe.

thoroughburro

4 months ago

What did you try before giving up?

All the anecdotes I see tend to be “my drive didn’t mount, and I tried nothing before giving up because everyone knows BTRFS sux lol”. My professional experience meanwhile is that I’ve never once been able to not (very easily!) recover a BTRFS drive someone else has given up for dead… just by running its standard recovery tools.

deknos

4 months ago

1 reply

i run btrfs on servers and desktops. it's usuable.

williamstein

4 months ago

1 reply

So do I and BTRFS is extremely good these days. It's also much faster than ZFS at mounting a disk with a large number of filesystems (=subvolumes), which is critical for building certain types of fileservers at scale. In contrast, ZFS scales horribly as the number of filesystems increases, where btrfs seems to be O(1). btrfs's quota functionality is also much better than it used to be (and very flexible), after all the work Meta put into it. Finally, having the option of easy writable snapshots is nice. BTRFS is fantastic!

yjftsjthsd-h

4 months ago

2 replies

> It's also much faster than ZFS at mounting a disk with a large number of filesystems (=subvolumes), which is critical for building certain types of fileservers at scale.

Now you've piqued my curiosity; what uses that many filesystems/subvolumes? (Not an attack; I believe you, I'm just trying to figure out where it comes up)

williamstein

4 months ago

2 replies

It can be useful to create a file server with one filesystem/subvolume per user, because each user has their own isolated snapshots, backups via send/recv are user-specific, quotas are easier, etc. If you only have a few hundred users, ZFS is fine. But what if you have 100,000 users? Then just doing "zpool import" would take hours, whereas mounting a btrfs filesystem with 100,000 subvolumes takes a seconds. This complexity difference was a show stopper for me to architect a certain solution on top of ZFS, despite me personally loving ZFS and having used it for a long time. The btrfs commands and UX are really awkward (for me) compared to ZFS, but btrfs is extremely efficient at some things where ZFS just falls down.

The main criticism in this thread about btrfs involves multidisk setups, which aren't relevant for me, since I'm working on cloud systems and disk storage is abstracted away as a single block device.

williamstein

4 months ago

Incidentally, the application I'm reworking to use btrfs is cocalc.com. One of our main use cases is distributed assignments to students in classes, as part of the course management functionality. Imagine a class with 1500 students all getting an exact copy of a 50 MB folder, which they'll edit a little bit, and then it will be collected. The copy-on-write functionality of btrfs is fantastic for this use case (both in speed and disk usage).

Also, the out-of-band deduplication for btrfs using https://github.com/Zygo/bees is very impressive and flexible, in a way that ZFS just doesn't match.

magicalhippo

4 months ago

I seem to recall some discussion in one of the OpenZFS leadership meetings about slow pool imports when you have many datasets. Sadly I can't recall the details, but at least it seems to be on their radar.

yencabulator

4 months ago

As far as I understand, a core use case at Meta was build system workers starting with prepopulated state and being able to quickly discard the working tree at the end of the build. CoW is pretty sweet for that.

procaryote

4 months ago

If you don't trust btrfs raid it's perfectly possible to run btrfs on top of lvm or mdadm raid. Then you have btrfs in a prety happy case single device mode. Also the recovery tooling is more well known and tested

turtletontine

4 months ago

I’ve been running btrfs on a little home Debian NAS for over a year now. I have no complaints - it’s been working smoothly, doing exactly what I want. I have a heterogeneous set of probably 6 discs, >20TB total, no problems.

*caveat: I’m using RAID 10, not a parity RAID. It could have problems with parity RAID. So? If you really really want RAID 5, then just use md to make your RAID 5 device and put btrfs on top.

omoikane

4 months ago

1 reply

Related: Linux CoC Announces Decision Wrt Kent Overstreet (Bcachefs) (kernel.org)

https://news.ycombinator.com/item?id=42221564 - 2024-11-23, 103 comments

wging

4 months ago

There's been much more recent friction between various parties, so I don't think this most recent news is a direct result of that decision. See for instance https://news.ycombinator.com/item?id=44464396

sevg

4 months ago

5 replies

Is it just me or does Kent seem self-destructively glued to his own idea of how kernel development should work?

I don’t doubt that people on all sides have made mis-steps, but from the outside it mostly just seems like Kent doesn’t want to play by the rules (despite having been given years of patience).

ajb

4 months ago

4 replies

I think Kent is in the wrong here, but it really doesn't help that the kernel people from Linus on down are seemingly unable to explain the problem, and instead resort to playground insults. Apart from being unprofessional and making for a hostile work environment, it doesn't really communicate why Kent's actions are problematic, so I've some sympathy for his not believing that they are.

sevg

4 months ago

2 replies

> it doesn't really communicate why Kent's actions are problematic

I agree that the kernel community can be a hostile environment.

Though I’d argue that people _have_ tried to explain things to Kent, multiple times. At least a few have been calm, respectful attempts.

Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior). Or deflects by going on the attack. And generally refuses to apologise.

philipallstar

4 months ago

2 replies

> Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior).

Behaviour sounds like the least important part of code contributions. I smell overpowered, should've-been-a-kindergarten-teacher code of conduct person overreach.

dralley

4 months ago

1 reply

No. As someone who likes bcachefs and even literally donates to Kent's patreon, the way he has gone about engaging with the kernel community is not productive. Unfortunately.

CoC isn't even the issue, he constantly breaks kernel development rules relating to the actual code, then starts arguments with everyone up to and including Linus when he gets called out, and aggressively misses the point every time. Then starts the same argument all over again 6 weeks later.

And, like, if you don't like some rules, then you can have that discussion, but submitting patches you know will be rejected and then re-litigating your dislike of the rules is a waste of everyone's time.

philipallstar

4 months ago

1 reply

I think it is partly about code of conduct issues[0]. I totally agree that Linus can run whatever release process he likes, and Overbeck should get in line with that. However all of the accompanying sighing at how many times we've had to explain things to him from others is not okay. So what if more discussion is needed or wanted? People doing difficult work might have strong opinions. People doing easy work (e.g. sending code of conduct emails) should not have an equal weight to their opinions, if any at all.

[0] https://lore.kernel.org/lkml/6740fc3aabec0_5eb129497@dwillia...

ajb

4 months ago

1 reply

Interesting mashup there of Kent Beck and Kent Overstreet :-)

philipallstar

4 months ago

Sorry! Yes. Agile filesystems incoming.

jeltz

4 months ago

1 reply

No, Kent has generally had a nice tone. The issue is that he has repeatedly violated the rules about code contributions. For example by including new features together with several bug-fixes during rc. That is not a CoC issue, it is not respecting the rules of patch submission and not respecting the time of the kernel maintainers.

philipallstar

4 months ago

I agree that is a problem, but the main thing the eye-rolling posts seem to be about is CoC stuff, real or imaginary.

Example of eye-rolling post, above:

> Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior). Or deflects by going on the attack. And generally refuses to apologise.

And there's an email thread linked somewhere here where a CoC member repeatedly replies to Kent's emails with demands for a formal apology. All of this soft, subtle stuff adds up to an impression in people's heads, even though the main output of these projects should be highly complex software, and not bike-shedding email mediation.

ajb

4 months ago

Definitely not saying that the problems are all on one side here. Agreed that going on the attack was bad (as well as dumb).

I just think that while, yes, the kernel folks have tried to explain, they didn't explain well. The "why" of it is a people thing. Linus needs to be able to trust that people he's delegated some authority will respect its limits. The maintainers need to be able to trust that each other maintainer will respect the area that they have been delegated authority over. I think that Kent genuinely doesn't get this.

yxhuvud

4 months ago

1 reply

I've seen plenty of times where the problems has been explained to Kent. But he just don't give a shit about the problems of people that isn't himself or that doesn't use his file system experiences.

bombcar

4 months ago

It seems very clear to me that it's almost always a "you can't argue canon law with the Pope" situation - the rules say no new features, and it doesn't matter what the definition of "feature" is if the definition AND the rule come from the same person, Linus.

You can't win a rules-lawyer argument with the rulemaker.

rob_c

4 months ago

> unable to explain the problem

unfortunately that's either due to lack of investigation by yourself or a bit dishonest.

arp242

4 months ago

People have explaining things, at great length, many times. Many of these have been posted to HN before, either as submissions or comments.

Kent just does not listen. Every time the discussion starts from the top. Even if you do agree on some compromise, in a month or two he'll just do the same thing again and all the same arguments start again.

You can't expect people to detail about four or five years of context in every single engagement for the benefit of interested 3rd parties like you or me.

bornfreddy

4 months ago

4 replies

Being an outsider to this whole scene, the whole thread reads very differently to me.

Kent seems very patient in explaining his position (and frustrations arising from other people introducing bugs to his code) and the kernel & debian folks are performing a smearing campaign instead of replying to what I see are genuine problems in the process. As an example, the quotes that are referenced by user paravoid are, imho, taken out of context (judging by reading the provided links).

There probably is a lot more history to it, but judging from that thread it's not Kent who looks like a bad guy.

johnny22

4 months ago

1 reply

it's waaay simpler than that. Some projects have established rules, and kent doesn't want to follow them. It doesn't matter how nice (or not) he is.

bornfreddy

4 months ago

1 reply

I actually like the idea of the maintainer going out of his way to make sure that my filesystem is safe to use. Even if it goes against the established rules. And I'm saying that as someone who actually likes both Linux and Debian.

rwmj

4 months ago

1 reply

It's a strawman to imagine that Debian doesn't have a way to ensure filesystems are safe and to respond to critical bugs that might cause data corruption. It's just that you have to follow their rules to do it. (And broadly the same rules apply to the other big distros as well).

rcxdude

4 months ago

And the rules demonstrably create situations where it's easy to introduce bugs or hard to fix them, because they prioritise stability and a consistent set of package versions over the version that the upstream developer has tested. Followed blindly (and without putting in the effort to test to the same level of rigor as upstream), this causes problems, and it's right to point those out. Debian's ways would involve the package maintainer putting in a lot more effort to marry their rules with a package that actually worked, and they were not up for that.

(Debian's rules aren't worthless, it's part of how they can make something that's pretty suitable for 'boring infrastructure' systems because they can keep a system with a known and stable set of behavior up to date with critical security fixes for a long time, but boy do they result in some dumb situations sometimes)

arp242

4 months ago

1 reply

Kent brings up Debian himself, unprompted.

This is one of the problems: Kent is frequently unable to accept that things don't go his way. He will keep bringing it up again and again and he just grinds people down with it. If you see just one bit of it then it may seem somewhat reasonable, but it's really not because this is the umpteenth time this exact discussion is happening and it's groundhog day once again.

This is a major reason why people burn out on Kent. You can't just have a disagreement/conflict and resolve it. Everything is a discussion with Kent. He can't just shrug and say "well, I think that's a bit silly, but okay, I can work with it, I guess". The options are 1) Kent gets his way, or 2) he will keep pushing it (not infrequently ignoring previous compromises, restarting the discussion from square one). Here too, the Debian people have this entire discussion (again) forced upon them by Kent's comments in a way that's just completely unnecessary and does nothing to resolve anything.

Even as an interested onlooker who is otherwise uninvolved and generally more willing to accept difficult behaviour than most people, I've rather soured on Kent over time.

koverstreet

4 months ago

2 replies

You do realize that data integrity issues are not "live and let live" type things, right?

And there's a real connection to the issue that sparked all this drama in the kernel and the Debian drama: critical system components (the kernel, the filesystem, and others) absolutely need to be able to get bugfixes in a timely manner. That's not optional.

With Debian, we had a package maintainer who decided that unbundling Rust dependencies was more important than getting out updates, and then we couldn't get a bugfix out for mount option handling. This was a non-issue for every other distro with working processes because the bug was fixed in a few days, but a lot of Debian users weren't able to mount in degraded mode and lost access to their filesystems.

In the kernel drama, Linus threw a fit over a repair code to recover from a serious bug and make sure users didn't lose data, and he's repeatedly picked fights over bugfixes (and even called pushing for getting bugfixes out "whining" in the past).

There are a lot of issues that there can be give and take on, but getting fixes out in a timely manner is just part of the baseline set of expectations for any serious project.

arp242

4 months ago

2 replies

Look, I get where you're coming from. It's not unreasonable. I've said this before.

But there are also reasons why things are the way they are, and that is also not unreasonable. And at the end of the day: Linus is the boss. It really does come down to that. He has dozens of other subsystem maintainers to deal with and this is the process that works for him.

Similar stuff applies to Debian. Personally, I deeply dislike Debian's inflexible and outmoded policy and lack of pragmatism. But you know, the policy is the policy, and at some point you just need to accept that and work with it the best you can.

It's okay to make all the arguments you've made. It's okay to make them forcefully (within some limits of reason). It's not okay to keep repeating them again and again until everyone gets tired of it and seemingly just completely fail to listen to what people are sating. This is where you are being unreasonable.

I mean, you *can* do that, I guess, but look at where things are now. No one is happy with this – certainly not you. And it's really not a surprise, I already said this in November last year: "I wouldn't be surprised to see bcachefs removed from the kernel at some point".[1] To be clear: I didn't want that to happen – I think you've done great work with bcachefs and I really want it to succeed every which way. But everyone could see this coming from miles.

[1]: https://news.ycombinator.com/item?id=42225345

nextaccountic

4 months ago

2 replies

> But there are also reasons why things are the way they are, and that is also not unreasonable.

It is unreasonable if it leads to users losing data. At this point, the only reasonable thing is to either completely remove support for bcachefs or give timely fixes for critical bugs, there's no middle position that won't willfully lead to users losing their data.

This used to be the default for distributions like Debian some time ago. You only supported foundational software if you were willing to also distribute critical fixes in a timely manner. If not, why bother?

For all other issues, I guess we can accept that things are the way they are.

abenga

4 months ago

1 reply

> It is unreasonable if it leads to users losing data.

Changing the kernel development process to allow adding new features willy-nilly late in the RC cycle will lead to much worse things than a few people using an experimental file system losing their data in the long term.

The process exists for a reason, and the kernel is a massive project that includes more than just one file system, no matter how special its developers and users believe it is.

koverstreet

4 months ago

There's no need for kernel development process to change. New features go in during RCs all the time, it's always just a risk vs. reward calculation, and I'm more conservative with what I send outside the merge window that a lot of subsystems.

This blowup was entirely unnecessary.

rwmj

4 months ago

1 reply

Not too familiar with the kernel process for this, but for Linux distros there are ways to respond to critical issues including data corruption and data loss. It's just that you have to follow their processes to do this, such as producing a minimal patch that fixes the problem which is backported into the older code base (and there's a reason for that too: end users don't want churn on their installed systems, they want an install to be stable and predictable). Since distros are how you ultimately get your code into users' hands, it's really their way or the highway. Telling the distros they are wrong isn't going to go well.

nextaccountic

4 months ago

1 reply

For the Debian thing, I'm not sure on the specifics for bcachefs-progs (I'm going by what the author is reporting and some blog posts) but I think the problem with Debian is that they willfully ignore when upstream says "this is only compatible with this library version 2.1.x" and will downgrade or upgrade the library into not supported versions, to match the versions used in other programs already packaged. This kind of thing can introduce subtle, hard to debug bugs. It's a mess and problems are usually reported to upstream, that's a recurrent problem for Rust programs packaged in Debian. Rust absolutely isn't this language where if it compiles, it works, no matter how much people think otherwise.

And this is happening even though it's common for Debian to package the same C library multiple times, like, libfuse2 and libfuse3. This could be done for Rust libraries if they wanted to.

Anyway see the discussion and the relevant article here https://news.ycombinator.com/item?id=41407768 and https://jonathancarter.org/2024/08/29/orphaning-bcachefs-too...

rwmj

4 months ago

1 reply

But that's exactly the point here. In the context of a whole distribution, you don't want to update some package to a new version (on a stable branch), because that would affect lots of other packages that depended on that one. It may even be that other packages cannot work with the new updated dependency. Even if they can, end users don't want versions to change greatly (again, along a stable branch). Upstreams should accept this reality and ensure they support the older libraries as far as possible. Or they can deny reality and then we get into this situation.

And carrying multiple versions is problematic too as it causes increased burdens for the downstream maintainers.

I'd argue that libfuse is a bit of a special case since the API between 2 & 3 changed substantially, and not all dependencies have moved to version 3 (or can move, since if you move the v3 then you break on other platforms like BSD and macOS that still only support the v2 API).

Rust and especially Golang are both a massive pile of instability because the developers don't seem to understand that long term stable APIs are a benefit. You have to put in a bit of care and attention rather than always chasing the new thing and bundling everything.

rwmj

4 months ago

BTW here's where I ported nbdfuse from v2 to v3 so you can see the kinds of changes: https://gitlab.com/nbdkit/libnbd/-/commit/c74c7d7f01975e708b...

koverstreet

4 months ago

You have to consider the bigger picture.

XFS has burned through maintainers, citing "upstream burnout". It's not just bcachefs that things are broken for.

And it was burning me out, too. We need a functioning release process, and we haven't had that; instead I've been getting a ton of drama that's boiled over into the bcachefs community, oftentimes completely drowning out all the calmer, more technical conversations that we want.

It's not great. It would have been much better if this could have been worked out. But at this point, cutting ties with the kernel community and shipping as a DKMS module is really the only path forwards.

It's not the end of the world. Same with Debian; we haven't had those issues in any other distros, so eventually we'll get a better package maintainer who can work the process or they'll figure out that their Rust policy actually isn't as smart as they think it is as Rust adoption goes up.

I'm just going to push for doing things right, and if one route or option fails there's always others.

pas

4 months ago

... yes, yes, they fucking are.

It's the people we meet along (and get along with) not the fucking data.

Cherish people who want to work with you, not one more byte saved.

Users are responsible for their own backups. If you want to be really responsible educate them. (As you already helped some of them to build their own kernel, you can nudge them to keep proper backups too!)

If you have problems with downstream being slow then offer workarounds.

How ridiculous is all this, really? Instead of hosting a shell script that sets up a PPA or a cron job with updates or whatever you try to brute force things through Debian and Linus?

If users are that important for you give them your phone number or whatever.

Seriously. Based on all of what you wrote you need to put effort into having a direct line to your users. (Which is completely fine nowadays. Discord and Twitch/YT office hours and whatnot are all the hype nowadays.)

Stop projecting your needs onto other maintainers.

> critical system components (the kernel, the filesystem, and others) absolutely need to be able to get bugfixes in a timely manner. That's not optional.

that's not how these projects are set up. (and even though your code is upstream to them, your users were (are) downstream to them. they were hosting your project for their users, they can (and did) decide to stop hosting your project.)

data integrity and data-loss-prevention are not considered security updates - where we have a culture of out of bad updates. (because the workaround for them is the standing order is to have working backups.)

MadnessASAP

4 months ago

1 reply

It may seem like that on the surface, but you should recognise that these sorts of situations seem to follow Kent around.

So either Kent is on a righteous crusade against unreasonable processes within the Kernel, Debian, and every other large software project he interacts with. Or there's something about the way Kent interacts with these projects that causes friction.

I like Bcachefs, I think Kent is a very talented developer, but I'm not going to pretend that he is innocent in all this.

o11c

4 months ago

1 reply

OTOH, the only named projects I've seen are Linux and Debian, which are 2 of the most toxic projects I'm aware of (I'm pretty sure the C++ standards committee beats the two of them combined).

But the problem with comparisons is that even if you're better than nuclear waste being dumped into the aquifer, you still might be enough to light a river on fire.

JdeBP

4 months ago

I've been involved in C++ standardization. In my country's national body, it is nothing like what goes on in Linux kernel development, even when there are strong disagreements amongst members.

isr

4 months ago

To be honest, I somewhat agree. I'm sure there's a lot to this that us outsiders don't know (or honestly, for me, couldn't be bothered to know as there are far more important rabbit holes to spend time on).

However, sometimes, a certain detachment can help when looking at what is, in the end, a "cultural disagreement" more suited to an elementary school's playground.

Whenever I see open source spats like this, and then see a Dev harangued and chased from forum to forum by what looks like a coordinated group ("groupies"?), all accusing him/her of rude behaviour, while they keep making attacks on his/her personality, character or temperament...

it leads me to think rather poorly of this "wild west posse".

Anyway, bottom line, Kent is writing open source software to benefit others (and people didn't have qualms about taking his previous bcache & using it to build out storage solutions to make millions), so perhaps he doesn't quite deserve all the abuse and ganging up, no matter whose feathers he ruffled, and how.

Muromec

4 months ago

1 reply

autism is a hell of a social disability sometimes.

thoroughburro

4 months ago

Don’t smear all of us with the bad behavior one.

toast0

4 months ago

It's not just kernel development. In the lwn thread, he mentioned and then demonstrated difficulty working with Debian developers as well.

IMHO, what his communications show is an unwillingness to acknowledge that other projects that include his work have focus, priorities, and policies that are not the same as that of his project. Also, expecting exceptions to be made for his case, since exceptions have been made in other cases.

Again IMHO, I think he would be better off developing apart with an announcement mailing list. When urgent changes are made, send to the announcement list. Let other interested parties sort out the process of getting those changes into the kernel and distributions.

If people come with bug reports from old versions distributed by others, let them know how to get the most up to date version from his repository, and maybe gently poke the distributors.

Yes, that means users will have older versions and not get fixes immediately. But what he's doing isn't working to get fixes to users immediately either.

mook

4 months ago

I think he's too exposed to users reports, because anybody that shows up is in a potential data loss situation. So he's very focused on making everything as bug free as possible, and getting frustrated that people with different focus are not propagating the fixes as fast as possible.

Almost makes me think the distros light-forking it to just change the name (IceWeasel style) so the support requests don't get to him will help… probably not, though, because people will still go there because they want to recover their data.

betaby

4 months ago

4 replies

The sad part, that despite the years of the development BTRS never reached the parity with ZFS. And yesterday's news "Josef Bacik who is a long-time Btrfs developer and active co-maintainer alongside David Sterba is leaving Meta. Additionally, he's also stepping back from Linux kernel development as his primary job." see https://www.phoronix.com/news/Josef-Bacik-Leaves-Meta

There is no 'modern' ZFS-like fs in Linux nowadays.

ibgeek

4 months ago

3 replies

This isn’t BTRFS

doubletwoyou

4 months ago

1 reply

This might not be directly about btrfs but bcachefs zfs and btrfs are the only filesystems for Linux that provide modern features like transparent compression, snapshots, and CoW.

zfs is out of tree leaving it as an unviable option for many people. This news means that bcachefs is going to be in a very weird state in-kernel, which leaves only btrfs as the only other in-tree ‘modern’ filesystem.

This news about bcachefs has ramifications about the state of ‘modern’ FSes in Linux, and I’d say this news about the btrfs maintainer taking a step back is related to this.

ajross

4 months ago

8 replies

Meh. This war was stale like nine years ago. At this point the originally-beaten horse has decomposed into soil. My general reply to this is:

1. The dm layer gives you cow/snapshots for any filesystem you want already and has for more than a decade. Some implementations actually use it for clever trickery like updates, even. Anyone who has software requirements in this space (as distinct from "wants to yell on the internet about it") is very well served.

2. Compression seems silly in the modern world. Virtually everything is already compressed. To first approximation, every byte in persistent storage anywhere in the world is in a lossy media format. And the ones that aren't are in some other cooked format. The only workloads where you see significant use of losslessly-compressible data are in situations (databases) where you have app-managed storage performance (and who see little value from filesystem choice) or ones (software building, data science, ML training) where there's lots of ephemeral intermediate files being produced. And again those are usages where fancy filesystems are poorly deployed, you're going to throw it all away within hours to days anyway.

Filesystems are a solved problem. If ZFS disappeared from the world today... really who would even care? Only those of us still around trying to shout on the internet.

anon-3988

4 months ago

2 replies

> Filesystems are a solved problem. If ZFS disappeared from the world today... really who would even care? Only those of us still around trying to shout on the internet.

Yeah nah, have you tried processing terabytes of data every day and storing them? It gets better now with DDR5 but bit flips do actually happen.

bombcar

4 months ago

1 reply

Bit flips can happen, and if it’s a problem you should have additional verification above the filesystem layer, even if using ZFS.

And maybe below it.

And backups.

Backups make a lot of this minor.

toast0

4 months ago

1 reply

Backups are great, but don't help much if you backup corrupted data.

You can certainly add verification above and below your filesystem, but the filesystem seems like a good layer to have verification. Capturing a checksum while writing and verifying it while reading seems appropriate; zfs scrub is a convenient way to check everything on a regular basis. Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.

ajross

4 months ago

2 replies

FWIW, framed the way you do, I'd say the block device layer would be an *even better* place for that validation, no?

> Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.

OMG. Backups! You need backups! Worry about polishing your geek cred once your data is on physically separate storage. Seriously, this is not a technology choice problem. Go to Amazon and buy an exfat stick, whatever. By far the most important thing you're ever going to do for your data is Back. It. Up.

Filesystem choice is, and I repeat, very much a yell-on-the-internet kind of thing. It makes you feel smart on HN. Backups to junky Chinese flash sticks are what are going to save you from losing data.

tptacek

4 months ago

1 reply

Ok I think you're making a well-considered and interesting argument about devicemapper vs. feature-ful filesystems but you're also kind of personalizing this a bit. I want to read more technical stuff on this thread and less about geek cred and yelling. :)

I wouldn't comment but I feel like I'm naturally on your side of the argument and want to see it articulated well.

ajross

4 months ago

I didn't really think it was that bad? But sure, point taken.

My goal was actually the same though: to try to short-circuit the inevitable platform flame by calling it out explicitly and pointing out that the technical details are sort of a solved problem.

ZFS argumentation gets exhausting, and has ever since it was released. It ends up as a proxy for Sun vs. Linux, GNU vs. BSD, Apple vs. Google, hippy free software vs. corporate open source, pick your side. Everyone has an opinion, everyone thinks it's crucially important, and as a result of that hyperbole everyone ends up thinking that ZFS (dtrace gets a lot of the same treatment) is some kind of magically irreplaceable technology.

And... it's really not. Like I said above if it disappeared from the universe and everyone had to use dm/lvm for the actual problems they need to solve with storage management[1], no one would really care.

[1] Itself an increasingly vanishing problem area! I mean, at scale and at the performance limit, virtually everything lives behind a cloud-adjacent API barrier these days, and the backends there worry much more about driver and hardware complexity than they do about mere "filesystems". Dithering about individual files on individual systems in the professional world is mostly limited to optimizing boot and update time on client OSes. And outside the professional world it's a bunch of us nerds trying to optimize our movie collections on local networks; realistically we could be doing that on something as awful NTFS if we had to.

toast0

4 months ago

1 reply

I apprechiate the argument. I do have backups. Zfs makes it easy to send snapshots and so I do.

But I don't usually verify the backups, so there's that. And everything is in the same zip code for the most part, so one big disaster and I'll lose everything. C'est la vie.

petre

4 months ago

What good is a backup if you can't restore it?

ajross

4 months ago

2 replies

And once more, you're positing the lack of a feature that is available and very robust (c.f. "yell on the internet" vs. "discuss solutions to a problem"). You don't need your filesystem to integrate checksumming when dm/lvm already do it for you.

yjftsjthsd-h

4 months ago

1 reply

> You don't need your filesystem to integrate checksumming when dm/lvm already do it for you.

https://wiki.archlinux.org/title/Dm-integrity

> It uses journaling for guaranteeing write atomicity by default, which effectively halves the write speed

I'd really rather not do that, thanks.

ajross

4 months ago

1 reply

So... there's a reason you had to cite a throwaway comment on a distro wiki and not documentation. Needless to say journaling metadata (something done in some form by every filesystem you will ever use!) does not, in fact, "halve the write speed".

yjftsjthsd-h

4 months ago

1 reply

> So... there's a reason you had to cite a throwaway comment on a distro wiki and not documentation.

No, I read the official kernel docs too; the Arch wiki just happened happened to be a quicker way to describe it.

From https://docs.kernel.org/admin-guide/device-mapper/dm-integri... -

> The dm-integrity target can also be used as a standalone target, in this mode it calculates and verifies the integrity tag internally. In this mode, the dm-integrity target can be used to detect silent data corruption on the disk or in the I/O path.

> There’s an alternate mode of operation where dm-integrity uses a bitmap instead of a journal. If a bit in the bitmap is 1, the corresponding region’s data and integrity tags are not synchronized - if the machine crashes, the unsynchronized regions will be recalculated. The bitmap mode is faster than the journal mode, because we don’t have to write the data twice, but it is also less reliable, because if data corruption happens when the machine crashes, it may not be detected.

This is more clearly presented lower down in the list of modes, in which most options describe how they don't actually protect against crashes, except for journal mode:

> J - journaled writes

> data and integrity tags are written to the journal and atomicity is guaranteed. In case of crash, either both data and tag or none of them are written. The journaled mode degrades write throughput twice because the data have to be written twice.

On further reflection, I grant that that might only be talking about the integrity metadata, in which case we just don't know about the impact to data writes and it would be useful to go benchmark to see what the hit is in practice.

EDIT: So I went looking to see if anyone had done that benchmarking and found https://github.com/t13a/dm-integrity-benchmarks which seems to show that actually yes dm-integrity is that bad on data writes. Of course, its possible saving grace is that everything else with the same features also had a performance hit. I also found https://www.reddit.com/r/linuxadmin/comments/1crtggd/why_dmi... talking about it.

ajross

4 months ago

FWIW, the github link you show clearly shows the ext4-on-dm stack to be FASTER than ZFS!

It only falls behind, and very signficantly so, on the 1M sequential write test, exactly the situation where you'd expect there to be the least delta between systems! I'm going to bet anything that's a misconfigured RAID.

Frankly looking at that from a "will this work best for my general purpose filesystem used mostly to handle giant software builds and Zephyr test suites" it seems like a no brainer to pick dm, especially so given the simplicity argument.

khimaros

4 months ago

1 reply

i'm not one for internet arguments and really just want solutions. maybe you could point me at the details for a setup that worked for you?

based on my own testing, dm has a lot of footguns and, with some kernels, as little as 100 bytes of corruption to the underlying disk could render a dm-integrity volume completely unusable (requiring a full rebuild) https://github.com/khimaros/raid-explorations

justincormack

4 months ago

Well the intention of the integrity things is to preserve integrity that is an explicit choice, in particular for encrypted data. You definitely need a backup strategy.

pdimitar

4 months ago

2 replies

> The dm layer gives you cow/snapshots for any filesystem you want already and has for more than a decade. Some implementations actually use it for clever trickery like updates, even.

O_o

Apparently I've been living under a rock, can you please show us a link about this? I was just recently (casually) looking into bolting ZFS/BTRFS-like partial snapshot features to simulate my own atomic distro where I am able to freely roll back if an update goes bad. Think Linux's Timeshift with something little extra.

tux3

4 months ago

There are downsides to adding features in layers, as opposed to integrating them with the FS, but dm can do quite a lot:

https://docs.kernel.org/admin-guide/device-mapper/snapshot.h...

tptacek

4 months ago

DM has targets that facilitate block-level snapshots, lazy cloning of filesystems, compression, &c. Most people interact with those features through LVM2. COW snapshots are basically the marquee feature of LVM2.

ThatPlayer

4 months ago

1 reply

For me bcachefs provides a feature no other filesystem on Linux has: automated tiered storage. I've wanted this ever since I got an SSD more than 10 years ago, but filesystems move slow.

A block level cache like bcache (not fs) and dm-cache handles it less ideally, and doesn't leave the SSD space as usable space. As a home user, 2TB of SSDs is 2TB of space I'd rather have. ZFS's ZIL is similar, not leaving it as usable space. Btrfs has some recent work in differentiating drives to store metadata on the faster drives (allocator hints), but that only does metadata as there is no handling of moving data to HDDs over time. Even Microsoft's ReFS does tiered storage I believe.

I just want to have 1 or 2 SSDs, with 1 or 2 HDDs in a single filesystem that gets the advantages of SSDs with recently used files and new writes, and moves all the LRU files to the HDDs. And probably keep all the metadata on the SSDs too.

guenthert

4 months ago

3 replies

> automated tiered storage. I've wanted this ever since I got an SSD more than 10 years ago, but filesystems move slow.

You were not alone. However, things changed, namely SSD continued to become cheaper and grew in capacity. I'd think most active data is these days on SSDs (certainly in most desktops, most servers which aren't explicit file or DB servers and all mobile and embedded devices), the role of spinning rust being more and more archiving (if found in a system at all).

jcgl

4 months ago

Until $/GB drops to comparable to HDDs, large-scale storage will continue to use HDDs.

wtallis

4 months ago

Tiering didn't go away with the migration to all-SSD storage. It just got somewhat hidden. All consumer SSDs are doing tiered storage within the drive, using drive-specific heuristics that are completely undocumented, and host software rarely if ever makes use of features that exist to provide hints to the SSD to allow its tiering/caching to be more intelligent. In the server space, most SSDs aren't doing this kind of caching, but it's definitely not unheard-of.

ThatPlayer

4 months ago

Yeah, for enterprise where you can have dedicated machines for single use (and $) there probably isn't much appeal. That's why I emphasized as a home user, where all my machines are running various applications.

Also for video games, where performance matters, game sizes are huge, and it's nice to have a bunch of games installed.

yjftsjthsd-h

4 months ago

3 replies

> Compression seems silly in the modern world. Virtually everything is already compressed.

IIRC my laptop's zpool has a 1.2x compression ratio; it's worth doing. At a previous job, we had over a petabyte of postgres on ZFS and saved real money with compression. Hilariously, on some servers we also improved performance because ZFS could decompress reads faster than the disk could read.

pezezin

4 months ago

2 replies

How do you get a PostgreSQL database to grow to one petabyte? The maximum table size is 32 TB o_O

olavgg

4 months ago

Probably by using partitioning.

yjftsjthsd-h

4 months ago

Cumulative; dozens of machines with a combined database size over a PB even though each box only had like 20 TB.

bionsystem

4 months ago

The performance gain from compression (replacing IO with compute) is not ironic, it was seen as a feature for the various NAS that Sun (and after them Oracle) developped around ZFS.

adzm

4 months ago

> we also improved performance because ZFS could decompress reads faster than the disk could read

This is my favorite side effect of compression in the right scenarios. I remember getting a huge speed up in a proprietary in-memory data structure by using LZO (or one of those fast algorithms) which outperformed memcpy, and this was already in memory so no disk io involved! And used less than a third of the memory.

fluidcruft

4 months ago

One feature I like about ZFS and have not seen elsewhere is that you can have each filesystem within the pool use its own encryption keys but more importantly all of the pool's data integrity and maintenance protection (scrubs, migrations, etc) work with filesystems in their encrypted state. So you can boot up the full system and then unlock and access projects only as needed.

The dm stuff is one key for the entire partition and you can't check it for bitrot or repair it without the key.

dilyevsky

4 months ago

The other thing dm/lvm gives you is dogshit performance

doubletwoyou

4 months ago

I know my own personal anecdote isn’t much, but I’ve noticed pretty good space savings on the order of like 100 GB from zstd compression and CoW on my personal disks with btrfs

As for the snapshots, things like LVM snapshots are pretty coarse, especially for someone like me where I run dm-crypt on top of LVM

I’d say zfs would be pretty well missed with its data integrity features. I’ve heard that btrfs is worse in that aspect, so given that btrfs saved my bacon with a dying ssd, I can only imagine what zfs does.

trashface

4 months ago

> And the ones that aren't are in some other cooked format.

Maybe, if you never create anything. I make a lot of game art source and much of that is in uncompressed formats. Like blend files, obj files, even DDS can compress, depending on the format and data, due to the mip maps inside them. Without FS compression it would be using GBs more space.

I'm not going to individually go through and micromanage file compression even with a tool. What a waste of time, let the FS do it.

zozbot234

4 months ago

1 reply

Does btrfs still eat your data if you try to use its included RAID featureset? Does it still break in a major way if you're close to running out of disk space? What I'm seeing is that most major Linux distributions still default to non-btrfs options for their default install, generally ext4.

skibbityboop

4 months ago

Anecdotal but btrfs is the only filesystem I've lost data with (and it wasn't in a RAID configuration). That combined with the btrfs tools being the most aggressively bad management utilities out there* ensure that I'm staying with ext4/xfs/zfs for now.

*Coming from the extremely well thought out and documented zfs utilities to btrfs will have you wondering wtf fairly frequently while you learn your way around.

NewJazz

4 months ago

Btrfs is the closest in-tree bcachefs alternative.

ofrzeta

4 months ago

4 replies

Suse Linux Enterprise still uses Btrfs as the Root-FS, so it can't be that bad, right? What is Chris Mason actually doing these days? I did some googling and only found out that he was working on a tool called "rsched".

dmm

4 months ago

2 replies

btrfs is fine for single disks or mirrors. In my experience, the main advantages of zfs over btrfs is that ZFS has production ready raid5/6 like parity modes and has much better performance for small sync writes, which are common for databases and hosting VM images.

riku_iki

4 months ago

1 reply

> has much better performance for small sync writes

I spent some time researching this topic, and in all benchmarks I've seen and my personal tests btrfs is faster or much faster: https://www.reddit.com/r/zfs/comments/1i3yjpt/very_poor_perf...

dmm

4 months ago

1 reply

Thanks for sharing! I just setup a fs benchmark system and I'll run your fio command so we can compare results. I have a question about your fio args though. I think "--ioengine=sync" and "--iodepth=16" are incompatible, in the sense that iodepth will only be 1.

"Note that increasing iodepth beyond 1 will not affect synchronous ioengines"[1]

Is there a reason you used that ioengine as opposed to, for example, "libaio" with a "--direct=1" flag?

[1] https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-...

riku_iki

4 months ago

Intuition is that majority of software uses standard sync FS api..

m-p-3

4 months ago

1 reply

Context: I mostly dealt with RAID1 in a home NAS setup

A ZFS pool will remain available even in degraded mode, and correct me if I'm wrong but with BTRFS you mount the array through one of the volume that is part of the array and not the array itself.. so if that specific mounted volume happens to go down, the array becomes unavailable unmounted until you remount another available volume that is part of the array which isn't great for availability.

I thought about mitigating that by making an mdadm RAID1 formatted with BTRFS and mount the virtual volume instwad, but then you lose the ability to prevent bit rot, since BTRFS lose that visibility if it doesn't manage the array natively.

wtallis

4 months ago

1 reply

> with BTRFS you mount the array through one of the volume that is part of the array and not the array itself

I don't think btrfs has a concept of having only some subvolumes usable. Either you can mount the filesystem or you can't. What may have confused you is that you can mount a btrfs filesystem by referring to any individual block device that it uses, and the kernel will track down the others. But if the one device you have listed in /etc/fstab goes missing, you won't be able to mount the filesystem without fixing that issue. You can prevent the issue in the first place by identifying the filesystem by UUID instead of by an individual block device.

m-p-3

4 months ago

> I don't think btrfs has a concept of having only some subvolumes usable. Either you can mount the filesystem or you can't.

You can still mount the BTRFS array as degraded if you specify it during mount. But then this lead to some others issues like the missing data written while degraded will not be automatically be copied over without doing a scrub, while ZFS will resilver it automatically, etc

> You can prevent the issue in the first place by identifying the filesystem by UUID instead of by an individual block device.

I tried that, but all it does is select the first available block device during mount, so if that device goes down, the mount also goes down.

yjftsjthsd-h

4 months ago

I used btrfs a few years ago, on OpenSUSE, because I also thought that would work, and it was on a single disk. It lost my root filesystem twice.

xelxebar

4 months ago

I've used btrfs for 5-ish years in the most mundane, default setup possible. However, in that time, I've had three instances of corruption across three different drives, all resulting in complete loss of the filesystem. Two of these were simply due to hard power failures, and another due to a flaky cpu.

AFAIU, btrfs effectively absolves itself of responsibility in these cases, claiming the issue is buggy drive firmware.

petre

4 months ago

We use OpenSuSE and I always switch the installs to ext4. No fancy features, but always works, doesn't lose my root fs.

tw04

4 months ago

1 reply

There's literally ZFS-on-linux and it works great. And yes, I will once again say Linus is completely wrong about ZFS and the multiple times he's spoken about it, it's abundantly clear he's never used it or bothered to spend any time researching its features and functionality.

https://zfsonlinux.org/

evanjrowley

4 months ago

1 reply

Sometimes I wonder how someone so talented could be so wrong about ZFS, and it makes me wonder if his negative responses to ZFS discussions could be a way of creating plausible deniability in case Oracle's lawyers ever learn how to spell ZFS.

wmf

4 months ago

If Linus has never touched ZFS that's not plausible deniability. That's actual deniability.

jakebasile

4 months ago

I just use ZFS. Canonical ships it and that's good enough for me on my personal machines.

rstat1

4 months ago

Surprised it took this long.

223 more comments available on Hacker News

View full discussion on Hacker News

ID: 45074312Type: storyLast synced: 11/20/2025, 8:18:36 PM