Linux 6.18 Will Fix Lockups When Systemd Units Read Lots of Files

Posted3 months agoActive3 months ago

Bender

44 points

25 comments

phoronix.comTechstory

calmmixed

Debate

60/100

LinuxSystemdPerformance Optimization

Key topics

Linux

Systemd

Performance Optimization

Linux 6.18 will fix a performance issue causing lockups when systemd units read many files, sparking discussion on the root cause and potential workarounds.

Snapshot generated from the HN discussion

Discussion Activity

Active discussion

First comment

23m

Peak period

0-2h

Avg / period

Comment distribution25 data points

Loading chart...

Based on 25 loaded comments

Key moments

01Story posted
Sep 27, 2025 at 4:26 PM EDT
3 months ago
Step 01
02First comment
Sep 27, 2025 at 4:49 PM EDT
23m after posting
Step 02
03Peak activity
15 comments in 0-2h
Hottest window of the conversation
Step 03
04Latest activity
Sep 28, 2025 at 10:22 AM EDT
3 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (25 comments)

Showing 25 comments

CaliforniaKarl

3 months ago

1 reply

This seems to me to be a cgroup issue, not a systemd issue, though systemd's pervasive use of cgroups make it the most-obvious trigger.

themafia

3 months ago

It was a shared queue with high contention and it was being sorted more than necessary. The fix is to use independent queues and to not sort the dirty list.

gpm

3 months ago

8 replies

When I stop and think about it writing access times for everything seems extremely wasteful... does anything actually use this field? Any reason I shouldn't change all my file systems to mount with noatime?

It's hard for me to imagine using it for anything myself, considering the number of times I do something like run a search (or a backup command) across literally every file I care about.

JoshTriplett

3 months ago

2 replies

Ideally, noatime would be the default, and applications that still care about atime would be updated to open with a new `O_ATIME` flag. Or, better yet, track it themselves independently.

It's completely reasonable to turn it on. And also, when you're writing applications for Linux, consider using the `O_NOATIME` flag in your file opens.

mmh0000

3 months ago

1 reply

`relatime` is the default on many distros, and has long been "best-practice" for first pass Linux performance turning.

  MOUNT(8)       
       relatime
           Update inode access times relative to modify or change time. Access time is only updated if the previous access time was earlier than or equal to the current
           modify or change time. (Similar to noatime, but it doesn’t break mutt(1) or other applications that need to know if a file has been read since the last time it
           was modified.)

           Since Linux 2.6.30, the kernel defaults to the behavior provided by this option (unless noatime was specified), and the strictatime option is required to
           obtain traditional semantics. In addition, since Linux 2.6.30, the file’s last access time is always updated if it is more than 1 day old.

JoshTriplett

3 months ago

relatime is not as good as noatime, and it doesn't solve the problem reported in the article, since you can still easily end up with a large number of atime updates if you have a large number of files.

lq9AJ8yrfs

3 months ago

I think you're on to something, though it's optimistic to count on app devs to specify a novel platform-specific flag. "Don't break user-space" seems like it's in contest here.

Maybe if you could taint a process (or perhaps the inode and/or path instead) so that its and its children's opens get the new o_atime flag by default, so that systemd or whatever could set it for legacy processes (or files/paths) that need it.

Then distros or SRE's could put up with it without nagging all the SWEs about linuxisms. Some of whom may not know or care their code runs on linux.

pengaru

3 months ago

1 reply

It's one of those things that you don't care about until you do.

As a former sysadmin through the dotcom booms, we regularly depended on atime for identifying which files are actively being used in myriad situations.

Sometimes you're just confirming a config file was actually reloaded in response to your HUP signal. Other times you're trying to find out which data files a customer's cgi-bin mess is making use of.

It's probably less relevant today where multi-user unix hosts are less common, but it was quite valuable information to maintain back then.

__turbobrew__

3 months ago

1 reply

> we regularly depended on atime for identifying which files are actively being used in myriad situations

You can do that with bpf tooling now, for example the `opensnoop` BCC program can capture all file opens on demand. You can also write tools which capture all POSIX IOs to specific files/directories. I can see atime sometimes being useful in some super niche use cases such as hisenbugs you cannot reproduce reliably, but I would be reaching for BPF tools first.

homebrewer

3 months ago

Forget bpf, Linux has the audit framework to solve exactly this problem, and it's been in heavy use for a couple of decades now

https://www.redhat.com/en/blog/configure-linux-auditing-audi...

https://linux-audit.com/linux-audit-framework/configuring-an...

p_l

3 months ago

The original use case for access times was for backup purposes - long unused files would be marked for "reaping" by backup process and either moved to backup storage or moved to a different storage tier in a hierarchical storage management system

williadc

3 months ago

I use atime to identify archives that can be retired. It's common for circuit designer to release a lot of large files for their peers to analyze or incorporate into a parent/grandparent simulation. They will use that data for as long as it is still relevant, which means different things for different types of data, and the only consistent thing we've found is that if the data hasn't been accessed in awhile, then we can retire it.

bryanlarsen

3 months ago

Mail readers that use the mbox format are pretty much the only common user.

Avamander

3 months ago

I've found it useful for forensic reasons and debugging.

themafia

3 months ago

The canonical example of an application that can break with 'noatime' is the "mutt" email client with mbox style single file email spools.

Most modern applications are not designed to operate on shared files like this so in general 'noatime' is safe for 99.9% of software.

NekkoDroid

3 months ago

> does anything actually use this field?

Systemd in a way does. One of the systemd-tmpfiles entry option is to clean up unused files after some time (it ships defaults for /tmp/ after 10 days and /var/tmp/ after 30 days) and for this it checks atime, mtime and ctime to determin if it should delete the file (I think you can also take a flock on the file to prevent it from being deleted as well)

jmclnx

3 months ago

2 replies

I admit, I do not fully understand systemd, but having to add logic like this is very odd. If "too many" is reached, couldn't they add a pause and throw a message into /var/log/messages ?

This indicates to me a very poor design. If not, it is a validation of the old UNIX saying "do one thing and do it well" and "keep programs small" (paraphrasing).

pengaru

3 months ago

1 reply

  > I admit, I do not fully understand systemd ...

  > This indicates to me a very poor design. If not, it is a validation of the old UNIX saying "do one thing and do it well" and "keep programs small" (paraphrasing).

You don't need to fully understand systemd to understand TFA describes a kernel fix.

This isn't a systemd problem, systemd just makes use of cgroups. The kernel has a degenerate case handling lazy atime updates combined with cgroups.

pizlonator

3 months ago

1 reply

Kinda yeah?

I’d say it’s both a systemd issue and a kernel issue. The fact that systemd motivates kernel fixes does point to systemd being maybe just a bit overengineered

pengaru

3 months ago

1 reply

> I’d say it’s both a systemd issue and a kernel issue. The fact that systemd motivates kernel fixes does point to systemd being maybe just a bit overengineered

systemd is basically a victim here, you're quasi engaging in a tech form of victim blaming.

don't blame systemd for making use of kernel features (cgroups)

and without cgroups linux has no sandboxing capabilities, and would be largely irrelevant to today's workloads

pizlonator

3 months ago

1 reply

If I was blaming only systemd then you’d be right.

Look if I wrote a thing that caused kernel lockups then I’d blame myself even if the kernel dudes fixed the issue

lokar

3 months ago

The kernel has a very clear API and expected behavior. Systemd is not doing anything wrong, it’s using the API correctly.

It’s a kernel bug.

wrs

3 months ago

It’s not really specific to systemd, it’s about cgroups in the kernel. If your code is running as a systemd unit so it gets its own cgroup, there you are.

malkia

3 months ago

I was listening to Matt Godbolt's the Two's Complement podcast "Squashing Compilers" and this got my attention, I think this was Ben Rady sharing his recent systemd issue - seems like related

https://youtu.be/Au15lSiAkeQ?si=sxxP2ia9vUkWY5qy&t=982

From the YouTube transcript:

"I don't know what systemd is doing to take so long cuz this is the rub systemd essentially takes 100% CPU twice over. So on our two core machine that we run these things on, I can run top that when I actually got it, I said to you the machine was unresponsive, right? Because all in kernel land, locks are being taken out left, right, and center. Um, you know, we're trying to mount these things in parallel at sensible levels because we want to try and mount"

porridgeraisin

3 months ago

Another case of Accidentally Quadratic

View full discussion on Hacker News

ID: 45399063Type: storyLast synced: 11/20/2025, 1:20:52 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN