Async and Finaliser Deadlocks

Postedabout 2 months agoActiveabout 2 months ago

todsacerdoti

16 points

11 comments

tratt.netTechstory

calmnegative

Debate

40/100

RustAsync ProgrammingDeadlocksConcurrency

Key topics

Rust

Async Programming

Deadlocks

Concurrency

The article discusses deadlocks in Rust's async model, with commenters debating the root cause and potential solutions, particularly around the fairness of async locks.

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

First comment

Peak period

9-10h

Avg / period

1.8

Comment distribution11 data points

Loading chart...

Based on 11 loaded comments

Key moments

01Story posted
Nov 12, 2025 at 6:38 AM EST
about 2 months ago
Step 01
02First comment
Nov 12, 2025 at 12:37 PM EST
6h after posting
Step 02
03Peak activity
4 comments in 9-10h
Hottest window of the conversation
Step 03
04Latest activity
Nov 12, 2025 at 7:39 PM EST
about 2 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (11 comments)

Showing 11 comments

elchananHaas

about 2 months ago

2 replies

It is well known that Rust's async model can lead to deadlocks. However, in the futurelock case I have come around to blaming the Async Locks. The issues is that they are not "fair" in that they don't poll the future holding the lock. There may be some other tradeoffs that would happen if the locks were in some way "fairer" but I think they should be explored.

scottlamb

about 2 months ago

1 reply

> The issues is that they are not "fair" in that they don't poll the future holding the lock.

How would you change that? A section of code doesn't have access to its own future to call into.

Best I can think is that you can't just call `let guard = mut.lock().await` but instead have to do `mut.do_locked(fut).await`, so that other `do_locked` calls can poll `fut`. I think that would work, but it seems quite awkward. Then again, imho async mutexes are something that should be used quite sparingly, so maybe that's okay.

elchananHaas

about 2 months ago

1 reply

Disclaimer - I'm not a Tokio dev so what I say may be very wrong. Some definitions:

    Future = a structure with a method poll(self: Pin<&mut Self>, ...) -> Poll<Self::Output>; Futures are often composed of other futures and need to poll them. 


    Tokio task = A top-level future that is driven by the Tokio runtime. These are the only futures that will be run even if not polled.

My understanding is that Tokio async locks have a queue of tasks waiting on lock. When a lock is unlocked, the runtime polls the task at the front of the queue. Futurelock happens when the task locks the lock, then attempts to lock it a second time. This can happen when a sub-future of the top level task already has the lock, then it polls a different future which tries to take the lock.

This situation should be detectable because Tokio tracks which task is holding an async lock. One improvement could be to panic when this deadlock is spotted. This would at least make the issue easier to debug.

But yes, I think you are right in that the async mutex would need to take the future by value if it has the capability of polling it.

scottlamb

about 2 months ago

1 reply

> This situation should be detectable because Tokio tracks which task is holding an async lock. One improvement could be to panic when this deadlock is spotted. This would at least make the issue easier to debug.

That'd be a nice improvement! It could give a clear error message instead of hanging.

...but if they actually are polling both futures correctly via `tokio::join!` or similar, wouldn't it also cause an error where otherwise it'd actually work?

elchananHaas

about 2 months ago

1 reply

Oof, I think that you are right. The issue with Futurelock is a failure of liveness, where the Future holding the lock doesn't get polled. tokio::join! would keep it alive and therefore my suggestion would mistakenly panic.

Yeah, the true fix is probably some form of the fabled Linear types/Structured concurrency where you can guarantee liveness properties.

scottlamb

about 2 months ago

On third thought, maybe your detection idea would work. I think you're right that the tokio runtime knows the lock is owned by this task's future A, and that this task's future B is waiting for the same task. So far that's arguably fine (if inefficient to try acquiring the lock twice in parallel from the same task).

I think it also should know that after future A has been awoken, the next call into the task's outermost future is returning `Poll::Pending` without polling future A, which is the suss part.

> Yeah, the true fix is probably some form of the fabled Linear types/Structured concurrency where you can guarantee liveness properties.

Maybe? I really don't know the details well enough to say a linear types thing could guarantee not only that the thing isn't dropped but also that it continues getting polled in a timely way.

skavi

about 2 months ago

1 reply

My opinion is tokio’s Mutex [0] is too fair.

It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.

My assumption is that tokio went with this design because simple Future subexecutors [2] tend to have very poor scheduling. Often they poll each of their child futures in turn regardless of which were actually woken. With an async locks closer in design to a futex, this could lead to subexecutor child futures being starved out.

If that was truly tokio’s reasoning for the design of their Mutex, I still kinda disagree with the choice; it shouldn’t be the lock’s job to fix tokio::select! being bad at scheduling.

[0]: We should be specific that we’re discussing tokio’s Mutex. this is one particular implementation of async locks.

[1]: wake next in queue, but don’t pass ownership. the woken task must CAS to actually acquire.

[2]: think tokio::select! or futures_lite::future::Or. but not FuturesUnordered which does child wakeups properly.

scottlamb

about 2 months ago

1 reply

> It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.

...sometimes? Wouldn't we still have the problem if the future runs (actually getting the lock) but then awaits again while holding it? I think that's common—if you're not awaiting while holding the lock, then why didn't you just use a simple std::sync::Mutex?

skavi

about 2 months ago

1 reply

futurelock [0] was special specifically because of this aspect where even a future which seemingly acquires and releases a lock in a single poll triggers a deadlock.

what you describe is just a standard async deadlock. much easier to spot when debugging. and one can reason about those deadlocks in pretty much the same way one would reason about deadlocks between threads.

[0]: as named and described in https://rfd.shared.oxide.computer/rfd/0609

scottlamb

about 2 months ago

1 reply

> futurelock [0] was special specifically because of this aspect where even a future which seemingly acquires and releases a lock in a single poll triggers a deadlock.

Their description at the top doesn't seem to match that:

RFD> This RFD describes futurelock: a type of deadlock where a resource owned by Future A is required for another Future B to proceed, while the Task responsible for both Futures is no longer polling A. Futurelock is a particularly subtle risk in writing asynchronous Rust.

...and further on they describe lock acquisition as an example of the resource:

RFD> future F1 is blocked on future F2 in some way (e.g., acquiring a shared Mutex)

...so I think they meant it to be more general.

> what you describe is just a standard async deadlock. much easier to spot when debugging. and one can reason about those deadlocks in pretty much the same way one would reason about deadlocks between threads.

I think the not-being-polled aspect of it is a bit more subtle than between threads. More like thread vs signal/interrupt handler actually, except it's not as well-known that "branch taken after a `select!`" or "place where two futures exist and `join!`/`spawn` isn't being used" is such a special case for scheduling.

...and anyway, with a mutex that has an actual reason to be async, how can you have only a acquire bug but not also have a potential mid-holding bug? You can say the latter is a different class of bug so you've solved futurelock, but you still have a bug any time you would have had futurelock, so semantics 1 working program 0.

skavi

about 2 months ago

If do_async_thing was implemented as below, i can’t imagine the futurelock post getting anywhere near this much attention. As for why you might use an async lock in a future which acquires and unlocks in the same poll, there still may be other actors which hold the lock across multiple polls. (there is one in the RFD’s minimized example).

    async fn do_async_thing(label: &str, lock: Arc<Mutex<()>>) {
        println!("{label}: started");
        let _guard = lock.lock().await;
        println!("{label}: acquired lock");
        sleep(Duration::from_secs(5)).await;
        println!("{label}: done");
    }

View full discussion on Hacker News

ID: 45898923Type: storyLast synced: 11/20/2025, 1:26:54 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN