Cancellations in Async Rust
Posted 3 months ago · Active 3 months ago
sunshowers.io · Tech · story · High profile
Key topics
Rust
Async Programming
Cancellation
The article discusses the challenges of cancellation in async Rust, and the discussion revolves around the nuances of cancel safety and correctness, as well as the trade-offs of async programming in Rust.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 2h after posting. Peak period: 52 comments in the 0-6h window. Average per period: 8.1. Based on 89 loaded comments.
Key moments
- Story posted: Oct 3, 2025 at 12:18 PM EDT (3 months ago)
- First comment: Oct 3, 2025 at 2:06 PM EDT (2h after posting)
- Peak activity: 52 comments in the 0-6h window (hottest period of the conversation)
- Latest activity: Oct 7, 2025 at 8:52 AM EDT (3 months ago)
ID: 45464632 · Type: story · Last synced: 11/20/2025, 4:35:27 PM
It's really not about "cancelling async Rust", which is what I expected, even if it didn't make much sense.
Or am I missing context?
Oxide looks to be superb engineering up and down the whole stack, and if it drives more Rust code into Linux, all the better.
Now that Linode has been consumed by Akamai, we need an alternative.
IMHO async is an anti-pattern, and probably the final straw that will prevent me from ever finishing learning Rust. Once one learns pass-by-value and copy-on-write semantics (Clojure, PHP arrays), the world starts looking like a spreadsheet instead of spaghetti code. I feel that a Rust-like language could be built with no borrow checker, simply by allocating twice the memory. Since that gets ever-less expensive, I'm just not willing to die on the hill of efficiency anymore. I predict that someday Rust will be relegated to porting scripting languages to a bare-metal runtime, but will not be recommended for new work.
That said, I think that Rust would make a great teaching tool in an academic setting, as the epitome of imperative languages. Maybe something great will come of it, like Swift from Objective-C or Kotlin from Java. And having grown up on C++, I have a soft spot in my heart for solving the hard problems in the fastest way possible. Maybe a voxel game in Rust, I dunno.
1) I learned about pin in Rust to prevent values from moving in memory.
2) I learned about the HTML <summary> tag (the turndown arrows in your article that work with JavaScript disabled), hah.
I can see how dealing with stream and resource cleanup in async code could be a chore. It sounds like you were able to do that in a fairly declarative manner, which is what I always strive for as well.
I think my hesitation with async is that I already went down that road early in my programming life with cooperative threads/multitasking on Mac OS 9 and earlier. There always seems to be yet another brittle edge case to deal with, so it can feel infuriating playing whack-a-mole until they're all nailed down.
For example, pinning memory looks a lot like locking handles in Mac OS. Handles were pointers to pointers, so it was a bare hands way to implement a memory defragmenter before runtimes were smart enough to handle it. If apps used handles, then blocks of data could be unlocked, moved somewhere else in memory, and then re-locked. Code had to do an extra hop through each handle to get to the original pointer, which was a frequent source of bugs because one async process might be working on a block, yield, and then have another async process move the handle out from under it.
The lock's state was stored in a flag in the memory manager, basically a small bit of metadata. I haven't investigated, but I suspect that Rust may be able to handle locking more efficiently, perhaps more like reference counting or the borrow checker where it can infer whether a pointer is locked without storing that flag somewhere (but I could be wrong).
Apple abandoned handles when it migrated to Mac OS X and Darwin inherited protected memory and better virtual memory from FreeBSD. Although now that I write this out, I'm not sure that they solved in-process fragmentation. I think they just gave apps the full 32- or 64-bit address space so that effectively there is always another region available for the next allocation, and let the virtual memory subsystem consolidate 4k memory blocks into contiguous strips internally. The memory-dereferencing step became implicit rather than explicit, as well as hidden from apps, so that whole classes of bugs became unreachable.
Anyway, that's why I prefer the runtime to handle more of this. I want strong guarantees that I can terminate a process and all locks inside it will get freed as well. I can pretty much rely on that even in hacky languages like PHP.
My frustration with all of this is that we could/should have demanded better runtimes. We could have had realtime unixes where task switching and memory allocation were effectively free. Unfortunately the powers that be (Mac OS and Windows) had runtimes that were too entrenched with too many users relying on quirks and so they dragged their feet and never did better. Languages like Rust were forced to get very clever and go to the ends of the earth to work around that. Then when companies like Google and Facebook won the internet lottery, they pulled the ladder up behind them by unilaterally handing down decrees from on high that developers should use bare hands techniques, rather than putting real resources into reforming the fundamentals so that we wouldn't have to.
What I'm trying to say is that your solution is clever and solves a common pattern in about the simplest way possible, but is not as simple as synchronous-blocking unix pipes to child processes in shell scripts. That's in no way a criticism. I have similar feelings about stuff like Docker and Kubernetes after reading about Podman. If we could magically go back and see the initial assumptions that led us down the road we're on, we might have tried different approaches. It's all of those roads not taken that haunt me, because they represent so much of my workload each day.
It is not as simple as synchronous pipes, but it also has far better edge case and error handling.
For example, on Unix, if you press ctrl-Z to pause execution, nextest will send SIGTSTP to test processes and also pause its internal timers (resuming them when you type in fg or bg). That kind of bookkeeping is pretty hard to do with linear code, and especially hard to coordinate across subprocesses.
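A minimal sketch of that kind of signal bookkeeping (not nextest's actual code; it assumes tokio's Unix signal API and the libc crate):

```rust
use tokio::signal::unix::{signal, SignalKind};

// Sketch: catch SIGTSTP/SIGCONT in a select loop so a runner could forward
// the signals to child processes and pause/resume its own timers.
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut tstp = signal(SignalKind::from_raw(libc::SIGTSTP))?;
    let mut cont = signal(SignalKind::from_raw(libc::SIGCONT))?;
    loop {
        tokio::select! {
            _ = tstp.recv() => {
                // A real runner would send SIGTSTP to each test process here
                // and record the pause instant for its internal timers.
            }
            _ = cont.recv() => {
                // On fg/bg: forward SIGCONT and shift timer deadlines by the
                // time spent paused.
            }
        }
    }
}
```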
State machines with message passing (as seen in GUI apps) are very helpful at handling this, but they're quite hard to write by hand.
The async keyword in Rust allows you to write state machines that look somewhat like linear code (though with the big cancellation asterisk).
Right, and that is one of the absolute worst things about the Rust ecosystem. Most programs don't benefit from async, and should use plain old threads because they are much easier to work with.
https://kushallabs.com/understanding-concurrency-in-go-green...
So lots of concepts are worth learning, like atomicity, ACID compliance, write-ahead logs (WALs), statically detecting livelocks and deadlocks (or making them unreachable), consensus algorithms like Raft and Paxos, state transfer algorithms like software transactional memory (STM), connectionless state transfer like hash trees and Merkle trees, etc.
The key insight is that manual management of tasks is, for the most part, not tenable by humans. It's better to take a step back and work at a higher level of abstraction. For example, declarative programming works in terms of goals/specifications/tests, so that the runner has more freedom to cancel and restart/retry tasks arbitrarily. That way the user can fire off a workload and wait until all of the tasks match a success criteria, and even treat that process as idempotent so it can all be run again without harm. In this way, trees of success criteria can be composed to manage a task pool.
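A minimal sketch of that idea; the names are illustrative, not from any particular library:

```rust
// Treat the task as idempotent and rerun it until its success criterion
// holds. Safe to cancel and restart at any await point, precisely because
// each run is harmless to repeat.
async fn run_until_satisfied<F, Fut, C, CFut>(mut task: F, mut satisfied: C)
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = ()>,
    C: FnMut() -> CFut,
    CFut: std::future::Future<Output = bool>,
{
    while !satisfied().await {
        task().await;
    }
}
```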
I'd probably point to CockroachDB as one of the best task-cancellers, since it doesn't have a shutdown procedure. Its process can simply be terminated by the user with control-c, then it reconciles any outstanding transactions the next time it's booted, which just adds some latency. If an entire database can do that, then "this is the way".
Not really. The talk describes problems that can show up in any environment where you have concurrency and cancellation. To adapt some examples: a thread that consumes a message from a channel but is killed before it can process it, has still resulted in that message being lost. A synchronous task that needs to temporarily violate invariants in some data structure that can't be updated atomically, has still left that data structure in an invalid state when it gets killed part way through.
> Arguably the Go language's goroutines strike a good balance between cooperative and preemptive threads/multitasking.
Goroutines are pretty nice. It's especially nice that Go has avoided the function colouring problem. I'm not convinced that having to litter your code with selects to make your goroutines cancellable is good, though. And if you don't care about being able to cancel tasks, you can write async Rust in a way that ensures they won't be cancelled by accident fairly easily. Unless there's some better way to write cancellable goroutines that I'm not familiar with.
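For comparison, a minimal sketch of explicit cancellation in async Rust, assuming tokio plus tokio_util's CancellationToken; the select! here plays the same role as Go's select on ctx.Done():

```rust
use std::time::Duration;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let token = CancellationToken::new();
    let child = token.child_token();

    let task = tokio::spawn(async move {
        loop {
            tokio::select! {
                // Cooperative cancellation point, like selecting on ctx.Done().
                _ = child.cancelled() => break,
                _ = tokio::time::sleep(Duration::from_millis(50)) => {
                    // Do one unit of work per iteration.
                }
            }
        }
    });

    token.cancel(); // request cancellation
    task.await.unwrap(); // task exits at its next cancellation point
}
```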
> The key insight is that manual management of tasks is, for the most part, not tenable by humans. It's better to take a step back and work at a higher level of abstraction.
Of course it's always important to look at systems as a whole. But to build larger systems out of smaller components you need to actually build the small components.
> I'd probably point to CockroachDB as one of the best task-cancellers, since it doesn't have a shutdown procedure. Its process can simply be terminated by the user with control-c, then it reconciles any outstanding transactions the next time it's booted, which just adds some latency. If an entire database can do that, then "this is the way".
I'm not familiar with CockroachDB specifically, but I do think a database should generally have a more involved happy-path shutdown procedure than that. In particular, I would like the database not to begin processing new transactions if it is not going to be able to finish them before it needs to shut down, even if not finishing them wouldn't violate ACID or any of my invariants.
That kind of thinking made sense in the 90s when things followed Moore's law. But DRAM was one of the first things to fail to keep up: https://ourworldindata.org/grapher/historical-cost-of-comput... and barely gets cheaper anymore. That's why mobile phones still only have 16 GB of memory despite having 4 GB a decade ago.
And there's all sorts of problems that Rust doesn't necessarily make a great fit for. But Rust's target market is where you'd otherwise use a low-level language like C or C++. If you can just heap-allocate everything and aggressively create copies all over the place, then why would you ever use those languages in the first place?
And for what it’s worth Rust is finding a lot of success even replacing all the tooling in other language ecosystems like Ruby, Python, and JS precisely because the tools in those ecosystems written in the native language end up being horribly slow. And memory allocation and randomly deep copying arrays are the kinds of things that add up and make things slow (in addition to GC pauses, slow startups, interpreter costs etc).
And you can always choose not to do async in Rust although personally I’m a huge fan as it makes it really clear where you have sprinkled in I/O in places you shouldn’t have.
I used to write web backends in Clojure, and justified it with the fact that the JVM has some of the best profiling tools available (I still believe this), and the JVM itself exposes lots of knobs to not only fine-tune the GC, but even choose a GC! (This cannot be overstated; garbage collectors tend to be deeply integrated into a language's runtime, and it's amazing to me that the Java platform manages to ship several garbage collectors, each of which is optimal in its own specific situations.)
After rewriting an NLP-heavy web app in Rust, I saw massive performance gains over the original Clojure version, even though both aggressively copy data and the Rust version is full of atomic refcounts (atomic refcounting is not the fastest GC out there...)
The binary emitted by rustc is also much smaller. ~10 MB static binary vs. GraalVM's ~80 MB native images (and longer build times, since classpath analysis and reflection scanning require a lot of work)
What surprised me the most is how high-level Rust feels in practice. I can use pattern matching, async/await, functional programming idioms, etc., and it ends up being fast anyway. Coming from Clojure, Rust syntax trying its best to be expression-oriented is a key differentiator from other languages in its target domain (notably, C++). I sometimes miss TypeScript's anonymous enums, but Rust's type system can express a lot of runtime behavior, and it's partly why many jokingly state "if it compiles, it's likely correct". Then there's the little things, like how Rust's Futures don't immediately start in the background. In contrast, JavaScript Promises are immediately pushed to a microtask queue, so cancelling a Promise is impossible by design.
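A minimal sketch of that difference, assuming tokio:

```rust
use std::time::Duration;
use tokio::time::{sleep, timeout};

#[tokio::main]
async fn main() {
    // Rust futures are lazy: this block does nothing until awaited.
    let work = async {
        sleep(Duration::from_secs(1)).await;
        println!("done");
    };

    // Dropping the future cancels it; "done" is never printed.
    drop(work);

    // Racing against a timeout cancels the loser by dropping it.
    let res = timeout(Duration::from_millis(10), async {
        sleep(Duration::from_secs(1)).await;
        42
    })
    .await;
    assert!(res.is_err()); // timed out; the inner future was dropped
}
```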
Overall, it's the little things like this -- and the toolchain (cargo, clippy, rustfmt) -- that have kept me using Rust. I can write high-level code and still compile down to a ~5 MB binary and outperform idiomatic code in other languages I'm familiar with (e.g. Clojure, Java, and TypeScript).
I think that Rust is making an admirable attempt to attack challenges that have already been solved better in other ways. I just don't have much use for its arsenal.
For example, I wasted 2 years of my life trying to write a NAT-punching peer to peer networking framework for games around 2005, but was first exposed to synchronous blocking vs asynchronous nonblocking networking in the late 90s when I read Beej's Guide to Network Programming:
https://beej.us/guide/bgnet/
I was hopelessly trying to mimic the functionality of libraries like RakNet and Zoidcom without knowing some fundamentals that I wouldn't fully understand for years:
https://www.reddit.com/r/gamedev/comments/93kr9h/recommended...
20 years later, Rust has iroh:
https://github.com/n0-computer/iroh
I realize there is some irony in pointing to a Rust library as a final solution.
But my point is that when developers reached high levels of financial success and power, they didn't go back to address the fundamentals. NAT was always an abomination to me. And as far as I know, they kept it in IPv6. Someone like Google should have provided a way to get around it that's not as heavy as WebRTC. So many developer years of work have been wasted due to the mistakes of the status quo. So that we wander in the desert for years using lackluster paradigms because we don't know that better stuff exists.
Knowing what I know now, I would have created open source C (portable) libraries to solve NAT punching, state transfer with a software transactional memory (STM) or Raft, entity state machines (like in Unity), movement prediction/dead reckoning, etc etc etc to form the basis of a distributed computing network for virtual worlds and let the developer community solve that. Someone will do that in a year or two with AI now I assume.
Ok you kinda got me. I realize after writing this out that I wouldn't use Rust for new work, but it's not so much about the language itself as building upon proven layers to "get real work done". The lower the level of abstraction, the harder that is to do. So it's hard for me to see the problem which Rust is trying to solve.
I'm a big fan of the type system and how expressive I feel with Rust. The compiler is incredibly helpful too. rust-analyzer is a superpower. Just yesterday I embarked on a pretty big refactor and all it took was changing a couple of types—and then fixing the 500 problems vscode was pointing out.
Being able to jump in at the deep end like this in a ~90kloc codebase is only feasible (to me) because I know the tooling has my back.
It's not the perfect tool for every project. But it's a really great choice for a really large number of projects. I encourage you to try it a little more in a variety of domains to see if it clicks.
If we imagine a function passing a block of memory to sub functions which may write bytes to it randomly, then each of those writes may allocate another block. If those allocations are similar in size to the VM block size, then each invocation can potentially double the amount of memory used.
A do-one-thing-and-do-it-well (DOTADIW?) program works in a one-shot fashion where the main process fires off child processes that return and free the memory that was passed by value. Surrounded by pipes, so that data is transmuted by each process and sent to the next one. VM usage may grow large temporarily per-process, but overall we can think of each concurrent process as roughly doubling the amount of memory.
Writing this out, I realized that the worst case might be more like every byte changing in a 4k block, so a 4096 times increase in memory. Which still might be reasonable, since we accept roughly a 200x speed decrease for scripting languages. It might be worth profiling PHP to see how much memory increases when every byte in a passed array is modified. Maybe they use a clever tree or refcount strategy to reduce the amount of storage needed when arrays are modified. Or maybe they just copy the entire array?
Another avenue of research might be determining whether a smarter runtime could work with "virtual" VMs (VVMs?) to use a really small block size, maybe 4 or 8 bytes to match the memory bus. I'd be willing to live with a 4x or 8x increase in memory to avoid borrow checkers, refcounts or garbage collection.
-
Edit: after all these years, I finally looked up how PHP handles copy-on-write, and it does copy the whole array on write unfortunately:
http://hengrui-li.blogspot.com/2011/08/php-copy-on-write-how...
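For what it's worth, Rust's standard library offers the same whole-structure copy-on-write via Arc::make_mut; a minimal sketch:

```rust
use std::sync::Arc;

fn main() {
    let a: Arc<Vec<u8>> = Arc::new(vec![0; 4096]);
    let mut b = Arc::clone(&a); // cheap: just a refcount bump

    // make_mut deep-copies the Vec here because `a` still shares it:
    // the same copy-the-whole-array-on-write behavior as PHP.
    Arc::make_mut(&mut b)[0] = 1;

    assert_eq!(a[0], 0);
    assert_eq!(b[0], 1);
}
```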
If I were to write something like this today, I'd maybe use "smart" associative arrays of some kind instead of contiguous arrays, so that only the modified section would get copied. Internally that might be a B-Tree with perhaps 8 bytes per leaf to hold N primitives like 1 double, 2 floats, etc. In practice, a larger size like 16-256 bytes per leaf might improve performance at the cost of memory.
Looks like ZFS deduplication only copies the blocks within the file that changed, not the entire file. Their strategy could be used for a VM so that copy-on-write between processes only copies the 4k blocks that change. Then if it was a realtime unix, functions could be synchronous blocking processes that could be called with little or no overhead.
This is the level of work that would be required to replace Rust with simpler metaphors, and why it hasn't happened yet.
It analyses code. If it finds RAII/linearity/single ownership, it manages memory exactly like Rust does.
But if it does not, it falls back to reference counting.
So it does what Rust does, but automagically, without polluting the code.
So copy-on-write, pass-by-value, or doubled memory are not the only options for improving on Rust.
If that's what you're looking for, have you considered OCaml?
Of course... It's obviously not as simple as "just give me a way to turn it off", but more importantly, I just don't see this concern being addressed by the Powers That Be. Am I just not looking hard enough? Did I miss the Rust blog post titled "hey - so you didn't want to use async but the libraries that you did want to use ship with async so you're up shit creek... Here's what our plan for that is"?
I'm sorry. I generally lurk because I don't consider myself up to the caliber of others on this website, but nonetheless the few posts I make do end up being about async because it does make me feel quite hopeless at times. Hopefully someone can look past my ignorance/incompetence/selfishness/immaturity and tell me it's all going to be okay.
This abstraction has served me well and facilitates stepping through code in a debugger, though I jump out of thinking it at that level when I need to think of it at a lower level.
I really hope we get async drop soon.
So on cancellation, the transaction times out and nothing is written. Bad but safe.
The problem is the same on other platforms. For example, what if writing to the DB throws an exception if you’re on Python? Your app just dies, the transaction times out. Unfortunate but safe.
If it does not run transactionally you have a problem in any execution scenario.
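A minimal sketch of the transactional happy path, assuming sqlx with Postgres (the table and column names are made up): if the future is dropped at any await point before commit, the transaction rolls back.

```rust
use sqlx::PgPool;

async fn transfer(pool: &PgPool) -> sqlx::Result<()> {
    let mut tx = pool.begin().await?;

    sqlx::query("UPDATE accounts SET balance = balance - 1 WHERE id = 1")
        .execute(&mut *tx)
        .await?;
    sqlx::query("UPDATE accounts SET balance = balance + 1 WHERE id = 2")
        .execute(&mut *tx)
        .await?;

    // If this function's future is dropped before the next line runs,
    // `tx` is dropped and the transaction rolls back: bad but safe.
    tx.commit().await
}
```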
(This is related to the fact that Rust doesn't have async drop — you can't run async code on drop, other than spawning a new task to do the cleanup.)
This is prong 3 of my cancel correctness framework (that the cancellation violates a system property, in this case a cleanup property.) The solution here is to ensure the connection is in a pristine state before handing it out the next time it's used.
If you want to tie multiple actions together as an atomic unit, you need the other side to have some concept of transactions, and you need to use it.
Let's say my code looks like this
Where does an issue occur which causes `d` not to be called? Is it some sort of cancellation in `c`? Or some upstream action in `a`?

`d` not being called would happen because of actions in `a`.
If `a` were rewritten as
Then if `c` ends up failing in the `try_join!`, progress on `b` will be halted and thus the `d` in `b` won't be executed.

Because Rust is ultimately constructing a state machine which is run by the caller, the execution of that state machine can be interrupted or partially executed at any of the `await` points. Or, more accurately, the caller can simply not advance the state machine.
So, the `try_join` macro can start work on the various functions and if any of them fail, the others are ultimately cancelled. Which can happen before those functions finish fully executing.
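A hypothetical reconstruction (assuming tokio) of the shape being described, where the `d` step in `b` never runs because `c` fails first:

```rust
use std::time::Duration;
use tokio::time::sleep;

async fn c() -> Result<(), &'static str> {
    sleep(Duration::from_millis(10)).await;
    Err("c failed")
}

async fn b() -> Result<(), &'static str> {
    sleep(Duration::from_millis(100)).await; // b is parked at this await...
    println!("d"); // ...so this `d` step never runs
    Ok(())
}

// a: when c fails, try_join! returns the error and drops b mid-await.
#[tokio::main]
async fn main() {
    let res = tokio::try_join!(b(), c());
    assert_eq!(res.unwrap_err(), "c failed");
}
```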
This is particularly bad if there's a partial state change.
I'm not entirely sure what that means for memory allocation.
Glad to see it converted to a blog post. Talks are great, but blogs are much easier to share and reference.
I don't like the "cancel safety" term. Not only is it unrelated to Rust's concept of safety, it's also unnecessarily judgemental.
Safe/unsafe implies there's a better or worse behavior, but what is desirable for cancellation to do is highly context-dependent.
Futures awaiting spawned tasks are called "cancellation safe", because they won't stop the task when dropped. But that's not an inherently safe behavior – leaving tasks running after their spawner has been cancelled could be a bug: piling up work that won't be used, and even interfering with the rest of the program by keeping locks locked or ports used. OTOH a spawn handle that stops the task when dropped would be called "cancellation unsafe", despite being a very useful construct specifically for propagating cleanup to dependent tasks.
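A minimal sketch of that second construct, assuming tokio (the wrapper type is hypothetical):

```rust
use tokio::task::JoinHandle;

// A handle that stops its task when dropped, useful for propagating
// cleanup to dependent tasks. By the criticized terminology this is
// "cancellation unsafe", even though it's often exactly what you want.
struct AbortOnDrop<T>(JoinHandle<T>);

impl<T> Drop for AbortOnDrop<T> {
    fn drop(&mut self) {
        // Cancels the task at its next await point.
        self.0.abort();
    }
}
```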
What's he trying to do? Get a clean program shutdown? That's moderately difficult in threaded programs, and async has problems, too. The use case here is unclear.
The real use cases involve when you're sending messages back and forth to a remote site, and the remote site goes away. Now you need to dispose of the state on your end.
To this day I'm not aware of a better way to express what's become a set of increasingly complex state machines (the most recent improvement being to make the state machines responsive to user input). Nextest's runner loop is structured mostly like a GUI event loop, but without explicit state machines. It's quite nice being able to write code that's this complex in a bug-free manner.
Maybe it's a bit contrived, but it's also the kind of code you'd sprinkle through your system in response to "nothing seems to be happening and I don't know why".
The note on mpsc::Sender::send losing the message on drop [1] was actually added by me [2], after I wrote the Oxide RFD on cancellations [3] that this talk is a distilled form of. So even the great folks on the Tokio project hadn't documented this particular landmine.
[1] https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Sender.h...
[2] https://github.com/tokio-rs/tokio/pull/5947
[3] https://rfd.shared.oxide.computer/rfd/0400
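A minimal sketch of the landmine, assuming tokio, with cancellation simulated by a timeout:

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<String>(1);
    tx.send("fills the buffer".into()).await.unwrap();

    // The channel is full, so this send must wait. The timeout cancels it
    // by dropping the future, and the message inside is dropped with it.
    let res = timeout(Duration::from_millis(10), tx.send("lost".into())).await;
    assert!(res.is_err());

    assert_eq!(rx.recv().await.unwrap(), "fills the buffer");
    // The second message is gone; it was never placed in the channel.
}
```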
They go by they/she
https://sunshowers.io/about/
The examples presented for "cancel unsafe" futures seem to me like the root of the problem is some sort of misalignment between expectations and reality:
Example 1: one future cancelled on error in the other
let res = tokio::try_join!(
    do_stuff_async(),
    more_async_work(),
);
Example 2: data not written out on cancellation
let buffer: &[u8] = /* ... */;
writer.write_all(buffer).await?;
Both of these cases are claimed to not be cancel-safe, because the work gets interrupted and so not driven to completion. But again, what else is supposed to happen? If you want the work to finish regardless of the async context being cancelled, then don't put it in the same async context but spawn a task instead.
I feel like I must be missing something obvious that keeps me from understanding the author's issue here. I thought work getting dropped on cancellation is exactly how futures are supposed to work. What's the nuance that I'm missing?
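For reference, a minimal sketch of the "spawn a task instead" pattern, assuming tokio:

```rust
use std::time::Duration;
use tokio::time::{sleep, timeout};

#[tokio::main]
async fn main() {
    // The spawned work is owned by the runtime, not by this future.
    let handle = tokio::spawn(async {
        sleep(Duration::from_millis(50)).await;
        println!("work finished despite the caller timing out");
    });

    // Cancelling this await drops only the JoinHandle; dropping a
    // JoinHandle detaches the task rather than aborting it.
    let _ = timeout(Duration::from_millis(10), handle).await;

    // Give the detached task time to finish before the runtime shuts down.
    sleep(Duration::from_millis(100)).await;
}
```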
I am asking because I've noticed that many developers with previous experience from "task-based" languages (specifically the JS/TS world) tend to grasp the basics of Rust async quickly enough, but then run into expectation-misalignment problems similar to the examples that you used in your post. That in turn has made me want to understand whether it is the Rust futures that are themselves difficult or strange, or whether it's a case of the Rust futures appearing simple and familiar, even though they are completely different in very subtle ways. I suppose that it's a combination of both.
Also, as another comment on the thread points out [1], languages where futures are active by default can have the opposite problem.
[1] https://news.ycombinator.com/item?id=45467188
I'm sure experienced async Rust programmers always have these things in mind, but Rust is also about preventing these kinds of missable behaviour, be it via the type system or otherwise.
It also wouldn't help when you have no valid state to restore to, as in the mutex example in the post.
Is this not enough? What could go wrong? If the network connection dies or the task is cancelled, I'm assuming the database server cleans up the connection state and does a rollback automatically.
And adding async Drop will probably add a whole new set of footguns.
LoL, an insane amount of things. TCP connections are an illusion of safety; for the purpose of database commits, use UDP packets as a model instead, it'll be much closer to reality.
List a couple
> TCP connections are an illusion of safety
Why?
- Proposal from 2020 about async functions which are forced to run to completion (and thereby would use graceful cancellation if necessary). Quite old, but I still feel that no better idea has come up so far. https://github.com/Matthias247/rfcs/pull/1
- Proposal for unified cancellation between sync and async Rust ("A case for CancellationTokens" - https://gist.github.com/Matthias247/354941ebcc4d2270d07ff0c6...)
- Exploration of an implementation of the above: https://github.com/Matthias247/min_cancel_token
is the title like that on purpose?
There was only one threaded web server, https://lib.rs/crates/rouille . It has 1.1M lines of code (including deps). Its hello-world example reaches only 26Krps on my machine (Apple M4 Pro). It also has a bug that makes it problematic to use in production: https://github.com/tiny-http/tiny-http/issues/221 .
I wrote https://lib.rs/crates/servlin threaded web server. It uses async internally. It has 221K lines of code. Its hello-world example reaches 102Krps on my machine.
https://lib.rs/crates/ehttpd is another one but it has no tests and it seems abandoned. It does an impressive 113Krps without async, using only 8K lines of code.
For comparison, the popular Axum async web server has 4.3M lines of code and its hello-world example reaches 190Krps on my machine.
The popular threaded Postgres client uses Tokio internally and has 1M lines of code: http://lib.rs/postgres .
Recently a threaded Postgres client was released. It has 500K lines of code: https://lib.rs/crates/postgres_sync .
There was no ergonomic way to signal cancellation to threads, so I wrote one: https://crates.io/crates/permit .
Rust's threaded libraries are starting to catch up to the async libraries!
---
I measured lines of code with `rm -rf deps.filtered && cargo vendor-filterer --platform=aarch64-apple-darwin --exclude-crate-path='*#tests' deps.filtered && tokei deps.filtered`.
I ran web servers with `cargo run --release --example hello-world` and measured throughput with `rewrk -c 1000 -d 10s -h http://127.0.0.1:3000/`.
These are all common traps. And now cancellation in async Rust is a new complement to state management in async Rust (Futures).
While developing the mea (Make Easy Async) [1] library, I document the cancel-safety behavior whenever it's non-trivial.
Additionally, I recall [2] an instance where a careless async cancellation disrupted the IO stack.
[1] https://github.com/fast/mea
[2] https://www.reddit.com/r/rust/comments/1gfi5r1/comment/luido...