A Few Words About Async
Posted 2 months ago · Active about 2 months ago
yoric.github.io · Tech · story
calm · mixed
Debate: 60/100
Key topics
Async/await
Concurrency
Programming Languages
The article provides an in-depth overview of async/await and its complexities, sparking a discussion on its performance, use cases, and comparisons with other concurrency models.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 2h after posting
Peak period: 30 comments (Day 1)
Average per period: 7.4
Comment distribution: 37 data points (based on 37 loaded comments)
Key moments
- Story posted: Nov 1, 2025 at 9:10 PM EDT (2 months ago)
- First comment: Nov 1, 2025 at 11:09 PM EDT (2h after posting)
- Peak activity: 30 comments in Day 1 (hottest window of the conversation)
- Latest activity: Nov 10, 2025 at 12:40 PM EST (about 2 months ago)
ID: 45787036 · Type: story · Last synced: 11/20/2025, 5:11:42 PM
> A second drawback is that async/await has a performance cost. CPU-bound code written with async/await will simply never be as fast or as memory-efficient as the equivalent synchronous code.
If you are interested, .NET is actively improving on this: .NET 11 will ship with "Runtime Async", which replaces the explicitly generated state machines with a runtime suspension mechanism. It's not "zero-cost" for now (for example, it can block object escape analysis), and the async calling convention differs from the sync one, but the cost is massively reduced: the calls can be inlined, optimized away, devirtualized and more, in the same way standard sync calls can. There will be few drawbacks to using async at that point, save for the syntax noise and the unfortunate .NET habit of appending an Async suffix to such methods. In your own code you can write it more tersely, however.
As for Rust, it can also optimize this quite well; the "call-level overhead" is much less of a problem there, although I have not studied compiler output for async Rust in detail, so hopefully someone with more familiarity can weigh in.
In many cases the library will rely on threads to handle calls to synchronous functions, which got me wondering whether there's a valid use case for running multiple async threads on a single core.
E.g. in a user application you might have performance-sensitive work (e.g. rendering) which needs to be highly parallel - give it a bunch of threads. However, when drawing the UI, handling user input, etc. you usually don't need high throughput - use only 1 thread to minimise the impact on the rendering threads.
In my work with server-side code, I use multiple async runtimes. One runtime is multi-threaded and handles all the real traffic. One runtime is single-threaded and handles management operations such as dispatching metrics and logs or garbage-collecting our caches.
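A minimal sketch of that setup, assuming Tokio (the loop bodies, sleep interval and worker count are placeholders, not anyone's real services): the management runtime is single-threaded and driven by its own OS thread, so it never competes with the traffic workers.

```rust
use std::thread;
use std::time::Duration;
use tokio::runtime;

fn main() {
    // Single-threaded runtime for low-priority management work,
    // driven by a dedicated OS thread.
    thread::spawn(|| {
        let management = runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("failed to build management runtime");
        management.block_on(async {
            loop {
                // Placeholder for dispatching metrics/logs, sweeping caches, ...
                tokio::time::sleep(Duration::from_secs(60)).await;
            }
        });
    });

    // Multi-threaded runtime for the real traffic.
    let traffic = runtime::Builder::new_multi_thread()
        .worker_threads(8) // arbitrary number for the sketch
        .enable_all()
        .build()
        .expect("failed to build traffic runtime");
    traffic.block_on(async {
        // Placeholder for the actual request-handling loop.
    });
}
```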
If your async thread is so busy that you need another one, then it's probably not an async workload to begin with.
I work on a Python app which uses threads and async, but we only have one async thread because it's more than enough to handle all the async work I throw at it.
Languages like Rust and Python, which use lots of reserved keywords, especially for control flow, seem to have reached for that arrow to solve the "event loop" problem as described.
In BEAM languages, that very event loop stays front and center; you don't have this awkward entanglement between reserved keywords and the event loop. If you want another chunk of work to happen later because of an event, you just arrange for an event of that nature to be delivered. No callbacks. No async coloring. Just events. The solution to the event problem is to double down and make your event loop more generally usable.
You can spawn new processes, communicate between processes (which don't have to be on the same computer), and send any kind of data between them, including closures.
BEAM also has an error model designed to handle concurrent and distributed failures, e.g. a process may fail and another process (which, again, may or may not be on the same machine) monitoring it may decide to restart it, or to do some recovery, etc.
BEAM builds in a number of the features for which we otherwise reach for orchestration and observability tooling, just simpler and (generally) more flexible. And this is a platform that has been used in industry since the 90s.
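Not BEAM, but to make the "just send events" shape concrete, here is a rough analogue sketched in Rust with Tokio channels (the Event type and the supervisor behaviour are invented for illustration; real BEAM processes also give you distribution and preemption, which this does not):

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum Event {
    Work(u32),
    Shutdown,
}

// A worker whose whole interface is "send it events".
async fn worker(mut rx: mpsc::Receiver<Event>) {
    while let Some(event) = rx.recv().await {
        match event {
            Event::Work(n) => println!("handled job {n}"),
            Event::Shutdown => break,
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    // Crude stand-in for a supervisor: watch the worker and react if it dies.
    let handle = tokio::spawn(worker(rx));

    tx.send(Event::Work(1)).await.unwrap();
    tx.send(Event::Shutdown).await.unwrap();

    if let Err(e) = handle.await {
        eprintln!("worker died ({e}); a real supervisor would respawn it here");
    }
}
```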
Glommio and monoio are async runtimes in Rust built on top of io_uring, and Tokio has an optional io_uring backend. Does that not count? This is such a well-researched article that this kind of statement makes me think I'm missing something - surprising that the author would get this wrong.
I didn't mention tokio's io_uring because, as far as I understand, it is unmaintained. I vaguely recall a conversation in which someone (a contributor?) was claiming that it was not possible to implement most of the features of tokio on io_uring due to conflicting models. [source needed], obviously.
I will admit the very existence of glommio or monoio had entirely slipped my mind. I'll probably need to add a few paragraphs about thread-per-core runtimes. Thanks!
The big one is this, and this will wreck a lot of pre-io_uring APIs:
Historically, you pass in a buffer to read(2). There is one buffer per pending read. (And this is a scalability limitation.)
With io_uring, you have a pool of buffers and a read completes by grabbing a buffer from the pool and putting the data there.
io_uring's highest-performance API is fundamentally at odds with the historical read API that was inherited from POSIX by just about every stdlib.
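A sketch of the difference in Rust: the first function is the classic shape, while the trait below it is purely hypothetical (not any real crate's API), there only to show what a completion-based, pooled-buffer read looks like to the caller.

```rust
use std::io::Read;

// Readiness-style I/O: the caller owns one buffer per in-flight read,
// and that buffer is tied up until the read completes.
fn classic_read(sock: &mut impl Read) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 4096];
    let n = sock.read(&mut buf)?;
    buf.truncate(n);
    Ok(buf)
}

// Hypothetical completion-style API (shape only): you submit a read without
// naming a buffer, and the completion hands back whichever buffer from the
// registered pool the kernel filled.
struct PooledBuf {
    // In a real runtime: an index into the registered buffer ring plus a length.
}

trait PooledRead {
    async fn read_pooled(&self) -> std::io::Result<PooledBuf>;
}
```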
https://github.com/PADL/IORingSwift
Asio supports async/await, stackful coroutines and plain old manual continuation passing.
[1] https://think-async.com/Asio/asio-1.36.0/doc/asio/history.ht...
This article looks like a real review. I only have one concern with it: it oversells M:N concurrency with green threads over async/await. If I understand correctly, it claims that async/await (as implemented by Rust, Python, C# and Kotlin - not JavaScript) is less efficient (both in terms of RAM and CPU) than M:N concurrency using green threads. The main advantages it has are that no GC is required, C library calls carry no extra cost, and the cost of using async functions is always explicit. This makes async/await great for a systems language like Rust, but it also pushes a hidden claim that Python, C# and Kotlin all made a mistake by choosing async/await. It's a more nuanced approach than what people take by incorrectly reading the articles I mentioned above, but I think it's still misguided. I might also be reading this incorrectly, but then I think the article is just not being clear enough about the issues of cost.
To put it shortly: both green threads and async/await are significantly costlier than single-threaded code, but their cost manifests in different ways. With async/await the cost mostly manifests at "suspension points" (wherever you write "await"), which are very explicit. With green threads, the cost is spread everywhere. The CPU cost of green threads includes not only wrapping C library calls (which is mentioned), but also the cost of resizing or segmenting the stack (since we cannot just preallocate a 1MiB stack for each coroutine). Go started out with segmented stacks and moved on to allocating a new small stack (2KiB IIRC) for each new goroutine and copying it to a new stack every time it needs to grow[1]. That mechanism alone carries its own overhead.
The other issue that is mentioned with regards to async/await but is portrayed as "resolved" for green threads is memory efficiency, but this couldn't be farther from the truth: when it's implemented as a state machine, async/await is always more efficient than green threads. Async/await allocates memory on every suspension, but it only saves the state that needs to be saved for this suspension (as an oversimplification we can say it only saves the variables already allocated on the stack). Green threads, on the other hand, always allocate extra space on the stack, so there would always be some overhead. Don't get me wrong here: green threads with dynamic stacks are considerably cheaper than real threads and you can comfortably run hundreds of thousands of them on a single machine. But async/await state machines are even cheaper.
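To make the state-machine point concrete, here is roughly what a compiler does with a two-await async fn, hand-written in Rust (simplified: the sub-futures are assumed Unpin, so the real pinning machinery is elided). Each enum variant stores only the values still live at that suspension point, which is where the memory advantage over a preallocated stack comes from.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Hand-written equivalent of:
//     async fn sum(a: A, b: B) -> u32 { let x = a.await; let y = b.await; x + y }
enum Sum<A, B> {
    AwaitingA { a: A, b: B },     // before the first await: both futures alive
    AwaitingB { b: B, x: u32 },   // after it: `a` is gone, only `b` and `x` remain
    Done,
}

impl<A, B> Future for Sum<A, B>
where
    A: Future<Output = u32> + Unpin,
    B: Future<Output = u32> + Unpin,
{
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match &mut *self {
                Sum::AwaitingA { a, .. } => match Pin::new(a).poll(cx) {
                    Poll::Ready(x) => {
                        // Transition: drop `a`, keep only what is still live.
                        if let Sum::AwaitingA { b, .. } =
                            std::mem::replace(&mut *self, Sum::Done)
                        {
                            *self = Sum::AwaitingB { b, x };
                        }
                    }
                    Poll::Pending => return Poll::Pending,
                },
                Sum::AwaitingB { b, x } => match Pin::new(b).poll(cx) {
                    Poll::Ready(y) => {
                        let x = *x;
                        *self = Sum::Done;
                        return Poll::Ready(x + y);
                    }
                    Poll::Pending => return Poll::Pending,
                },
                Sum::Done => panic!("polled after completion"),
            }
        }
    }
}
```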
I also have a few other nitpicks (maybe these issues come from the languages this article focuses on, mainly Go, Python, Rust and JavaScript):
- If I understand correctly, the article claims async/await doesn't suffer from "multi-threading risks". This is mostly true in Rust, Python with the GIL, and JavaScript, for different reasons that have more to do with each language than with async/await: JavaScript is single-threaded, Python (by default) has a GIL, and Rust doesn't let you write non-thread-safe code even if you're using plain old threads. But that's not the case with C# or Kotlin: you still need to be careful with async/await in these languages, just as you would be when writing goroutines in Go. On the other hand, if you write Lua coroutines (which are equivalent to goroutines in Go), you can safely ignore synchronization unless you have a shared memory value that needs to be updated across suspension points.
- Most green thread implementations would block the host thread completely if you call a blocking function from a non-blocking coroutine. Go is an outlier even among the languages that employ green threads, since it supports full preemption of long-running goroutines (even if no C library code is called). But even Go only added full support for preemption in Go 1.14. I'm not quite sure since when long-running Cgo function calls have been preemptible, but this still shows that Go is doing its own thing here. If you have to use green threads in another language like Lua or Erlang, you shouldn't expect this behavior.
[1] https://blog.cloudflare.com/how-stacks-are-handled-in-go/
1. Thanks for your remarks on memory efficiency. I wrote that piece a few months ago, so I'll have to reread it, but if I implied something wrong, I'll try and amend it!
2. Regarding "multi-threading risks", I don't think I claim that. I have definitely encountered race conditions in single-threaded async code. You don't encounter the same kind of memory corruption as in, say, multi-threaded C, but you can definitely break invariants on data structures. If I miswrote something or was unclear, I'll need to fix that, too!
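For what it's worth, here is a minimal sketch of that kind of bug in single-threaded async Rust (Tokio's current-thread runtime assumed; the "unique items" invariant is invented for illustration): the check and the insert are separated by an await, so another task can interleave between them.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// "Only insert if not already present" - a check-then-act invariant.
async fn add_unique(items: Rc<RefCell<Vec<u32>>>, value: u32) {
    if !items.borrow().contains(&value) {
        // Suspension point: another task can run here and insert `value` first.
        tokio::task::yield_now().await;
        items.borrow_mut().push(value);
    }
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let items = Rc::new(RefCell::new(Vec::new()));
    let local = tokio::task::LocalSet::new();
    let _t1 = local.spawn_local(add_unique(items.clone(), 42));
    let _t2 = local.spawn_local(add_unique(items.clone(), 42));
    local.await; // runs both tasks on this one thread
    // Single thread, no data race in the C sense - but the invariant is gone.
    println!("{:?}", items.borrow()); // prints [42, 42]
}
```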
C# and Kotlin are safe from data races; Go is not. If you do not explicitly synchronize in C#/Kotlin you may see torn writes and other anomalies, but these will not directly impact safety, unlike in Go.
Haskell (GHC) does not provide async/await but uses a green thread model.
As I understand it, async-await is syntax sugar for writing a state machine for cooperative multitasking. Green "threads" are threads implemented in user code that might or might not use OS threads. E.g.:
- You can use Rust tokio::task (green threads) with a manually coded Future with no async-await sugar, which might or might not be parallelized depending on the Tokio runtime it's running on (a minimal hand-written Future is sketched after this list).
- ...or with a Future returned by an async block, which allows async-await syntax.
- You can have a Future created by an async function call and poll it manually from an OS thread.
- Node has async-await syntax to express concurrency, but it has no parallelism at all since it is single-threaded. I think it has no green threads either (parallel or not), since Promises are stackless?
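As a concrete illustration of the manually coded Future and the manual polling mentioned above, here is a minimal hand-written Future in Rust, with no async/await sugar, driven from a plain OS thread (block_on here comes from the `futures` crate; the same value could just as well be spawned onto a single- or multi-threaded Tokio runtime).

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// A Future written by hand: it asks to be polled a few times, then completes.
struct CountDown(u32);

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            // Ask the executor to poll us again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    // Poll it manually from an OS thread via a minimal executor.
    let result = futures::executor::block_on(CountDown(3));
    println!("{result}");
}
```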
Is this a new usage of the term I don't know about? What does it mean? Or did I misinterpret the "but"?
As a non-Haskeller I guess it doesn't need explicit async-await syntax because there might be some way to express the same concept with monads?
[1] https://www.cambridge.org/core/services/aop-cambridge-core/c...
Well, I haven't used Haskell in a few years, so I could absolutely be wrong. That being said, I'm almost sure that I saw a presentation by Simon Marlow 15-20 years ago demonstrating GHC with a multicore scheduler (alongside `seq` and `par`). Also, from the very same Simon Marlow, there's a package called `async` https://hackage.haskell.org/package/async which basically provides async (no await, though).
They only diverge when you consider multiple tasks.
Latency numbers always include queuing time - so the measures are not related or derivable from each other.
A process might have a throughput of 1 million jobs per second but if the average size of the queue is 10 million then your job latency is going to be 10 seconds on average and not 1 microsecond.
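The arithmetic in that example is just Little's law: with L the average number of jobs in the system and λ the throughput, the average latency W works out as stated.

```latex
W = \frac{L}{\lambda} = \frac{10^{7}\ \text{jobs}}{10^{6}\ \text{jobs/s}} = 10\ \text{s}
```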
* Processes do this right out of the box.
* Threads only do this on Python's new no-GIL builds.
* Async, not so much.
That would definitely keep the story from being all-hat-and-no-cattle. I can't recall reading something with so many alternate versions of how to implement something but with zero benchmarks.
for python syntax to enumerate the fibonacci sequence:
#fibonacci(n - 1) + fibonacci(n - 2)
Which computes event.arg
1 more comment available on Hacker News