A Few Words About Async
Posted 2 months ago · Active about 2 months ago
yoric.github.io · Tech · story
calm · mixed
Debate: 60/100
Key topics
Async/await
Concurrency
Programming Languages
The article provides an in-depth overview of async/await and its complexities, sparking a discussion on its performance, use cases, and comparisons with other concurrency models.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 2h after posting
Peak period: 30 comments (Day 1)
Average per period: 7.4
Comment distribution: 37 data points (based on 37 loaded comments)
Key moments
- Story posted: Nov 1, 2025 at 9:10 PM EDT (2 months ago)
- First comment: Nov 1, 2025 at 11:09 PM EDT (2h after posting)
- Peak activity: 30 comments in Day 1 (hottest window of the conversation)
- Latest activity: Nov 10, 2025 at 12:40 PM EST (about 2 months ago)
ID: 45787036 · Type: story · Last synced: 11/20/2025, 5:11:42 PM
> A second drawback is that async/await has a performance cost. CPU-bound code written with async/await will simply never be as fast or as memory-efficient as the equivalent synchronous code.
If you are interested, .NET is actively improving on this: .NET 11 will ship with "Runtime Async", which replaces the explicitly generated state machines with a runtime suspension mechanism. It's not "zero-cost" for now (for example, it can block object escape analysis), and the async calling convention differs from the sync one, but the cost is massively reduced: the calls can be inlined, optimized away, devirtualized and more, in the same way standard sync calls can. There will be few drawbacks to using async at that point, save for the syntax noise and the unfortunate .NET habit of appending an Async suffix to such methods. In your own code you can write it more tersely, however.
As for Rust, it can also optimize this quite well; the "call-level overhead" is much less of a problem there, although I have not studied compiler output for async Rust in detail, so hopefully someone with more familiarity can weigh in.
In many cases the library will rely on threads to handle calls to synchronous functions, which got me wondering whether there's a valid use case for running multiple async threads on a single core.
E.g. in a user application you might have performance-sensitive work (e.g. rendering) which needs to be highly parallel - give it a bunch of threads. However, when drawing the UI, handling user input, etc. you usually don't need high throughput - use only 1 thread to minimise the impact on the rendering threads.
In my work with server-side code, I use multiple async runtimes. One runtime is multi-threaded and handles all the real traffic. One runtime is single-threaded and handles management operations such as dispatching metrics and logs or garbage-collecting our caches.
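A minimal sketch of that setup, assuming Tokio (the loop bodies, sleep interval and worker count are placeholders, not anyone's real services): the management runtime is single-threaded and driven by its own OS thread, so it never competes with the traffic workers.

```rust
use std::thread;
use std::time::Duration;
use tokio::runtime;

fn main() {
    // Single-threaded runtime for low-priority management work,
    // driven by a dedicated OS thread.
    thread::spawn(|| {
        let management = runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("failed to build management runtime");
        management.block_on(async {
            loop {
                // Placeholder for dispatching metrics/logs, sweeping caches, ...
                tokio::time::sleep(Duration::from_secs(60)).await;
            }
        });
    });

    // Multi-threaded runtime for the real traffic.
    let traffic = runtime::Builder::new_multi_thread()
        .worker_threads(8) // arbitrary number for the sketch
        .enable_all()
        .build()
        .expect("failed to build traffic runtime");
    traffic.block_on(async {
        // Placeholder for the actual request-handling loop.
    });
}
```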
If your async thread is so busy that you need another one, then it's probably not an async workload to begin with.
I work on a Python app which uses threads and async, but we only have one async thread because it's more than enough to handle all the async work I throw at it.
Languages like Rust and Python, which use lots of reserved keywords, especially for control flow, seem to have reached for that arrow to solve the "event loop" problem as described.
In BEAM languages, that very event loop stays front and center; you don't have this awkward entanglement between reserved keywords and the event loop. If you want another chunk of work to happen later because of an event, you just arrange for an event of that nature to be delivered. No callbacks. No async coloring. Just events. The solution to the event problem is to double down and make your event loop more generally usable.
You can spawn new processes, communicate between processes (which don't have to be on the same computer), and send any kind of data between them, including closures.
BEAM also has an error model designed to handle concurrent and distributed failures, e.g. a process may fail and another process (which, again, may or may not be on the same machine) monitoring it may decide to restart it, or to do some recovery, etc.
BEAM builds in a number of the features for which we otherwise reach for orchestration and observability tooling, just simpler and (generally) more flexible. And this is a platform that has been used in industry since the 90s.
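Not BEAM, but to make the "just send events" shape concrete, here is a rough analogue sketched in Rust with Tokio channels (the Event type and the supervisor behaviour are invented for illustration; real BEAM processes also give you distribution and preemption, which this does not):

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum Event {
    Work(u32),
    Shutdown,
}

// A worker whose whole interface is "send it events".
async fn worker(mut rx: mpsc::Receiver<Event>) {
    while let Some(event) = rx.recv().await {
        match event {
            Event::Work(n) => println!("handled job {n}"),
            Event::Shutdown => break,
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    // Crude stand-in for a supervisor: watch the worker and react if it dies.
    let handle = tokio::spawn(worker(rx));

    tx.send(Event::Work(1)).await.unwrap();
    tx.send(Event::Shutdown).await.unwrap();

    if let Err(e) = handle.await {
        eprintln!("worker died ({e}); a real supervisor would respawn it here");
    }
}
```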
Glommio and monoio are async runtimes in Rust built on top of io_uring, and Tokio has an optional io_uring backend. Does that not count? This is such a well-researched article that this kind of statement makes me think I'm missing something - surprising that the author would get this wrong.
I didn't mention tokio's io_uring because, as far as I understand, it is unmaintained. I vaguely recall a conversation in which someone (a contributor?) was claiming that it was not possible to implement most of the features of tokio on io_uring due to conflicting models. [source needed], obviously.
I will admit the very existence of glommio or monoio had entirely slipped my mind. I'll probably need to add a few paragraphs about thread-per-core runtimes. Thanks!
The big one is this, and this will wreck a lot of pre-io_uring APIs:
Historically, you pass in a buffer to read(2). There is one buffer per pending read. (And this is a scalability limitation.)
With io_uring, you have a pool of buffers and a read completes by grabbing a buffer from the pool and putting the data there.
io_uring's highest-performance API is fundamentally at odds with the historical read API that was inherited from POSIX by just about every stdlib.
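A sketch of the difference in Rust: the first function is the classic shape, while the trait below it is purely hypothetical (not any real crate's API), there only to show what a completion-based, pooled-buffer read looks like to the caller.

```rust
use std::io::Read;

// Readiness-style I/O: the caller owns one buffer per in-flight read,
// and that buffer is tied up until the read completes.
fn classic_read(sock: &mut impl Read) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 4096];
    let n = sock.read(&mut buf)?;
    buf.truncate(n);
    Ok(buf)
}

// Hypothetical completion-style API (shape only): you submit a read without
// naming a buffer, and the completion hands back whichever buffer from the
// registered pool the kernel filled.
struct PooledBuf {
    // In a real runtime: an index into the registered buffer ring plus a length.
}

trait PooledRead {
    async fn read_pooled(&self) -> std::io::Result<PooledBuf>;
}
```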
https://github.com/PADL/IORingSwift
Asio supports async/await, stackful coroutines and plain old manual continuation passing.
[1] https://think-async.com/Asio/asio-1.36.0/doc/asio/history.ht...
This article looks like a real review. I only have one concern with it: it oversells M:N concurrency with green threads over async/await. If I understand correctly, it claims that async/await (as implemented by Rust, Python, C# and Kotlin - not JavaScript) is less efficient (both in terms of RAM and CPU) than M:N concurrency using green threads. The main advantages it has are that no GC is required, C library calls carry no extra cost, and the cost of using async functions is always explicit. This makes async/await great for a systems language like Rust, but it also pushes a hidden claim that Python, C# and Kotlin all made a mistake by choosing async/await. It's a more nuanced approach than what people take by incorrectly reading the articles I mentioned above, but I think it's still misguided. I might also be reading this incorrectly, but then I think the article is just not being clear enough about the issues of cost.
To put it shortly: both green threads and async/await are significantly costlier than single-threaded code, but their cost manifests in different ways. With async/await the cost mostly manifests at "suspension points" (wherever you write "await"), which are very explicit. With green threads, the cost is spread everywhere. The CPU cost of green threads includes not only wrapping C library calls (which is mentioned), but also the cost of resizing or segmenting the stack (since we cannot just preallocate a 1MiB stack for each coroutine). Go started out with segmented stacks and moved on to allocating a new small stack (2KiB IIRC) for each new goroutine and copying it to a new stack every time it needs to grow[1]. That mechanism alone carries its own overhead.
The other issue that is mentioned with regards to async/await but is portrayed as "resolved" for green threads is memory efficiency, but this couldn't be farther from the truth: when it's implemented as a state machine, async/await is always more efficient than green threads. Async/await allocates memory on every suspension, but it only saves the state that needs to be saved for this suspension (as an oversimplification we can say it only saves the variables already allocated on the stack). Green threads, on the other hand, always allocate extra space on the stack, so there would always be some overhead. Don't get me wrong here: green threads with dynamic stacks are considerably cheaper than real threads and you can comfortably run hundreds of thousands of them on a single machine. But async/await state machines are even cheaper.
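To make the state-machine point concrete, here is roughly what a compiler does with a two-await async fn, hand-written in Rust (simplified: the sub-futures are assumed Unpin, so the real pinning machinery is elided). Each enum variant stores only the values still live at that suspension point, which is where the memory advantage over a preallocated stack comes from.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Hand-written equivalent of:
//     async fn sum(a: A, b: B) -> u32 { let x = a.await; let y = b.await; x + y }
enum Sum<A, B> {
    AwaitingA { a: A, b: B },     // before the first await: both futures alive
    AwaitingB { b: B, x: u32 },   // after it: `a` is gone, only `b` and `x` remain
    Done,
}

impl<A, B> Future for Sum<A, B>
where
    A: Future<Output = u32> + Unpin,
    B: Future<Output = u32> + Unpin,
{
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match &mut *self {
                Sum::AwaitingA { a, .. } => match Pin::new(a).poll(cx) {
                    Poll::Ready(x) => {
                        // Transition: drop `a`, keep only what is still live.
                        if let Sum::AwaitingA { b, .. } =
                            std::mem::replace(&mut *self, Sum::Done)
                        {
                            *self = Sum::AwaitingB { b, x };
                        }
                    }
                    Poll::Pending => return Poll::Pending,
                },
                Sum::AwaitingB { b, x } => match Pin::new(b).poll(cx) {
                    Poll::Ready(y) => {
                        let x = *x;
                        *self = Sum::Done;
                        return Poll::Ready(x + y);
                    }
                    Poll::Pending => return Poll::Pending,
                },
                Sum::Done => panic!("polled after completion"),
            }
        }
    }
}
```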
I also have a few other nitpicks (maybe these issues come from the languages this article focuses on, mainly Go, Python, Rust and JavaScript):
- If I understand correctly, the article claims async/await doesn't suffer from "multi-threading risks". This is mostly true in Rust, Python with the GIL, and JavaScript, for different reasons that have more to do with each language than with async/await: JavaScript is single-threaded, Python (by default) has a GIL, and Rust doesn't let you write non-thread-safe code even if you're using plain old threads. But that's not the case with C# or Kotlin: you still need to be careful with async/await in these languages, just as you would be when writing goroutines in Go. On the other hand, if you write Lua coroutines (which are equivalent to goroutines in Go), you can safely ignore synchronization unless you have a shared memory value that needs to be updated across suspension points.
- Most green thread implementations would block the host thread completely if you call a blocking function from a non-blocking coroutine. Go is an outlier even among the languages that employ green threads, since it supports full preemption of long-running goroutines (even if no C library code is called). But even Go only added full support for preemption in Go 1.14. I'm not quite sure since when long-running Cgo function calls have been preemptible, but this still shows that Go is doing its own thing here. If you have to use green threads in another language like Lua or Erlang, you shouldn't expect this behavior.
[1] https://blog.cloudflare.com/how-stacks-are-handled-in-go/
1. Thanks for your remarks on memory efficiency. I wrote that piece a few months ago, so I'll have to reread it, but if I implied something wrong, I'll try and amend it!
2. Regarding "multi-threading risks", I don't think I claim that. I have definitely encountered race conditions in single-threaded async code. You don't encounter the same kind of memory corruption as in, say, multi-threaded C, but you can definitely break invariants on data structures. If I miswrote something or was unclear, I'll need to fix that, too!
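For what it's worth, here is a minimal sketch of that kind of bug in single-threaded async Rust (Tokio's current-thread runtime assumed; the "unique items" invariant is invented for illustration): the check and the insert are separated by an await, so another task can interleave between them.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// "Only insert if not already present" - a check-then-act invariant.
async fn add_unique(items: Rc<RefCell<Vec<u32>>>, value: u32) {
    if !items.borrow().contains(&value) {
        // Suspension point: another task can run here and insert `value` first.
        tokio::task::yield_now().await;
        items.borrow_mut().push(value);
    }
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let items = Rc::new(RefCell::new(Vec::new()));
    let local = tokio::task::LocalSet::new();
    let _t1 = local.spawn_local(add_unique(items.clone(), 42));
    let _t2 = local.spawn_local(add_unique(items.clone(), 42));
    local.await; // runs both tasks on this one thread
    // Single thread, no data race in the C sense - but the invariant is gone.
    println!("{:?}", items.borrow()); // prints [42, 42]
}
```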
C# and Kotlin are safe from data races; Go is not. If you do not explicitly synchronize in C#/Kotlin you may see torn writes and other anomalies, but these will not directly impact safety, unlike in Go.
Haskell (GHC) does not provide async/await but uses a green thread model.
As I understand it, async-await is syntax sugar for writing a state machine for cooperative multitasking. Green "threads" are threads implemented in user code that might or might not use OS threads. E.g.:
- You can use Rust tokio::task (green threads) with a manually coded Future with no async-await sugar, which might or might not be parallelized depending on the Tokio runtime it's running on (a minimal hand-written Future is sketched after this list).
- ...or with a Future returned by an async block, which allows async-await syntax.
- You can have a Future created by an async function call and poll it manually from an OS thread.
- Node has async-await syntax to express concurrency, but it has no parallelism at all since it is single-threaded. I think it has no green threads either (parallel or not), since Promises are stackless?
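As a concrete illustration of the manually coded Future and the manual polling mentioned above, here is a minimal hand-written Future in Rust, with no async/await sugar, driven from a plain OS thread (block_on here comes from the `futures` crate; the same value could just as well be spawned onto a single- or multi-threaded Tokio runtime).

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// A Future written by hand: it asks to be polled a few times, then completes.
struct CountDown(u32);

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            // Ask the executor to poll us again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    // Poll it manually from an OS thread via a minimal executor.
    let result = futures::executor::block_on(CountDown(3));
    println!("{result}");
}
```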
Is this a new usage of the term I don't know about? What does it mean? Or did I misinterpret the "but"?
As a non-Haskeller I guess it doesn't need explicit async-await syntax because there might be some way to express the same concept with monads?
[1] https://www.cambridge.org/core/services/aop-cambridge-core/c...
Well, I haven't used Haskell in a few years, so I could absolutely be wrong. That being said, I'm almost sure that I saw a presentation by Simon Marlow 15-20 years ago demonstrating GHC with a multicore scheduler (alongside `seq` and `par`). Also, from the very same Simon Marlow, there's a package called `async` https://hackage.haskell.org/package/async which basically provides async (no await, though).
They only diverge when you consider multiple tasks.
Latency numbers always include queuing time - so the measures are not related or derivable from each other.
A process might have a throughput of 1 million jobs per second but if the average size of the queue is 10 million then your job latency is going to be 10 seconds on average and not 1 microsecond.
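The arithmetic in that example is just Little's law: with L the average number of jobs in the system and λ the throughput, the average latency W works out as stated.

```latex
W = \frac{L}{\lambda} = \frac{10^{7}\ \text{jobs}}{10^{6}\ \text{jobs/s}} = 10\ \text{s}
```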
* Processes do this right out of the box.
* Threads only do this on Python's new no-GIL builds.
* Async, not so much.
That would definitely keep the story from being all-hat-and-no-cattle. I can't recall reading something with so many alternate versions of how to implement something but with zero benchmarks.
for python syntax to enumerate the fibonacci sequence:
#fibonacci(n - 1) + fibonacci(n - 2)
Which computes event.arg
1 more comment available on Hacker News