Async DNS

Posted 28 days ago, active 25 days ago

todsacerdoti

131 points

46 comments

flak.tedunangst.com, Tech Discussion, story

informative, positive

Debate

20/100

Key topics

Coding Paradigms

DNS

Networking

The async DNS discussion sparked a lively debate about the merits of `pthread_cancel()`, the POSIX function that requests cancellation of another thread. Commenters weighed in on its original purpose, with some arguing it was meant for interrupting long computations, not I/O operations. Several suggested alternative designs, like having the next syscall return ECANCELED or using `pthread_kill()` with EINTR, to achieve similar functionality without terminating the thread, while others pointed out that standardizing such a change now could make previously reliable APIs return unexpected errors. Most commenters agreed that `pthread_cancel()` is problematic for anything beyond interrupting compute-bound work. The discussion remains relevant as developers continue to grapple with asynchronous programming and thread management.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

58m

Peak period

0-6h

Avg / period

6.1

Comment distribution: 49 data points

Based on 49 loaded comments

Key moments

01 Story posted
Dec 12, 2025 at 11:52 AM EST
28 days ago
02 First comment
Dec 12, 2025 at 12:50 PM EST
58m after posting
03 Peak activity
26 comments in 0-6h
Hottest window of the conversation
04 Latest activity
Dec 15, 2025 at 11:57 PM EST
25 days ago

Discussion (46 comments)

Showing 49 comments

albertzeyer

28 days ago

1 reply

The first linked article was recently discussed here: RIP pthread_cancel (https://news.ycombinator.com/item?id=45233713)

In that discussion, most of the same points as in this article were already discussed, specifically some async DNS alternatives.

frumplestlatz

28 days ago

5 replies

I am always amused when folks rediscover the bad idea that is `pthread_cancel()` — it’s amazing that it was ever part of the standard.

We knew it was a bad idea at the time it was standardized in the 1990s, but politics and the inevitable allure of a very convenient sounding bad idea meant that it won out.

Funny enough, while Java has deprecated their version of thread cancellation for the same reasons, Haskell still has theirs. When you’re writing code in IO, you have to be prepared for async cancellation anywhere, at any time.

This leads to common bugs in the standard library that you really wouldn’t expect from a language like Haskell; e.g. https://github.com/haskell/process/issues/183 (withCreateProcess async exception safety)

AndyKelley

28 days ago

3 replies

What's crazy is that it's almost good. All they had to do was make the next syscall return ECANCELED (already a defined error code!) rather than terminating the thread.

Musl has an undocumented extension that does exactly this: PTHREAD_CANCEL_MASKED passed to pthread_setcancelstate.

It's great and it should be standardized.

gpderetta

28 days ago

2 replies

You can sort of emulate that with pthread_kill and EINTR, but you need to control all code that can call interruptible syscalls (or longjmp/throw from the signal handler, but then we are back in pthread_cancel territory).
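
A minimal sketch of the pthread_kill/EINTR approach mentioned in this comment, assuming a POSIX system. The names (`wake_handler`, `blocked_reader`, `interrupt_blocked_read`) and the choice of SIGUSR1 are illustrative assumptions, not something from the thread. The handler is installed without SA_RESTART so that a blocked read() fails with EINTR instead of the thread being terminated. The sender repeats the signal until the worker reports back, because a single signal that lands before the worker enters read() would otherwise be lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* All names here are hypothetical; SIGUSR1 is an arbitrary choice. */
static atomic_int worker_errno = 0;      /* errno the worker observed */
static atomic_bool worker_done = false;  /* set once read() has returned */

/* The handler does nothing; its only job is to interrupt the syscall. */
static void wake_handler(int sig) { (void)sig; }

static void *blocked_reader(void *arg) {
    int fd = *(int *)arg;
    char buf[1];
    ssize_t n = read(fd, buf, sizeof buf);  /* blocks: nobody writes to the pipe */
    atomic_store(&worker_errno, n < 0 ? errno : 0);
    atomic_store(&worker_done, true);
    return NULL;
}

/* Returns the errno seen by the worker, or -1 if setup failed. */
int interrupt_blocked_read(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART, so read() fails with EINTR */
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    int fds[2];
    if (pipe(fds) != 0) return -1;

    atomic_store(&worker_done, false);
    pthread_t t;
    if (pthread_create(&t, NULL, blocked_reader, &fds[0]) != 0) return -1;

    /* Resend every millisecond until the worker has returned from read(). */
    struct timespec tick = {0, 1000000};
    while (!atomic_load(&worker_done)) {
        pthread_kill(t, SIGUSR1);
        nanosleep(&tick, NULL);
    }
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return atomic_load(&worker_errno);
}
```

Calling `interrupt_blocked_read()` is expected to return `EINTR`. The caveat from the comment still applies: every blocking call in the target thread has to treat EINTR as a cancellation request rather than silently retrying.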

AndyKelley

28 days ago

1 reply

There's a second problem here that musl also solves. If the signal is delivered in between checking for cancelation and the syscall machine code instruction, the interrupt is missed. This can cause a deadlock if the syscall was going to wait indefinitely and the application relies on cancelation for interruption.

Musl solves this problem by inspecting the program counter in the interrupt handler and checking if it falls specifically in that range, and if so, modifying registers such that when it returns from the signal, it returns to instructions that cause ECANCELED to be returned.
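
Application code usually cannot inspect the program counter the way this comment describes, but for waits on file descriptors POSIX offers a different way to close the same window: keep the signal blocked while checking the flag, and let pselect() unblock it atomically for the duration of the wait. The sketch below shows that technique, which is not musl's mechanism. The names (`cancel_requested`, `wait_readable_or_cancel`, `demo_no_lost_wakeup`) and the use of SIGUSR1 are assumptions for illustration. A threaded program would use pthread_sigmask() in place of sigprocmask().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical cancellation flag, set only from the signal handler. */
static volatile sig_atomic_t cancel_requested = 0;

static void on_cancel(int sig) { (void)sig; cancel_requested = 1; }

/* Waits until fd is readable or a SIGUSR1 "cancel" arrives.
   Returns 1 if readable, 0 if cancelled, -1 on any other error.
   Precondition: SIGUSR1 is blocked in the caller, and wait_mask is the
   mask to use during the wait (with SIGUSR1 unblocked). */
int wait_readable_or_cancel(int fd, const sigset_t *wait_mask) {
    for (;;) {
        if (cancel_requested) return 0;
        /* SIGUSR1 is blocked here, so the handler cannot run between the
           check above and the wait below; a signal sent now stays pending. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        int r = pselect(fd + 1, &rfds, NULL, NULL, NULL, wait_mask);
        if (r > 0) return 1;
        if (r < 0 && errno != EINTR) return -1;
        /* EINTR: the handler ran inside pselect(); loop and re-check the flag. */
    }
}

/* Sends the signal while it is blocked, then waits on an idle pipe.
   Returns the result of wait_readable_or_cancel(), or -1 on setup failure. */
int demo_no_lost_wakeup(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_cancel;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    sigset_t block, orig;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &orig) != 0) return -1;

    sigset_t wait_mask = orig;
    sigdelset(&wait_mask, SIGUSR1);

    int fds[2];
    if (pipe(fds) != 0) return -1;

    cancel_requested = 0;
    raise(SIGUSR1);  /* becomes pending; the handler has not run yet */
    int r = wait_readable_or_cancel(fds[0], &wait_mask);

    close(fds[0]);
    close(fds[1]);
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return r;
}
```

In `demo_no_lost_wakeup()` the signal is raised before the wait starts, which stands in for a signal arriving inside the window. Because it is only delivered once pselect() swaps the mask, the wait is interrupted instead of blocking forever. This only covers calls that take a signal mask; it does not help with an arbitrary blocking syscall, which is the case the comment says musl handles.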

Veserv

28 days ago

Introspection windows from an interrupting context are a neat technique. You can use it to implement “atomic transaction” guarantees for the interruptee as long as you control all potential interrupters. You can also implement “non-interruption” sections and bailout logic.

cryptonector

28 days ago

In particular you need to control the signal handlers. You can't do that easily in a library.

frumplestlatz

28 days ago

That would have been fantastic. My worry is if we standardized it now, a lot of library code would be unexpectedly dealing with ECANCELED from APIs that previously were guaranteed to never fail outside of programmer error, e.g. `pthread_mutex_lock()`.

Looking at some of my shipping code, there's a fair bit that triggers a runtime `assert()` if `pthread_mutex_lock()` fails, as that should never occur outside of a locking bug of my own making.

cryptonector

28 days ago

`pthread_cancel()` was meant for interrupting long computations, not I/O.

paulddraper

28 days ago

1 reply

IO can fail at any point though, so that’s not particularly bad.

marcosdumay

28 days ago

It's particularly bad because thread interruptions are funneled into the same system as IO errors, so it's easy to consume them by mistake.

Java has that same issue.

cryptonector

28 days ago

1 reply

`pthread_cancel()` is necessary _only_ to interrupt compute-only code without killing the entire process. That's it. The moment you try to use it to interrupt _I/O_ you lose -- you lose BIG.

nextaccountic

26 days ago

1 reply

there is a better way - in any unbounded compute loop, add some code to check for cancellation. it can be very very very cheap

this is not possible if you are calling third party code that you can't modify. in this case it's probably a better idea to run it on another process and use shared memory to communicate back results. this can even be done in an airtight sandboxed manner (browsers do this for example), something that can't really be done with threads
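
A minimal sketch of the cooperative check described in the first paragraph of this comment, using C11 atomics. The `cancel_token` type, the function name, and the polling interval of 1024 iterations are assumptions chosen for illustration. The loop polls a flag with a relaxed atomic load, so the per-iteration cost is a mask and a branch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token shared between the worker and whoever cancels it. */
typedef struct {
    atomic_bool cancelled;
} cancel_token;

/* Sums the integers 0..n-1, polling the token every 1024 iterations.
   Returns true and stores the sum in *out on completion;
   returns false (leaving *out untouched) if cancellation was observed. */
bool sum_until_cancelled(cancel_token *tok, uint64_t n, uint64_t *out) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++) {
        if ((i & 1023) == 0 &&
            atomic_load_explicit(&tok->cancelled, memory_order_relaxed)) {
            return false;  /* unwind normally: no locks or resources are leaked */
        }
        acc += i;
    }
    *out = acc;
    return true;
}
```

Another thread requests cancellation with `atomic_store(&tok.cancelled, true)`. The worker then returns through its normal code path, which is the property asynchronous cancellation lacks.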

cryptonector

25 days ago

Right, and then you can kill it, but that's essentially what `pthread_cancel()` is. `pthread_cancel()` is just fine as long as that's all you use it for. The moment you go beyond interruption of 100% compute-bound work, you're in for a world of hurt.

themafia

28 days ago

It always surprised me that so many glibc functions have, somewhere in their call path, calls that open() files in /etc and then parse the contents into some kind of value to use or possibly return.

The initialization of these objects should have been separate and then used as a parameter to the functions that operate on them. Then you could load the /etc/gai.conf configuration, parse it, then pass that to getaddrinfo(). The fact that multiple cancellation points are discreetly buried in the paths of these functions is an element of unfortunate design.

kccqzy

28 days ago

It’s extremely easy to write application code in Haskell that handles async cancellation correctly without even thinking about it. The async library provides high level abstractions. However your point is still valid as I do think if you write library code at a low level of abstraction (the standard library must) it is just as error prone as in Java or C.

01HNNWZ0MV43FF

28 days ago

3 replies

It's weird to me that event-based DNS using epoll or similar doesn't have a battle-tested implementation. I know it's harder to do in C than in Rust but I'm pretty sure that's what Hickory does internally.

frumplestlatz

28 days ago

1 reply

It’s a weird problem, in that (1) DNS is hard, and (2) you really need the upstream vendor to solve the problem, because correct applications want to use the system resolver.

If you don’t use the system resolver, you have to glue into the system’s configuration mechanism for resolvers somehow … which isn’t simple — for example, there’s a lot of complex logic on macOS around handling which resolver to use based on what connections, VPNs, etc, are present.

And then there’s nsswitch and other plugin systems that are meant to allow globally configured hooks to plug into the name resolution path.

AndyKelley

28 days ago

3 replies

"(1) DNS is hard"

It's really not.

Just because some systems took something fundamentally simple and wrapped a bunch of unnecessary complexity around it does not make it hard.

At its core, it's an elegant, minimal protocol.
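
To make the wire format concrete, here is a sketch that encodes a single-question query following the message layout in RFC 1035 section 4: a 12-byte header, the name as length-prefixed labels, then QTYPE and QCLASS. The function name and signature are assumptions for illustration. It covers only the encoding step, not sending, retries, truncation, or any of the resolver-selection questions raised elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds a DNS query for `name` with the given QTYPE into buf.
   Returns the number of bytes written, or 0 if the name is invalid
   or the buffer is too small. Hypothetical helper, class IN only. */
size_t dns_build_query(uint8_t *buf, size_t cap, uint16_t id,
                       const char *name, uint16_t qtype) {
    size_t name_len = strlen(name);
    if (name_len > 0 && name[name_len - 1] == '.') name_len--;  /* allow "example.com." */

    /* 12-byte header + encoded name (name_len + 2) + QTYPE (2) + QCLASS (2) */
    if (name_len == 0 || name_len > 253 || cap < 12 + name_len + 2 + 4) return 0;

    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);
    buf[1] = (uint8_t)(id & 0xff);
    buf[2] = 0x01;  /* flags: standard query with RD (recursion desired) set */
    buf[5] = 1;     /* QDCOUNT = 1 */

    size_t pos = 12;
    const char *p = name;
    const char *end = name + name_len;
    while (p < end) {
        const char *dot = memchr(p, '.', (size_t)(end - p));
        size_t len = dot ? (size_t)(dot - p) : (size_t)(end - p);
        if (len == 0 || len > 63) return 0;    /* empty or oversized label */
        if (dot && dot + 1 == end) return 0;   /* stray dot before the end */
        buf[pos++] = (uint8_t)len;
        memcpy(buf + pos, p, len);
        pos += len;
        p += len + (dot ? 1 : 0);
    }
    buf[pos++] = 0;  /* root label terminates the name */

    buf[pos++] = (uint8_t)(qtype >> 8);
    buf[pos++] = (uint8_t)(qtype & 0xff);
    buf[pos++] = 0;
    buf[pos++] = 1;  /* QCLASS = IN */
    return pos;
}
```

For example, `dns_build_query(buf, sizeof buf, 0x1234, "example.com", 1)` produces a 29-byte A-record query: 12 header bytes, 13 bytes for the encoded name, and 4 bytes for type and class.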

bwblabs

28 days ago

1 reply

It falls into the category that most people think they understand DNS, the same as JavaScript, or e.g. elections, but the devil is in the detail. And I can tell you, at least for DNS (and Dutch Elections), it's kind of tricky, see fun cases like https://github.com/internetstandards/Internet.nl/issues/1370 and I thought the same before I had my current job which involves quite some tricky DNS stuff (and regarding this we also sometimes encounter bugs in unbound https://github.com/internetstandards/Internet.nl/issues/1803 )

cryptonector

27 days ago

1 reply

Dutch elections? How do they come into this?

bwblabs

25 days ago

There is this list of things tech people think they understand (DNS, JavaScript), and more commonly you can see this with everyday people, e.g. with stuff like elections: the basic concept is clear, understandable, but the devil/complexity is in the detail, how to handle certain exceptions. I was employed by the Election Management Body of The Netherlands for a few years, so I can only vouch for the complexity of that relatively simple election system, but I'm pretty sure it will hold for about every country ;)

tptacek

28 days ago

Lots of elegant, minimal things are hard to use effectively.

kccqzy

28 days ago

You and GP are talking about completely different things. Yes, DNS at its core is an elegant, minimal protocol. But all the complexity comes from client side configuration before the protocol is even involved.

We have complexity like different kinds of VPNs, from network-level VPNs to app-based VPNs to MDM-managed VPNs possibly coexisting. We have on-demand VPNs that only start when a particular domain is being visited: VPN starting because of DNS. We have user-provided or admin-provided hardcoded responses in /etc/hosts. We have user-specified resolver overrides (for example the user wants to use 8.8.8.8 not ISP resolver). We have multiple sources of network-provided resolvers from RDNSS to DHCPv6 O mode.

It is non-trivial to determine which resolver to even start sending datagrams with that elegant minimal protocol.

leshow

27 days ago

1 reply

I use hickory a lot and have contributed to it. It does have a pretty robust async DNS implementation, and it's helpfully split into multiple different crates so you can pick your entry point into the stack. For instance, it offers a recursive resolver, but you can also just import the protocol library and build your own with tokio.

cryptonector

27 days ago

1 reply

Link?

marcusb

26 days ago

1 reply

I'm one of the Hickory maintainers, although I mainly work on the server-side code.

https://github.com/hickory-dns/hickory-dns is our Git repo

Documentation for the resolver including an example: https://docs.rs/hickory-resolver/latest/hickory_resolver/ind...

cryptonector

26 days ago

Thank you!

citrin_ru

26 days ago

Many async frameworks (e.g. libevent [1]) have a DNS client. But it's not something easy to use unless your program uses this specific framework (say libevent) for all network I/O. The problem is not that it's hard to do in C but that there is no single async framework everyone would use.

[1] https://libevent.org/libevent-book/Ref9_dns.html

brcmthrowaway

28 days ago

1 reply

Who can fix getaddrinfo?

AndyKelley

28 days ago

1 reply

libcs can add extensions and then applications can detect when they are targeting those systems and use them.

POSIX can specify a new version of DNS resolution.

Applications on Linux can bypass libc.

brcmthrowaway

28 days ago

1 reply

What about macOS?

AndyKelley

28 days ago

they already have CFHostStartInfoResolution / CFHostCancelInfoResolution

btown

28 days ago

1 reply

For those using it in Python, Gevent provides a pluggable set of DNS resolvers that monkey-patch the standard library's functions for async/cooperative use, including one built on c-ares: https://www.gevent.org/dns.html

petcat

28 days ago

1 reply

gevent. Man that's a blast from the past

btown

28 days ago

Still alive and kicking in production for us!

dweekly

28 days ago

2 replies

I was able in an afternoon to implement a pretty decent completely async Swift DNS resolver client for my app. DNS clients are simple enough to build that rolling your own async is not a big deal anymore.

Yes, there is separate work to discern what DNS server the system is currently using: on macOS this requires a call to an undocumented function in libSystem - that both Chromium and Tailscale use!

AaronFriel

28 days ago

1 reply

A lot of folks think this, but did you also implement EDNS0?

The golang team also thought DNS clients were simple, and it led to almost ten years of difficult-to-debug panics in Docker, Mesos, Terraform, Consul, Heroku, Weave and countless other services and CLI tools written in Go. (Search "cannot unmarshal DNS message" and marvel at the thousands of forum threads and GitHub issues that all bottom out at Go implementing the original DNS spec and not following later updates.)
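
For readers unfamiliar with EDNS0 (RFC 6891): on the wire it is a single OPT pseudo-record in the additional section of the message. The sketch below appends a minimal one to an already-built query and bumps ARCOUNT. The function name is an assumption, and the payload size passed by the caller is illustrative. It shows only the client's side of advertising EDNS0 and says nothing about the response-handling problems this comment refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Appends an empty OPT record to the DNS message in msg[0..len) and
   increments ARCOUNT. Returns the new message length, or 0 if the
   message is shorter than a header or the buffer lacks 11 spare bytes.
   Hypothetical helper; no EDNS options, DO bit clear, version 0. */
size_t dns_append_edns0(uint8_t *msg, size_t len, size_t cap,
                        uint16_t udp_payload) {
    if (len < 12 || cap < len + 11) return 0;

    uint8_t *p = msg + len;
    p[0] = 0;                              /* NAME: root */
    p[1] = 0;
    p[2] = 41;                             /* TYPE: OPT (41) */
    p[3] = (uint8_t)(udp_payload >> 8);    /* CLASS field carries the */
    p[4] = (uint8_t)(udp_payload & 0xff);  /* requestor's UDP payload size */
    p[5] = 0;                              /* TTL byte 0: extended RCODE */
    p[6] = 0;                              /* TTL byte 1: EDNS version 0 */
    p[7] = 0;                              /* TTL bytes 2-3: flags, DO clear */
    p[8] = 0;
    p[9] = 0;                              /* RDLENGTH = 0: no options */
    p[10] = 0;

    /* ARCOUNT lives in header bytes 10-11, big-endian. */
    uint16_t arcount = (uint16_t)(((unsigned)msg[10] << 8 | msg[11]) + 1u);
    msg[10] = (uint8_t)(arcount >> 8);
    msg[11] = (uint8_t)(arcount & 0xff);
    return len + 11;
}
```

A resolver that sends this record has to be prepared for responses larger than the 512-byte limit of the original specification, up to the size it advertised.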

formerly_proven

28 days ago

nsswitch cough

frumplestlatz

28 days ago

3 replies

Even once you use the very private `dns_config*()` APIs on macOS, you need to put in heavy lifting to correctly handle scoped, service-specific providers, supplemental matching rules, etc -- none of which is documented, and can change in the future.

Since you're not using the system resolver, you won't benefit from mDNSResponder's built-in DNS caching and mDNS resolution/caching/service registration, so you're going to need to reimplement all of that, too. And don't forget about nsswitch on BSD/Linux/Solaris/etc; there's no generic API that would let you plug into this cleanly, so for a complete implementation there, you need to:

- Reimplement built-in modules like `hosts` (for `/etc/hosts`), `cache` (query a local `nscd` cache, etc). Fortunately nobody is using the `nis` module anymore, even if it's still technically supported.

- Parse the nsswitch.conf configuration file, including the rule syntax for defining whether to continue/return on different status codes.

- Reimplement rule-based dispatch to both the built-in modules and custom, dynamically loaded modules (like `nss_mdns` for mDNS resolution).

This all differs slightly across operating systems in terms of API, config file format, etc, too, and of course Windows has its own completely distinct mechanisms that you'd need to handle too.

Re-implementing all of this correctly, thoroughly, and keeping it working across OS changes is extremely non-trivial.
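
As a rough illustration of the configuration-parsing step in the list above, the sketch below tokenizes one `hosts:` line of an nsswitch.conf-style file into source names and the raw text of any bracketed rule that follows a source. The type, function name, and size limits are assumptions. It does not interpret the rule syntax, load modules, or account for differences between operating systems, which is where the comment locates most of the real work.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SOURCES 8

/* Hypothetical representation of one lookup source. */
typedef struct {
    char name[32];    /* e.g. "files", "dns" */
    char action[48];  /* raw text inside [...] after the source, or "" */
} nss_source;

/* Parses a line such as
       hosts: files mdns4_minimal [NOTFOUND=return] dns
   Returns the number of sources stored in out, 0 if the line is not a
   "hosts:" entry, or -1 if it is malformed or exceeds the limits. */
int parse_hosts_line(const char *line, nss_source out[MAX_SOURCES]) {
    static const char key[] = "hosts:";
    while (isspace((unsigned char)*line)) line++;
    if (strncmp(line, key, sizeof key - 1) != 0) return 0;
    line += sizeof key - 1;

    int n = 0;
    for (;;) {
        while (isspace((unsigned char)*line)) line++;
        if (*line == '\0' || *line == '#') break;  /* end of line or comment */

        if (*line == '[') {
            /* A bracketed rule applies to the source just before it. */
            const char *close = strchr(line, ']');
            if (close == NULL || n == 0) return -1;
            size_t len = (size_t)(close - line - 1);
            if (len >= sizeof out[n - 1].action) return -1;
            memcpy(out[n - 1].action, line + 1, len);
            out[n - 1].action[len] = '\0';
            line = close + 1;
        } else {
            size_t len = strcspn(line, " \t\r\n[#");
            if (n == MAX_SOURCES || len >= sizeof out[n].name) return -1;
            memcpy(out[n].name, line, len);
            out[n].name[len] = '\0';
            out[n].action[0] = '\0';
            n++;
            line += len;
        }
    }
    return n;
}
```

For the example line in the comment above the function, this yields three sources, with `NOTFOUND=return` attached to `mdns4_minimal`.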

dweekly

28 days ago

1 reply

Good points, all - there is a lot of subtlety here.

CFHostStartInfoResolution is deprecated, no? https://developer.apple.com/documentation/cfnetwork/cfhostst...:)

That leaves us with DNSServiceGetAddrInfo? https://developer.apple.com/documentation/dnssd/dnsservicege...:) or some kinda convoluted use of Network and NWEndpoint/NWConnection with continuations could do the same?

frumplestlatz

28 days ago

Oh yes, good catch. Yeah, you want to use `NWConnection` (or one of the other higher-level supported networking APIs), which raises another issue with doing custom DNS resolution. You need those APIs' connect-by-name semantics to get VPN-on-Demand:

https://developer.apple.com/documentation/technotes/tn3151-c...

cryptonector

27 days ago

Browsers don't care about the nsswitch though. There are apps where that complexity can be avoided.

GoblinSlayer

27 days ago

Doesn't Linux run systemd-resolved locally? You just send the request there and it handles hosts, cache and whatnot.

jupp0r

28 days ago

libuv? libevent?

cryptonector

28 days ago

I'm digging dns.c and asr. I might get dns.c building and use it.

benatkin

28 days ago

Another related article: https://ziglang.org/devlog/2025/#2025-10-15

csb6

28 days ago

This article [0] looks at similar problems. It seems like something as fundamental to the Internet as DNS APIs should be a solved problem by now, but maybe being so fundamental makes it hard to change things.

[0] https://valentin.gosu.se/blog/2025/02/getaddrinfo-sucks-ever...

javantanna

28 days ago

Just curious how you approached performance bottlenecks — anything surprising you discovered while testing?

ID: 46245923
Type: story
Last synced: 12/15/2025, 4:50:33 PM