I/O Multiplexing (select vs. poll vs. epoll/kqueue)
Source: nima101.github.io
Key topics
I/O Multiplexing
System Programming
Event Loops
Networking
The article compares different I/O multiplexing APIs (select, poll, epoll/kqueue) and sparks a discussion on their trade-offs, limitations, and potential improvements.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 3d after posting
- Peak period: 34 comments in 72-84h
- Average per period: 15.5 comments
- Comment distribution: 62 data points, based on 62 loaded comments
Key moments
- Story posted: Oct 9, 2025 at 12:06 AM EDT
- First comment: Oct 12, 2025 at 2:38 AM EDT (3d after posting)
- Peak activity: 34 comments in 72-84h, the hottest window of the conversation
- Latest activity: Oct 13, 2025 at 10:06 PM EDT
ID: 45523370 | Type: story | Last synced: 11/20/2025, 5:45:28 PM
Besides, kqueue grew out of FreeBSD, not OS X. Such ignorance saddens me much more.
[1]: https://libevent.org
Unless I am mistaken, OpenBSD's base system even codes explicitly against the older libevent API internally and ships libevent with each release, despite supporting kqueue natively, and thereby gains better portability for a number of its tools.
Personally, I just go with POSIX select() for small programs where performance is not critical anyway.
…which is why there is libverto, a 2nd order abstraction.
It'd be funny if it weren't also sad.
Neither poll nor select is deprecated. They're just not good fits for particular usage patterns. But even select() is fine if you just need to watch 2 FDs in a CLI tool.
In fact, due to its footguns, I'd highly advise against epoll (particularly edge triggering) unless you really need it.
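For the two-FD CLI case mentioned above, a select() call might look like the sketch below. It assumes both descriptors are already open; the helper refuses any descriptor at or above FD_SETSIZE rather than write past the end of the fd_set. The name wait_two_readable and its bitmask return value are hypothetical, chosen only for illustration.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd_a and/or fd_b to become readable.
 * Returns a bitmask (bit 0: fd_a readable, bit 1: fd_b readable),
 * 0 on timeout, or -1 on error with errno set. */
int wait_two_readable(int fd_a, int fd_b, int timeout_ms)
{
    /* A standard fd_set only has FD_SETSIZE bits; never index past it. */
    if (fd_a < 0 || fd_b < 0 || fd_a >= FD_SETSIZE || fd_b >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    /* nfds is the highest descriptor plus one, not the number of fds. */
    int nfds = (fd_a > fd_b ? fd_a : fd_b) + 1;
    int n = select(nfds, &readfds, NULL, NULL, &tv);
    if (n <= 0)
        return n; /* 0 = timeout, -1 = error */

    int mask = 0;
    if (FD_ISSET(fd_a, &readfds))
        mask |= 1;
    if (FD_ISSET(fd_b, &readfds))
        mask |= 2;
    return mask;
}
```

select() overwrites readfds in place with the result, so a loop around it has to rebuild the set before every call.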
https://man.freebsd.org/cgi/man.cgi?select
The man page also suggests how you might increase the FD limit if needed. I still use select for a small number of FDs where overhead isn't a real concern, and select is a good fit.
They're basically identical APIs, but poll() has no hard limit and works with high-numbered FDs.
So the obvious loop written for level triggering will eventually lock up if you switch it to edge triggering.
You'll read 4092 bytes when there are 4093, leaving 1 byte behind, and then never get a notification again.
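The usual way out of the edge-triggered lock-up described above is to keep reading until the kernel reports EAGAIN. Below is a minimal Linux-only sketch, assuming the descriptor is non-blocking and registered with EPOLLIN | EPOLLET; the helper names set_nonblocking and drain_fd are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* The drain loop below only terminates if reads can fail with EAGAIN,
 * so the descriptor must be non-blocking. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* With EPOLLET, epoll reports a transition to "readable", not the state,
 * so one partial read can strand bytes that never trigger another wakeup.
 * Read until EAGAIN (or EOF). cap must be > 0.
 * Returns the total bytes consumed, or -1 on a real error. */
ssize_t drain_fd(int fd, char *buf, size_t cap)
{
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, cap);
        if (n > 0) {
            total += n; /* a real handler would process buf[0..n) here */
            continue;
        }
        if (n == 0)
            return total; /* EOF: the writer closed its end */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total; /* fully drained; safe to call epoll_wait again */
        return -1;
    }
}
```

Registration would be along the lines of `struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd }; epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);`, with drain_fd called every time epoll_wait reports fd.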
Right now I am working on JavaScript bindings for a project, and doing it the Node.js (or Deno) way is definitely a no-no. That would add one more layer to the architecture, if not two. Once you have more layers, you also have more layer interactions, and that never stops.
I mean, complexity begets complexity
https://github.com/gritzko/librdx/blob/master/js/README.md
Having more than 1000 connections per thread is a very specific use case.
Imagine what sort of traffic you need to saturate 1000 conns with HTTP. Can your single thread app handle it? If you are not using nginx, that must be something less trivial than a reverse proxy. Nginx can do lots of things, by the way.
Only if those fds are below ~1024 or whatever. (If you're going to use one of the legacy interfaces, at least poll() doesn't have arbitrary limits on the numeric value of the fd.)
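As a rough illustration of the point about poll(), the sketch below moves a pipe's read end to descriptor 2000 (when the process's RLIMIT_NOFILE hard limit allows raising the soft limit that far) and polls it. A standard 1024-bit fd_set could not represent that descriptor. The helpers move_fd_high and poll_readable are hypothetical names for this example.

```c
#include <poll.h>
#include <sys/resource.h>
#include <unistd.h>

/* Demo helper: relocate fd to target_fd (e.g. 2000, beyond FD_SETSIZE).
 * Raises the soft RLIMIT_NOFILE if the hard limit permits; if dup2()
 * still fails, falls back to returning the original fd. */
int move_fd_high(int fd, int target_fd)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 &&
        rl.rlim_cur <= (rlim_t)target_fd && rl.rlim_max > (rlim_t)target_fd) {
        rl.rlim_cur = (rlim_t)target_fd + 1;
        (void)setrlimit(RLIMIT_NOFILE, &rl); /* best effort */
    }
    if (dup2(fd, target_fd) < 0)
        return fd;
    close(fd);
    return target_fd;
}

/* struct pollfd carries the descriptor as a plain int, so poll() has no
 * FD_SETSIZE-style ceiling on the fd's numeric value.
 * Returns 1 if POLLIN was reported, 0 on timeout (or if only other events
 * such as POLLHUP were reported), -1 on error. */
int poll_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```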
Just don't try to solve the C10K (10,000 connections) problem with it by putting it on the Internet.
Or, if you do, build it out properly.
Or use epoll or kqueue.
Have they reached feature parity?
https://github.com/mitchellh/libxev/blob/main/src/backend/kq...
> Now, if you try to watch file descriptor 2000, select will loop over fds from 0 to 1999 and will read garbage. The bigger issue is when it tries to set results for a file descriptor past 1024 and tries to set that bit field in say readfds, writefds or errorfds field. At this point it will write something random on the stack eventually crashing the process and making it very hard to debug what happened since your stack is randomized.
I'm not too literate on the Linux kernel code, but I checked, and it looks like the author is right [1].
It would have been so easy to introduce a size check on the array to make sure this can't happen. The man page reads like FD_SETSIZE differs between platforms. It states that FD_SETSIZE is 1024 in glibc, but no upper limit is imposed by the Linux kernel. My guess is that the Linux kernel doesn't want to assume a value of FD_SETSIZE so they leave it unbounded.
It's hard to imagine how anyone came up with this and thought it was a good design. Maybe 1024 FDs was so many at the time this was designed that nobody considered what would happen if the limit were reached? Or were they working on a system where 1024 was the maximum number of FDs a process could open?
[1]: The core_sys_select function checks the nfds argument passed to select(2) and modifies the fd_set structures that were passed to the system call. The function ensures that n <= max_fds (as the author of the post stated), but it doesn't compare n to the size of the fd_set structures. The set_fd_set function, which modifies the user-side fd_set structures, calls right into __copy_to_user without additional bounds checks. This means page faults will be caught and return -EFAULT, but out-of-bounds accesses that corrupt the user stack are possible.
It's no different than creating a 1024-byte buffer and telling read() to read 2048 bytes into it.
To be fair, there's an API bug here in that "fd_set" is a fixed-size thing for historical compatibility reasons, while the kernel now accepts arbitrarily large buffers. So code cut and pasted from historical examples will have an essentially needless 1024 FD limit.
Stated differently: the POSIX select() has a fixed limit of file descriptors, the linux implementation is extensible. But no one uses the latter feature (because at that scale poll and epoll are much better fits) and there's no formal API for it in the glibc headers.
> there's no formal API for it in the glibc headers
The author claims you can pass nfds > 1024 to select(2). If you use the fd_set structure with a size of 1024, this may lead to memory corruption if an FD > 1023 becomes ready, if I understand correctly.
The "problem", such as it is here, is that the POSIX behavior for select() (that it supports only a fixed size for fd_set) was extended in the Linux kernel[1] to allow for arbitrary file descriptor counts. But the POSIX API for select() was not equivalently extended; if you want to use this feature, you need to call it through the Linux system call API, not the stuff you find in example code or glibc headers.
[1] To be perfectly honest I don't know if this is unique to Linux. It's a pretty obvious feature, and I bet various BSDs or OS X or whatnot have probably done it too. But no one cares because at the 1024+ FD level System V poll() is a better API, and event-based polling is better still. It's just Unix history at this point and no one's going to fix it for you.
The difference is that fd_set is a structure that's not defined by the user. If fd_set had a standard size, the kernel could verify that nfds is within the allowed range for the fd_set structure. The select(2) system call would be harder to misuse then, although misuse would still be possible by passing custom buffers instead of pointers to fd_set structures. In that sense, I think we agree on the "problem".
It's indeed just a bit of Unix history, but I was surprised by it nonetheless.
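The size check discussed above, which the kernel cannot perform because it never sees the size of the caller's fd_set, can be done in userspace instead. A minimal sketch, assuming callers only ever pass standard fd_set objects (not the larger custom buffers Linux would accept); checked_select is a hypothetical wrapper name, not a libc function.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Refuse any nfds larger than the number of bits a standard fd_set holds,
 * so the kernel is never told to read or write results past the end of
 * the sets. Otherwise behaves exactly like select(). */
int checked_select(int nfds, fd_set *readfds, fd_set *writefds,
                   fd_set *exceptfds, struct timeval *timeout)
{
    if (nfds < 0 || nfds > FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    return select(nfds, readfds, writefds, exceptfds, timeout);
}
```

Because it rejects nfds above FD_SETSIZE outright, this wrapper also rules out the Linux-specific larger-bitmap extension; programs that need more descriptors are better served by poll() or epoll/kqueue, as the thread suggests.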
The article says select is from 1983. 1024 FDs is a lot for 1983. At least in current FreeBSD, it's easy to #define the setsize to be larger if you're writing an application that needs it larger. It's not so easy to manage if you're a library that might need to select larger FDs.
Lots of socket syscalls include a size parameter, which would help with this kind of thing. But you still might buffer overflow with FD_SET in userspace.
The goal was to understand the underlying mechanisms behind Python's async/await and to help coworkers understand how event loops work under the hood.
The end result is somewhat interesting, as unlike traditional event loop libraries, it doesn't use callbacks as the scheduling primitive: https://gist.github.com/tarruda/5b8c19779c8ff4e8100f0b37eb59...
If you must implement your own event loop and you want your application to be portable, poll is still a good place to begin.
O(N) demultiplexing time in the pollfd array is also not as brutal as it seems on modern hardware. The pollfd structure itself is only 8 bytes wide, so you can comfortably pack thousands of them into the L1 cache. Copying all of the elements that have an active event into a new smaller array before processing them is going to be fast enough for most cases.
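The compaction step described above might be sketched like this. collect_ready is a hypothetical helper; it assumes the caller supplies an output array with room for every entry. The 8-byte size of struct pollfd holds on common Linux ABIs but is not guaranteed by POSIX.

```c
#include <poll.h>
#include <stddef.h>

/* After poll() returns, copy only the entries that reported events into
 * out, so the dispatch loop touches just the ready descriptors. The scan
 * is O(n) over a dense array of small structs, which is cache-friendly.
 * out must have room for n entries. Returns the number of ready entries. */
size_t collect_ready(const struct pollfd *fds, size_t n, struct pollfd *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (fds[i].revents != 0)
            out[k++] = fds[i];
    }
    return k;
}
```

An event loop would call poll(), then collect_ready(), and then dispatch over only the first k entries of the output array.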
Wish they'd give some credit to FreeBSD, where it originated.