Lockless MPSC/SPMC/MPMC Queues Are Not Queues
Posted 3 months ago · Active 3 months ago
alexsaveau.dev · Tech · story
controversial · mixed · Debate · 80/100
Key topics
Lockless Queues
Concurrent Programming
Data Structures
The article argues that lockless MPSC/SPMC/MPMC queues are not true queues due to the lack of global ordering, sparking a debate on the definition and usefulness of such data structures.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion · First comment: 10h · Peak period: 16 comments in 12-18h · Avg / period: 6.2
Based on 31 loaded comments
Key moments
1. Story posted: Sep 28, 2025 at 8:21 PM EDT (3 months ago)
2. First comment: Sep 29, 2025 at 6:15 AM EDT (10h after posting)
3. Peak activity: 16 comments in 12-18h, the hottest window of the conversation
4. Latest activity: Oct 1, 2025 at 4:13 PM EDT (3 months ago)
ID: 45409258 · Type: story · Last synced: 11/20/2025, 12:44:40 PM
If you go to the city hall and they have multiple counters, you pull a ticket with your number. There is a global order of dequeue operations. The first person to pull a ticket will be the first to be assigned to a counter. If there is one counter, the second person has to wait for the first person. If there are two counters, then the second person can go to the second counter. If the second counter works faster than the first counter, then the third person will go to the second counter.
According to the blog post this violates a global processing order constraint (created by who?). The people at the second counter appear to be coming from the future from the perspective of a global processing order constraint.
This is a very strange way to look at queues, because the primary purpose of a queue is to temporarily buffer data when there is insufficient capacity to process incoming requests. Calling something broken because it fulfills its purpose is a very negative way to approach life.
I think if that’s all you care about you’re in the clear.
The article is useful in that it considers a more general set of use cases (people do all kinds of weird things) and highlights that a somewhat intuitive assumption one might have may not hold.
In the example with multiple counters, in real life each counter could shout out a number and have people approach their respective counters in parallel. But this is not how lockless queues work. Instead, the person at the head of the queue holds a baton and when multiple numbers are called, everybody waiting in the queue goes up to the counter of the person holding the baton. Once that head-of-the-queue has made it to the counter, they give the baton to the person behind them who then drags everybody along to their counter. And so on.
The article was arguing for a lockless channel implementation akin to your interpretation of a queue with parallel access to the counters.
Also if this is really about bags, why not open with that?
Good news: it's not slow
Let's take Rust, which has an oddly thriving ecosystem of lockless MPSC/MPMC/etc. queues, and lots of benchmarks of them in lots of configurations.
The fastest ones easily do at least 30 million elements/second in most configurations; the "slowest" manage around 5-10 million.
So the fastest is doing 33ns per element and the slowest is 100ns.
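The throughput-to-latency conversion is just a reciprocal; a quick sketch of the arithmetic behind those numbers:

```rust
// Converting queue throughput to amortized per-element latency:
// latency (ns) = 1e9 / throughput (elements per second).
fn ns_per_element(throughput: f64) -> f64 {
    1e9 / throughput
}

fn main() {
    println!("{:.0} ns", ns_per_element(30e6)); // fast implementations: ~33 ns
    println!("{:.0} ns", ns_per_element(10e6)); // slower ones: ~100 ns
}
```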
Which is probably why the article offers no data. It's not actually slow. It's actually really fast when done well.
Absolute numbers don't prove much unless you put them in comparison with the best alternatives.
Theoretically better (which is questionable at best in this case) doesn't mean anything useful if it doesn't actually matter in practice, or is always worse in practice.
Reading the article again, this lockless bag idea also needs two atomic writes on every operation.
In my benchmarks[1], the average processing time for an element is 250ns with 4 producers and 4 consumers contending heavily. That's terrible! Even if your numbers are correct, 100ns is a bit faster than two round trips to RAM while 33ns is about three round trips to L3 and ~100x slower than spamming a core with add operations. That's slow.
[1]: https://github.com/SUPERCILEX/lockness/blob/master/bags/benc... $ cargo bench -- 8_threads/std_mpmc
Also it doesn't look like you are using any of the optimized lockless implementations like crossfire/etc, so not sure what this actually proves?
The article basically says that when you have multiple producers or consumers, the "order" of the queue loses meaning. It turns into an "unordered" pool of data. Therefore focus should be shifted from maintaining a "queue of data" to a "bag of data".
Even a slightly-misordered queue has very useful fairness guarantees: for a queue of size N, an item will spend O(N) steps in the queue.
"Bags of data" have no guarantees of that sort at all. By their definition, an item can spend an arbitrary time in the "bag", which makes those "bags" pretty useless for many real-time use cases.
As an example, imagine a bag where readers and writers always prefer smaller indices. So if an item gets stuck at the last slot, it will never get picked up as long as new items are coming. This is a pretty useless behavior which is yet totally compatible with the "bag of data" definition.
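That starvation scenario is easy to demonstrate. A minimal single-threaded simulation (the smallest-index preference is the hypothetical policy from the example above, not any particular library's behavior):

```rust
// Simulate a "bag" where both producer and consumer always prefer the
// smallest index. An item parked in the last slot is never chosen while
// lower-indexed slots keep refilling -- unbounded starvation, yet still a
// perfectly valid "bag of data".
fn main() {
    const N: usize = 4;
    let mut slots: [Option<u64>; N] = [None; N];
    slots[N - 1] = Some(0); // the "stuck" item in the last slot

    let mut next = 1u64;
    let mut popped = Vec::new();
    for _ in 0..100 {
        // producer: fill the first empty slot
        if let Some(i) = slots.iter().position(|s| s.is_none()) {
            slots[i] = Some(next);
            next += 1;
        }
        // consumer: take the first occupied slot
        if let Some(i) = slots.iter().position(|s| s.is_some()) {
            popped.push(slots[i].take().unwrap());
        }
    }
    // item 0 sat in the last slot the whole time and was never popped
    assert!(!popped.contains(&0));
    println!("popped {} items, item 0 still stuck", popped.len());
}
```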
The disconnect to me is that the title says "it's not a queue". It is a queue, it's just (as you note) not being used in a way where it being a queue provides any benefit.
The article makes it sound a bit as if they were all created equal, while I think only MPSC is really used widely. The others are all kind of niche, because usually we have better solutions for the problems they are suitable for.
However, that can be implemented with a MPMC queue with no exotic operations with precise O(1) behavior with no size bounds. For a bag with two N-bit bit-vectors, just have two N-bit queues of indices. Dequeue on acquire (giving you an unused index) and enqueue on release (releasing your now used index). If anybody finishes processing quickly, they release back to the queue for new processing which means there is no head-of-line blocking. Only by having all N actively processing do you wait.
I guess you also still have the tiny window on enqueue between reserving the slot and actually writing out the index where you could tear due to a perfectly sub-optimal context switch.
Such an operation would improve multi-producer queues with fixed-size elements smaller than a word and would likely be fairly trivial to implement, merely requiring fusion of two existing instructions in a way that prevents interruption between them.
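The index-pool protocol described above can be sketched as follows. A Mutex-backed queue stands in for a real lockless MPMC queue to keep the sketch self-contained; the acquire/release protocol is the same either way, and `IndexPool` is a hypothetical name:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// Slot indices circulate through a queue: acquire = dequeue a free index,
// release = enqueue it back. Fast finishers return their slot immediately,
// so there is no head-of-line blocking on slow slots.
struct IndexPool {
    free: Mutex<VecDeque<usize>>,
}

impl IndexPool {
    fn new(n: usize) -> Self {
        IndexPool { free: Mutex::new((0..n).collect()) }
    }
    fn acquire(&self) -> Option<usize> {
        self.free.lock().unwrap().pop_front()
    }
    fn release(&self, i: usize) {
        self.free.lock().unwrap().push_back(i);
    }
}

fn main() {
    let pool = Arc::new(IndexPool::new(4));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                for _ in 0..1000 {
                    if let Some(i) = pool.acquire() {
                        // ... use slot i ...
                        pool.release(i); // slot is immediately reusable
                    }
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // all 4 indices are back in the pool
    assert_eq!(pool.free.lock().unwrap().len(), 4);
    println!("pool intact");
}
```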
This is simply not true. A standard multi-consumer queue is ordered on the consumer side so long as consumers synchronize with each other. This could be as simple as one consumer setting a global flag after receiving message A and another consumer observing that flag before receiving message B. A lockless bag will not have this property.
Similarly, any of these queues have the nice property that, even if no one synchronizes anything, they’re fair: even under heavy load such that the queue never comes close to emptying, every push attempt will be fairly serviced by a pop without any other synchronization needed.
Attempting to relax either of these could be problematic, especially as a drop-in replacement of the data structure. I suspect that quite a lot of software that uses queues pushes special tokens to indicate changes of global state, and safety and correctness may be violated by reordering. For example, an MPMC system could have a producer push a special "I'm done" token when it's all done, and the consumers could collectively count the "I'm done" tokens and assume (correctly on a standard MPMC queue) that no more messages will be in the queue once one "I'm done" token per producer has been globally seen. If those tokens come out of a bag too early, then a different synchronization mechanism would be needed.
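The "I'm done" protocol above can be sketched with `std::sync::mpsc` (single consumer rather than MPMC, but the per-producer FIFO property it relies on is the same; the `Msg` enum is a made-up illustration):

```rust
use std::sync::mpsc;
use std::thread;

// Each producer pushes its data messages, then a final Done marker. Because
// a queue is FIFO, once the consumer has counted one Done per producer, it
// knows every data message has already been delivered. A bag could reorder
// a Done ahead of that producer's data, breaking the protocol.
enum Msg {
    Data(u32),
    Done,
}

fn main() {
    let (tx, rx) = mpsc::channel();
    const PRODUCERS: usize = 3;

    for p in 0..PRODUCERS as u32 {
        let tx = tx.clone();
        thread::spawn(move || {
            for i in 0..10 {
                tx.send(Msg::Data(p * 10 + i)).unwrap();
            }
            tx.send(Msg::Done).unwrap(); // always the producer's last message
        });
    }
    drop(tx);

    let (mut data, mut done) = (0, 0);
    while done < PRODUCERS {
        match rx.recv().unwrap() {
            Msg::Data(_) => data += 1,
            Msg::Done => done += 1,
        }
    }
    // FIFO guarantees all 30 data messages precede the final Done
    assert_eq!(data, 30);
    println!("saw all {data} messages before the final Done");
}
```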
The front of the line is near service desk 1. When it’s your turn, you have to walk to service desk 10. As you’re walking to it, service desk 1 frees up, and the person who was behind you in line walks to service desk 1. They arrive in seconds, while you’re still halfway to being served, a few meters away.
Op is pointing out that (programming) queues without locking lack a certain purity which is also lacking in real life queues.
> Lockless queues are slow
That particular implementation of a lockless queue may be slow, but that is not globally true for lockless queues. There are variants, for example BBQ, https://www.usenix.org/conference/atc22/presentation/wang-ji..., that have great performance.
Also there are implications for fairness. A queue implemented as a stack might mean that messages pushed during a spike never get processed or experience very high latency.
Finally, it is actually possible to implement a decent lock-free MPSC queue on top of a node-based stack: the consumer pops all elements in one operation, then reverses the list. This O(N) cost is amortized over N pops, so it is actually not bad.
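A minimal sketch of that scheme: producers do a Treiber-style lock-free push, and the single consumer swaps the whole stack out atomically and reverses it into FIFO order. Memory reclamation is simple here because only the one consumer ever frees nodes; the type names are illustrative:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;
use std::thread;

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

// Lock-free multi-producer stack drained by a single consumer.
struct MpscStack<T> {
    head: AtomicPtr<Node<T>>,
}

unsafe impl<T: Send> Send for MpscStack<T> {}
unsafe impl<T: Send> Sync for MpscStack<T> {}

impl<T> MpscStack<T> {
    fn new() -> Self {
        MpscStack { head: AtomicPtr::new(ptr::null_mut()) }
    }

    // Producers: classic CAS loop pushing onto the head.
    fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        loop {
            let head = self.head.load(Ordering::Relaxed);
            unsafe { (*node).next = head };
            if self
                .head
                .compare_exchange(head, node, Ordering::Release, Ordering::Relaxed)
                .is_ok()
            {
                break;
            }
        }
    }

    // Single consumer: take the whole list in one swap, then reverse
    // LIFO -> FIFO. The O(N) reversal is amortized over the N items.
    fn pop_all(&self) -> Vec<T> {
        let mut cur = self.head.swap(ptr::null_mut(), Ordering::Acquire);
        let mut out = Vec::new();
        while !cur.is_null() {
            let Node { value, next } = *unsafe { Box::from_raw(cur) };
            out.push(value);
            cur = next;
        }
        out.reverse();
        out
    }
}

fn main() {
    let q = Arc::new(MpscStack::new());
    let producers: Vec<_> = (0..4)
        .map(|p| {
            let q = Arc::clone(&q);
            thread::spawn(move || {
                for i in 0..1000 {
                    q.push((p, i));
                }
            })
        })
        .collect();
    for h in producers {
        h.join().unwrap();
    }

    let items = q.pop_all();
    assert_eq!(items.len(), 4000);
    // per-producer order survives the reversal
    for p in 0..4 {
        let seq: Vec<_> = items.iter().filter(|(pp, _)| *pp == p).map(|&(_, i)| i).collect();
        assert_eq!(seq, (0..1000).collect::<Vec<_>>());
    }
    println!("4000 items, per-producer FIFO preserved");
}
```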