Preserving Order in Concurrent Go Apps: Three Approaches Compared
Posted 4 months ago · Active 4 months ago
destel.dev · Tech · story
Tone: calm, positive · Debate: 40/100
Key topics
Concurrency
Go Programming Language
Software Development
The article compares three approaches to preserving order in concurrent Go applications, sparking a discussion on concurrency patterns and alternative solutions.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 8h after posting
Peak period: 21 comments (Day 1)
Avg / period: 8.3
Comment distribution: 25 data points (based on 25 loaded comments)
Key moments
- Story posted: Sep 1, 2025 at 2:14 AM EDT (4 months ago)
- First comment: Sep 1, 2025 at 9:57 AM EDT (8h after posting)
- Peak activity: 21 comments in Day 1, the hottest window of the conversation
- Latest activity: Sep 10, 2025 at 7:14 AM EDT (4 months ago)
ID: 45089938 · Type: story · Last synced: 11/20/2025, 2:30:18 PM
What do you think about the order-preserving simplicity of Java?
If you want more control or have more complex use cases, you can use an ExecutorService of your choice, handle the futures yourself, or get creative with Java's new structured concurrency. Is this basically what Java is doing?
I think the techniques in this article are maybe a little more complex, allowing you to optimize further (basically, continue working as soon as possible instead of just waiting for everything to complete and reordering after the fact), but I'd be curious to know if I've missed something.
Your snippet looks good and concise.
One thing I haven't emphasized enough in the article is that all the algorithms there are designed to work with potentially infinite streams.
If everything fits in memory, that's completely fine, and then yeah, this is wildly overcomplicated: just use a waitgroup and a slice, write each result into its slice index, and wait for everything to finish. That matches your Java example.
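A minimal sketch of that waitgroup-and-slice version (toy squaring workload assumed):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	inputs := []int{1, 2, 3, 4, 5}
	results := make([]int, len(inputs)) // one result slot per input

	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			results[i] = v * v // each goroutine writes only its own index: no ordering logic needed
		}(i, v)
	}
	wg.Wait() // block until every result is in place

	fmt.Println(results) // [1 4 9 16 25], always in input order
}
```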
But when it doesn't fit in memory, that approach means unbounded buffer growth that might OOM.
PS. I realize you present an even better solution; still, the first version seems like a nice enough thing to have in one's toolbox.
Typically, the function "f" does two things: 1. performs calculations (this can be parallelized), and 2. writes results somewhere (this must happen sequentially and in the correct order).
Here's the typical OrderedLoop usage example from the article:
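(The original snippet is missing from this snapshot; below is a from-scratch sketch of the pattern described, with the OrderedLoop signature assumed, not rill's actual code.)

```go
package main

import (
	"fmt"
	"sync"
)

// Assumed shape: f runs on up to n goroutines; each invocation gets a
// canWrite channel that is closed once every earlier item's f has returned,
// so the section after <-canWrite executes sequentially, in input order.
func OrderedLoop[A any](in <-chan A, n int, f func(a A, canWrite <-chan struct{})) {
	var wg sync.WaitGroup
	sem := make(chan struct{}, n) // cap concurrent invocations at n

	prevDone := make(chan struct{})
	close(prevDone) // the first item may write immediately

	for a := range in {
		canWrite := prevDone
		done := make(chan struct{})
		prevDone = done

		sem <- struct{}{}
		wg.Add(1)
		go func(a A, canWrite <-chan struct{}, done chan<- struct{}) {
			defer wg.Done()
			f(a, canWrite)
			close(done) // let the next item enter its ordered section
			<-sem
		}(a, canWrite, done)
	}
	wg.Wait()
}

func main() {
	in := make(chan int)
	go func() {
		for i := 1; i <= 10; i++ {
			in <- i
		}
		close(in)
	}()

	OrderedLoop(in, 4, func(x int, canWrite <-chan struct{}) {
		y := x * x     // 1. calculations: run concurrently, in any order
		<-canWrite     // wait for our turn
		fmt.Println(y) // 2. writing results: sequential, in input order
	})
}
```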
Without the "canWrite", the entire body of "f" would execute either fully concurrently or fully sequentially. With it, we get this dual behavior, similar to critical sections protected by mutexes. It's worth mentioning that OrderedLoop is a low-level primitive; user-level functions like OrderedMap, OrderedFilter, and others do not need the "canWrite" channel to be exposed.
1. Reading from various files where each file has lines with a unique identifier I can use to process in order: I open all the files and create a min-heap seeded with the first line of each, then process by repeatedly grabbing the lowest entry from the min-heap; after consuming a line from a file, I read another line from that file and push it onto the heap again (the heap cells hold the open file descriptor for that file). See the sketch after this list.
2. Aggregating across goroutines that service data generators with different latencies and throughputs. I have one goroutine each that interfaces with them and consider them "producers". Using a global atomic integer I can quickly assign a unique, increasing index to the messages coming in; these can then be serviced with a min-heap, same as above. There are some considerations about dropping messages that are too old, so an alternative approach for some cases is to index the min-heap on received time and process only up to time.Now() minus some buffering time, allowing more time for things to settle before dropping anything (trading total latency for this).
3. Similar to the above, I have another scenario where ingestion throughput is more important and repeated processing happens in order, but there is no requirement that all messages have been processed every time, just that they are processed in order (this is the backing for a log viewer). In this case I just slab-allocate and dump what I receive without ordering concerns, but I also keep a btree with the indexes that I iterate over when it's time to process. I originally had this buffering like (2) to guarantee mostly ordered insertions in the slabs themselves (which I simply iterated over), but if a stall happened in a goroutine, shifting the items over in the slab when the old items finally arrived became very expensive and could spiral badly.
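A minimal sketch of the k-way merge in scenario 1, with pre-sorted slices standing in for the open files (toy data assumed):

```go
package main

import (
	"container/heap"
	"fmt"
)

// Each heap entry remembers which source its value came from, so after
// popping the minimum we know where to read the next value from.
type entry struct{ val, src int }

type minHeap []entry

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *minHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

func main() {
	// Sorted slices stand in for the files; real code would hold a
	// bufio.Scanner (and file descriptor) per source instead.
	sources := [][]int{{1, 4, 7}, {2, 5, 8}, {3, 6, 9}}
	cursor := make([]int, len(sources)) // next unread position per source

	h := &minHeap{}
	for src, s := range sources {
		heap.Push(h, entry{s[0], src}) // seed with the first line of each source
		cursor[src] = 1
	}

	for h.Len() > 0 {
		e := heap.Pop(h).(entry)
		fmt.Println(e.val) // processed in global order: 1, 2, 3, ...
		if cursor[e.src] < len(sources[e.src]) {
			heap.Push(h, entry{sources[e.src][cursor[e.src]], e.src})
			cursor[e.src]++
		}
	}
}
```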
Your first example definitely gives me merge-sort vibes - a really clean way to keep things ordered across multiple sources. The second and third scenarios are a bit beyond what I’ve tackled so far, but super interesting to read about.
This also reminded me of a WIP PR I drafted for rill (probably too niche, so I’m not sure I’ll ever merge it). It implements a channel buffer that behaves like a heap - basically a fixed-size priority queue where re-prioritization only happens for items that pile up due to backpressure. Maybe some of that code could be useful for your future use cases: https://github.com/destel/rill/pull/50
I am using those techniques, respectively, for loading backups (I store each container log in a separate file inside a big zip file, which allows concurrent reading without unpacking) and for servicing the various log-producing goroutines (which use the Docker/k8s APIs as well as fsnotify for files), since I allow creating "views" of containers that consequently need to aggregate in order. The TUI itself, using tview, runs in a separate goroutine at a configurable FPS, reading from these buffers.
I have things mostly working; the latest significant refactoring was introducing the btree-based reading after noticing the "fix the order" stalls were too bad, and I am planning to do a Show HN when I'm finished. It has been a lot of fun going back to solo-dev greenfield stuff after many years of architecture-focused work.
I definitely love golang, but despite being careful and having access to great tools like rr and dlv in GoLand, it can sometimes get difficult to debug deadlocks, especially when mixing channels and locks. I have found this library quite useful for chasing down deadlocks in some scenarios: https://github.com/sasha-s/go-deadlock
Granted, that generally means they're doing something non-trivial with concurrency, and that correlates strongly with "has concurrency bugs". But I see issues FAR more frequently when they reach for channels rather than mutexes. It's bad enough that I just check absolutely every three-chan chunk of code proactively now.
I lay part of the blame on Go's "♥ safe and easy concurrency with channels! ♥" messaging. And another large chunk at the lack of generics (until recently), making abstracting these kinds of things extremely painful. Combined, you get "just do it by hand lol, it's easy / get good" programming, which is always a source of "fun".
Generics in particular are rather important here because without them, you are forced to build this kind of thing from scratch every time to retain type safety and performance, or give up and use reflection (more complicated, less safe, requires careful reading to figure out how to use because everything is an `interface{}`). This works, and Go's reflection is quite fast, but it's not a good experience for authors or users, so they're rather strongly incentivized to not build it / just do it by hand lol.
Now that we have a somewhat crippled version of generics, much of this can be solved in an ideal way: https://pkg.go.dev/slices works for everything and is fast, safe, easy to use, and reasonably easy to build. But there's a decade of inertia (with both existing code and community rejection of the concept) to turn around.
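A minimal sketch of that point (hypothetical Map helper, not from any of the libraries mentioned):

```go
package main

import "fmt"

// With type parameters, this helper is type-safe and reusable. Pre-generics,
// its signature would have to be
//   func Map(in []interface{}, f func(interface{}) interface{}) []interface{}
// pushing type assertions onto every call site (or reflection inside).
func Map[T, R any](in []T, f func(T) R) []R {
	out := make([]R, len(in))
	for i, v := range in {
		out[i] = f(v)
	}
	return out
}

func main() {
	lengths := Map([]string{"go", "generics"}, func(s string) int { return len(s) })
	fmt.Println(lengths) // [2 8], no casts anywhere
}
```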
I can’t think of the last time I actually wrote code which directly created channels. Of course things like contexts, tickers, etc are implemented with channels and I think that is ideally how they should be used — in well defined and self contained library code.
Create a bunch of sequentially numbered jobs that then write their output into a Postgres database, and have N workers process the jobs. Something like GCP's Cloud Tasks is perfect for this because the "workers" are just GCP Cloud Functions, so you can have a near-infinite number of them (limited by concurrent DB connections).
This approach also buys you durability of the queue for free (i.e., what happens when you need to stop your golang process mid-queue?).
Then it is just a query:
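(The query itself is missing from this snapshot; a guess at its shape via Go's database/sql, with table and column names assumed:)

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver (one of several options)
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/jobs?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Workers finish in any order; ordering is recovered at read time with
	// a single ORDER BY over the sequential job numbers.
	rows, err := db.Query(`SELECT output FROM jobs ORDER BY job_seq`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var output string
		if err := rows.Scan(&output); err != nil {
			log.Fatal(err)
		}
		fmt.Println(output)
	}
}
```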
I've just made a small but important clarification to the article. While in many cases it's easier, and even preferable, to calculate all results, accumulate them somewhere, and then sort, this article focuses on algorithms that run in bounded memory and support infinite streams and backpressure.
Real-time Log Enrichment: perfect for my example [0], where you're firing off endless tasks; RT logs have a timestamp.
Finding the First Match in a File List: Files tend to be static. I'd use a queue to first build an index and then a queue to process the index.
Time Series Data Processing: Break the data into chunks; you mention 600MB, which isn't that big at all given that Cloud Run memory maxes out at 32GB.
[0] https://news.ycombinator.com/item?id=45094387