Preserving Order in Concurrent Go Apps: Three Approaches Compared
Posted 4 months ago · Active 4 months ago
destel.dev · Tech · story
Tone: calm, positive · Debate: 40/100
Key topics
Concurrency
Go Programming Language
Software Development
The article compares three approaches to preserving order in concurrent Go applications, sparking a discussion on concurrency patterns and alternative solutions.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 8h after posting
Peak period: 21 comments (Day 1)
Avg / period: 8.3
Comment distribution: 25 data points (based on 25 loaded comments)
Key moments
- Story posted: Sep 1, 2025 at 2:14 AM EDT (4 months ago)
- First comment: Sep 1, 2025 at 9:57 AM EDT (8h after posting)
- Peak activity: 21 comments in Day 1, the hottest window of the conversation
- Latest activity: Sep 10, 2025 at 7:14 AM EDT (4 months ago)
ID: 45089938 · Type: story · Last synced: 11/20/2025, 2:30:18 PM
What do you think about the order-preserving simplicity of Java?
If you want more control or have more complex use cases, you can use an ExecutorService of your choice, handle the futures yourself, or get creative with Java's new structured concurrency. Is this basically what Java is doing?
I think the techniques in this article are maybe a little more complex, allowing you to optimize further (basically, continue working as soon as possible instead of just waiting for everything to complete and reordering after the fact), but I'd be curious to know if I've missed something.
Your snippet looks good and concise.
One thing I haven't emphasized enough in the article is that all the algorithms there are designed to work with potentially infinite streams.
If everything fits in memory, that's completely fine, and then yeah, this is wildly overcomplicated: just use a waitgroup and a slice, write each result into its slice index, and wait for everything to finish. That matches your Java example.
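A minimal sketch of that waitgroup-and-slice version (toy squaring workload assumed):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	inputs := []int{1, 2, 3, 4, 5}
	results := make([]int, len(inputs)) // one result slot per input

	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			results[i] = v * v // each goroutine writes only its own index: no ordering logic needed
		}(i, v)
	}
	wg.Wait() // block until every result is in place

	fmt.Println(results) // [1 4 9 16 25], always in input order
}
```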
But when it doesn't fit in memory, that approach means unbounded buffer growth that might OOM.
PS. I realize you present an even better solution; still, the first version seems like a nice enough thing to have in one's toolbox.
Typically, the function "f" does two things: 1. performs calculations (this can be parallelized), and 2. writes results somewhere (this must happen sequentially and in the correct order).
Here's the typical OrderedLoop usage example from the article:
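(The original snippet is missing from this snapshot; below is a from-scratch sketch of the pattern described, with the OrderedLoop signature assumed, not rill's actual code.)

```go
package main

import (
	"fmt"
	"sync"
)

// Assumed shape: f runs on up to n goroutines; each invocation gets a
// canWrite channel that is closed once every earlier item's f has returned,
// so the section after <-canWrite executes sequentially, in input order.
func OrderedLoop[A any](in <-chan A, n int, f func(a A, canWrite <-chan struct{})) {
	var wg sync.WaitGroup
	sem := make(chan struct{}, n) // cap concurrent invocations at n

	prevDone := make(chan struct{})
	close(prevDone) // the first item may write immediately

	for a := range in {
		canWrite := prevDone
		done := make(chan struct{})
		prevDone = done

		sem <- struct{}{}
		wg.Add(1)
		go func(a A, canWrite <-chan struct{}, done chan<- struct{}) {
			defer wg.Done()
			f(a, canWrite)
			close(done) // let the next item enter its ordered section
			<-sem
		}(a, canWrite, done)
	}
	wg.Wait()
}

func main() {
	in := make(chan int)
	go func() {
		for i := 1; i <= 10; i++ {
			in <- i
		}
		close(in)
	}()

	OrderedLoop(in, 4, func(x int, canWrite <-chan struct{}) {
		y := x * x     // 1. calculations: run concurrently, in any order
		<-canWrite     // wait for our turn
		fmt.Println(y) // 2. writing results: sequential, in input order
	})
}
```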
Without the "canWrite", the entire body of "f" would execute either fully concurrently or fully sequentially. With it, we get this dual behavior, similar to critical sections protected by mutexes. It's worth mentioning that OrderedLoop is a low-level primitive; user-level functions like OrderedMap, OrderedFilter, and others do not need the "canWrite" channel to be exposed.
1. Reading from various files where each file has lines with a unique identifier I can use to process in order: I open all the files and create a min-heap seeded with the first line of each, then process by repeatedly grabbing the lowest entry from the min-heap; after consuming a line from a file, I read another line from that file and push it onto the heap again (the heap cells hold the open file descriptor for that file). See the sketch after this list.
2. Aggregating across goroutines that service data generators with different latencies and throughputs. I have one goroutine each that interfaces with them and consider them "producers". Using a global atomic integer I can quickly assign a unique, increasing index to the messages coming in; these can then be serviced with a min-heap, same as above. There are some considerations about dropping messages that are too old, so an alternative approach for some cases is to index the min-heap on received time and process only up to time.Now() minus some buffering time, allowing more time for things to settle before dropping anything (trading total latency for this).
3. Similar to the above, I have another scenario where ingestion throughput is more important and repeated processing happens in order, but there is no requirement that all messages have been processed every time, just that they are processed in order (this is the backing for a log viewer). In this case I just slab-allocate and dump what I receive without ordering concerns, but I also keep a btree with the indexes that I iterate over when it's time to process. I originally had this buffering like (2) to guarantee mostly ordered insertions in the slabs themselves (which I simply iterated over), but if a stall happened in a goroutine, shifting the items over in the slab when the old items finally arrived became very expensive and could spiral badly.
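A minimal sketch of the k-way merge in scenario 1, with pre-sorted slices standing in for the open files (toy data assumed):

```go
package main

import (
	"container/heap"
	"fmt"
)

// Each heap entry remembers which source its value came from, so after
// popping the minimum we know where to read the next value from.
type entry struct{ val, src int }

type minHeap []entry

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *minHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

func main() {
	// Sorted slices stand in for the files; real code would hold a
	// bufio.Scanner (and file descriptor) per source instead.
	sources := [][]int{{1, 4, 7}, {2, 5, 8}, {3, 6, 9}}
	cursor := make([]int, len(sources)) // next unread position per source

	h := &minHeap{}
	for src, s := range sources {
		heap.Push(h, entry{s[0], src}) // seed with the first line of each source
		cursor[src] = 1
	}

	for h.Len() > 0 {
		e := heap.Pop(h).(entry)
		fmt.Println(e.val) // processed in global order: 1, 2, 3, ...
		if cursor[e.src] < len(sources[e.src]) {
			heap.Push(h, entry{sources[e.src][cursor[e.src]], e.src})
			cursor[e.src]++
		}
	}
}
```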
Your first example definitely gives me merge-sort vibes - a really clean way to keep things ordered across multiple sources. The second and third scenarios are a bit beyond what I’ve tackled so far, but super interesting to read about.
This also reminded me of a WIP PR I drafted for rill (probably too niche, so I’m not sure I’ll ever merge it). It implements a channel buffer that behaves like a heap - basically a fixed-size priority queue where re-prioritization only happens for items that pile up due to backpressure. Maybe some of that code could be useful for your future use cases: https://github.com/destel/rill/pull/50
I am using those techniques, respectively, for loading backups (I store each container log in a separate file inside a big zip file, which allows concurrent reading without unpacking) and for servicing the various log-producing goroutines (which use the Docker/k8s APIs as well as fsnotify for files), since I allow creating "views" of containers that consequently need to aggregate in order. The TUI itself, using tview, runs in a separate goroutine at a configurable FPS, reading from these buffers.
I have things mostly working; the latest significant refactoring was introducing the btree-based reading after noticing the "fix the order" stalls were too bad, and I am planning to do a Show HN when I'm finished. It has been a lot of fun going back to solo-dev greenfield stuff after many years of architecture-focused work.
I definitely love golang, but despite being careful and having access to great tools like rr and dlv in GoLand, it can sometimes get difficult to debug deadlocks, especially when mixing channels and locks. I have found this library quite useful for chasing down deadlocks in some scenarios: https://github.com/sasha-s/go-deadlock
Granted, that generally means they're doing something non-trivial with concurrency, and that correlates strongly with "has concurrency bugs". But I see issues FAR more frequently when they reach for channels rather than mutexes. It's bad enough that I just check absolutely every three-chan chunk of code proactively now.
I lay part of the blame on Go's "♥ safe and easy concurrency with channels! ♥" messaging. And another large chunk at the lack of generics (until recently), making abstracting these kinds of things extremely painful. Combined, you get "just do it by hand lol, it's easy / get good" programming, which is always a source of "fun".
Generics in particular are rather important here because without them, you are forced to build this kind of thing from scratch every time to retain type safety and performance, or give up and use reflection (more complicated, less safe, requires careful reading to figure out how to use because everything is an `interface{}`). This works, and Go's reflection is quite fast, but it's not a good experience for authors or users, so they're rather strongly incentivized to not build it / just do it by hand lol.
Now that we have a somewhat crippled version of generics, much of this can be solved in an ideal way: https://pkg.go.dev/slices works for everything and is fast, safe, easy to use, and reasonably easy to build. But there's a decade of inertia (with both existing code and community rejection of the concept) to turn around.
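A minimal sketch of that point (hypothetical Map helper, not from any of the libraries mentioned):

```go
package main

import "fmt"

// With type parameters, this helper is type-safe and reusable. Pre-generics,
// its signature would have to be
//   func Map(in []interface{}, f func(interface{}) interface{}) []interface{}
// pushing type assertions onto every call site (or reflection inside).
func Map[T, R any](in []T, f func(T) R) []R {
	out := make([]R, len(in))
	for i, v := range in {
		out[i] = f(v)
	}
	return out
}

func main() {
	lengths := Map([]string{"go", "generics"}, func(s string) int { return len(s) })
	fmt.Println(lengths) // [2 8], no casts anywhere
}
```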
I can’t think of the last time I actually wrote code which directly created channels. Of course things like contexts, tickers, etc are implemented with channels and I think that is ideally how they should be used — in well defined and self contained library code.
Create a bunch of sequentially numbered jobs that then write their output into a Postgres database, and have N workers process the jobs. Something like GCP's Cloud Tasks is perfect for this because the "workers" are just GCP Cloud Functions, so you can have a near-infinite number of them (limited by concurrent DB connections).
This approach also buys you durability of the queue for free (i.e., what happens when you need to stop your golang process mid-queue?).
Then it is just a query:
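(The query itself is missing from this snapshot; a guess at its shape via Go's database/sql, with table and column names assumed:)

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver (one of several options)
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/jobs?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Workers finish in any order; ordering is recovered at read time with
	// a single ORDER BY over the sequential job numbers.
	rows, err := db.Query(`SELECT output FROM jobs ORDER BY job_seq`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var output string
		if err := rows.Scan(&output); err != nil {
			log.Fatal(err)
		}
		fmt.Println(output)
	}
}
```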
I've just made a small but important clarification to the article. While in many cases it's easier, and even preferable, to calculate all results, accumulate them somewhere, and then sort, this article focuses on algorithms that run in bounded memory and support infinite streams and backpressure.
Real-time Log Enrichment: perfect for my example [0], where you're firing off endless tasks; RT logs have a timestamp.
Finding the First Match in a File List: Files tend to be static. I'd use a queue to first build an index and then a queue to process the index.
Time Series Data Processing: Break the data into chunks; you mention 600MB, which isn't that big at all given that Cloud Run memory maxes out at 32GB.
[0] https://news.ycombinator.com/item?id=45094387