What Unix Pipelines Got Right and How We Can Do Better
Posted 3 months ago · Active 3 months ago
programmingsimplicity.substack.com · Tech · story
calm · positive
Debate: 40/100
Key topics
Unix Pipelines
Programming Paradigms
Data Processing
The article discusses the strengths of Unix pipelines and proposes ways to improve upon them, sparking a discussion on the evolution of programming paradigms and data processing techniques.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
First comment: -4166036s after posting
Peak period: 13 comments in 0-1h
Avg / period: 6.3
Comment distribution: 19 data points
Based on 19 loaded comments
Key moments
- 01 Story posted: Oct 19, 2025 at 3:40 PM EDT (3 months ago)
- 02 First comment: Sep 1, 2025 at 10:26 AM EDT (-4166036s after posting)
- 03 Peak activity: 13 comments in 0-1h, the hottest window of the conversation
- 04 Latest activity: Oct 19, 2025 at 7:13 PM EDT (3 months ago)
ID: 45637242 · Type: story · Last synced: 11/20/2025, 1:42:01 PM
I think IPC via HTTP, gRPC, Kafka, files, etc. allows language decoupling pretty well. Intra-process communication is primarily single-language, though you can generally call from language X into C-language libs. Cross-process, I don't see where the assertion comes from.
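To make the cross-process point concrete, a trivial sketch (ordinary commands, nothing from the article; assumes python3 is installed): two programs in different languages talking over a plain pipe, with no shared runtime.

    # A Python producer and a C-implemented consumer (tr) share nothing
    # but bytes on a file descriptor; the kernel neither knows nor cares
    # which languages are on either end.
    python3 -c 'print("language decoupling over a pipe")' | tr 'a-z' 'A-Z'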
It will certainly do that if the buffer is full.
> prevents the implicit blocking
No, that's exactly the case of implicit blocking mentioned above.
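For anyone who wants to see the blocking behaviour directly, a minimal sketch (assuming a typical Linux system, where the default pipe buffer is around 64 KiB):

    # 'yes' writes lines as fast as it can; 'sleep' never reads its stdin.
    # Once the kernel's pipe buffer fills, the write() inside 'yes' blocks.
    # When sleep exits after 10 seconds, the read end closes and 'yes'
    # is killed by SIGPIPE.
    yes | sleep 10
    # While this runs, 'ps' in another terminal shows 'yes' sleeping in
    # a pipe write rather than burning CPU.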
Does anyone else find this article rather AI-ish? The extreme verbosity and repetitiveness, the use of dashes, and "The limitation isn't conceptual—it's syntactic" are notable artifacts.
You can consider that an OS/resource specific limitation, rather than a limitation in the concept.
After reading the whole thing, yes! Specifically, it feels incoherent in the way AI text often is. It starts by praising Unix pipes for their simple design and the explicit tradeoffs they make, and then proceeds to explain how we could and should make the complete opposite set of tradeoffs.
Fanout has precisely zero dependency on GC. For example ‘tee’ has been around for decades and it can copy io streams just fine.
There has been some effort to build fanout shells too, with a discussion on HN earlier this month about one called dgsh: https://news.ycombinator.com/item?id=45425298
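A small illustration of GC-free fanout with stock tools (process substitution is a bash/zsh feature, so this assumes one of those shells; file names are just examples):

    # One producer, two consumers: tee duplicates the stream onto two
    # extra file descriptors created by process substitution.
    seq 1 100 \
      | tee >(grep 7 > sevens.txt) >(wc -l > count.txt) \
      > /dev/null
    # sevens.txt gets every line containing a 7; count.txt gets "100".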
Edit: I agree with other comments that this feels like AI slop
But without a common runtime the closest you could really get to that in Unix would be to pass JSON or XML about, and have every program have a "pipe" mode that accepted that as input.
Which seems like an awful lot of work, and unlikely to get the kind of buy-in you'd need to make it work widely.
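Something close to that already exists informally wherever tools emit JSON and jq sits between them; a toy sketch (assumes jq is installed, and the JSON document is made up):

    # An informal "pipe mode": the producer emits JSON, jq acts as the
    # structure-aware glue, and the downstream tool goes back to plain lines.
    echo '{"user": "alice", "groups": ["wheel", "docker"]}' \
      | jq -r '.groups[]' \
      | sort
    # Prints "docker" and "wheel", one per line.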
Also viewing Unix pipes as some special class of file descriptor because your Intro to OS professor didn't teach you anything more sophisticated than shell pipe syntax is kinda dumb.
File descriptor-based IPC has none of the restrictions discussed in this article. They're not restricted to text (and the author does point this out), they're not restricted to linear topologies, they work perfectly fine in parallel environments (I have no idea what this section is talking about), and in Unix-land processes and threads are identically "heavy" (Windows is different).
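For example, nothing about file descriptors forces a straight line; a quick sketch using bash process substitution and a named pipe (the FIFO path is illustrative):

    # Two input branches feeding one consumer: diff reads both pipes
    # via the /dev/fd/N paths the shell supplies; any duplicate lines
    # in the file show up as the difference.
    diff <(sort /etc/passwd) <(sort -u /etc/passwd)

    # The same idea with an explicit FIFO instead of shell sugar.
    mkfifo /tmp/branch
    grep -c root /etc/passwd > /tmp/branch &   # writer in the background
    cat /tmp/branch                            # reader on the other end
    rm /tmp/branch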
For instance, sqrt(sin(cos(theta))) can be notated < theta | cos | sin | sqrt.
Pipeline syntax implemented in functional languages expands into chained function invocation.
Everything follows from that: what we know about combining functions applies to pipes.
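The equivalence is easy to check even in a plain shell, using awk's math functions as stand-ins for cos, sin and sqrt (the two forms agree up to the rounding awk applies at each intermediate stage):

    theta=0.5
    # Pipeline form: each stage applies one function to the previous result.
    echo "$theta" \
      | awk '{print cos($1)}' \
      | awk '{print sin($1)}' \
      | awk '{print sqrt($1)}'
    # Nested form: the same composition written as one expression.
    echo "$theta" | awk '{print sqrt(sin(cos($1)))}'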
> When cat writes to stdout, it doesn't block waiting for grep to process that data.
That says nothing more than that nested function invocations admit non-strict evaluation strategies. E.g., the argument of a function need not be reduced to a value before it is passed to another function, which can make progress on a computation that depends on that result before actually obtaining it.
When you expand the actual data dependencies into a tree, it's easy to see what can be done in parallel.
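Both points are easy to observe from a shell prompt (the timing one assumes bash, where `time` is a keyword):

    # Non-strict evaluation in practice: head takes ten lines and exits,
    # the pipe closes, and seq is killed by SIGPIPE long before it could
    # have produced all billion numbers.
    seq 1000000000 | head -n 10

    # Stages start concurrently: three one-second sleeps in a pipeline
    # finish in roughly one second, not three.
    time (sleep 1 | sleep 1 | sleep 1)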