Dgsh – Directed Graph Shell
Source: www2.dmst.aueb.gr
The Directed Graph Shell (dgsh) is a shell that allows users to create data processing pipelines as directed acyclic graphs, sparking discussion on its benefits, comparison to other tools, and potential applications.
Story posted: Sep 30, 2025 at 9:39 AM EDT
First comment: Sep 30, 2025 at 11:10 AM EDT (2h after posting)
Peak activity: 40 comments in the first 6 hours
Latest activity: Oct 2, 2025 at 2:17 PM EDT
ID: 45425298, Type: story, Last synced: 11/20/2025, 5:45:28 PM
https://www.buzzfeednews.com/article/nicolenguyen/slack-new-...
In English, this makes me think of the phrase "dig shell". I guess we just have different things on our minds...
:-p
https://airflow.apache.org/
https://www.knime.com/
which have their own subculture. You could solve the same problems they do with pandas and scikit-learn, but people who use those tools would never use pandas and scikit-learn, and vice versa.
Circa 2015 I was thinking those tools all had the same architectural flaw: they pass relational rows over the lines, as opposed to JSON objects (or equivalent). That means you have to realize joins as highly complex graphs, where things that seem like local concerns to me require a global structure, and where what seems like a little change to management changes the whole graph in a big way.
I found the people who were buying up those sorts of tools didn’t give a damn, because they thought customers demanded the speed of columnar execution, which our way couldn’t deliver.
I made a prototype that gave the right answers every time and then went to work for a place which had some luck selling their own version that didn’t always give the right answers because: they didn’t know what algebra it supported, didn’t believe something like that had an algebra, and didn’t properly tear the pipeline down at the end.
There are probably libraries that could help, but then you need to install dependencies, which is sad in Python for other reasons.
Others use Nextflow, but that requires learning Groovy and it's less intuitive.
The upgrade was a nightmare for so many organizations. It shouldn't be that way but it was.
I.e. much faster to use dgsh for a basic processing DAG, much more painful to use dgsh for a large ETL pipeline.
Python with something like Prefect isn't something you'd use a REPL to bang out a one-off on, but it'd be more maintainable. dgsh would let you use a REPL to bang out a quick and dirty DAG.
Even creating tools in Python that can be connected together in a Unix shell pipeline isn't trivial. By default if a downstream program stops processing Python's output you get an unsightly broken pipe exception, so you need to execute signal.signal(signal.SIGPIPE, signal.SIG_DFL) to avoid this.
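As a minimal sketch of the point above, here is a small Python filter that resets SIGPIPE before writing. The function names `restore_default_sigpipe` and `upcase_lines` are made up for illustration; only the `signal.signal(signal.SIGPIPE, signal.SIG_DFL)` call comes from the comment.

```python
import signal
import sys


def restore_default_sigpipe():
    """Let the OS end this process quietly when the downstream reader exits.

    Returns True if the handler was changed, False where SIGPIPE is missing.
    """
    if not hasattr(signal, "SIGPIPE"):  # e.g. Windows has no SIGPIPE
        return False
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
    return True


def upcase_lines(lines, out):
    """Hypothetical filter body: copy each input line to `out` in upper case."""
    for line in lines:
        out.write(line.upper())


if __name__ == "__main__":
    restore_default_sigpipe()
    upcase_lines(sys.stdin, sys.stdout)
```

Saved under a name such as upcase.py (an assumption), something like `yes | python3 upcase.py | head -n 1` should then end without a BrokenPipeError traceback.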
I’m on my phone at the moment and cooking so cannot type any examples, but if I get time, I’ll throw together some comparisons later tonight
However, Murex does support CSP-style concurrency. So while there’s no syntactic sugar for writing graphs, you can very easily create ad hoc pipes and pass them around instead of using stdout / stderr.
So it wouldn’t actually take much to refine that with some DAG-friendly syntax.
In fact maybe that can be my next project…
Looking properly at this, I can see no iteration is needed. Which actually makes the Murex implementation even easier because Murex already has tee pipes just like dgsh. It’s just not (yet) particularly well documented.
Stay tuned, though. What I’m going to do is write a blog post about it. It’s an interesting enough topic to deserve one.
What syntax would you propose?
I would suggest a familiar notation like "[a, b] -> c" in a dedicated dag block:
https://www2.dmst.aueb.gr/dds/sw/dgsh/#text-properties
or
https://www2.dmst.aueb.gr/dds/sw/dgsh/#committer-plot
The translations above are computer-assisted and may contain mistakes, but you get the idea.
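To make the "[a, b] -> c" notation a bit more concrete, here is a rough Python sketch of how such lines might be parsed and ordered. This is not dgsh or Murex syntax. The grammar (bare names or bracketed, comma-separated lists on either side of `->`) and the helper names `parse_dag` and `run_order` are assumptions made for illustration.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# One edge per line: a bare name or a bracketed list on each side of "->".
EDGE_RE = re.compile(r"^\s*(\[[^\]]*\]|\w+)\s*->\s*(\[[^\]]*\]|\w+)\s*$")


def _names(token):
    """Turn 'a' or '[a, b]' into a list of node names."""
    token = token.strip()
    if token.startswith("["):
        token = token[1:-1]
    return [name.strip() for name in token.split(",") if name.strip()]


def parse_dag(text):
    """Parse lines such as '[a, b] -> c' into {node: set_of_predecessors}."""
    preds = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        match = EDGE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse edge line: {line!r}")
        sources, targets = _names(match.group(1)), _names(match.group(2))
        for src in sources:
            preds.setdefault(src, set())
        for dst in targets:
            preds.setdefault(dst, set()).update(sources)
    return preds


def run_order(preds):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(preds).static_order())
```

For example, `run_order(parse_dag("[a, b] -> c\nc -> [d, e]"))` places a and b before c, and c before d and e.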
Having dgsh output a Graphviz file in dry-run mode would be a neat feature.
awk -F\; ' !($1 in max) || $2 > max[$1] { max[$1] = $2 } !($1 in min) || $2 < min[$1] { min[$1] = $2 } { sum[$1] += $2; count[$1]++} END { for (n in sum) printf("%s=%.1f/%.1f/%.1f, ", n, min[n], sum[n] / count[n], max[n])}'
Can't see how dgsh could be applied to it.
Dgsh – Directed Graph Shell - https://news.ycombinator.com/item?id=21700014 - Dec 2019 (11 comments)
Dgsh – Directed graph shell - https://news.ycombinator.com/item?id=13352659 - Jan 2017 (51 comments)