Building a CI/CD Pipeline Runner from Scratch in Python
Mood: thoughtful
Sentiment: positive
Category: tech
Key topics: CI/CD, Python, DevOps, Software Development
The author shares their experience building a CI/CD pipeline runner from scratch in Python, providing insights into the process and implementation details.
Snapshot generated from the HN discussion.

Discussion Activity
Active discussion. First comment: 3d after posting. Peak period: 17 comments (Day 4). Avg per period: 5.8. Based on 23 loaded comments.
Key moments
- Story posted: 11/9/2025, 7:26:29 PM (9d ago)
- First comment: 11/12/2025, 5:52:46 PM (3d after posting)
- Peak activity: 17 comments in Day 4 (the hottest window of the conversation)
- Latest activity: 11/14/2025, 9:55:24 PM (4d ago)
I have another custom flow implementation that I find more ergonomic: https://hofstadter.io/getting-started/task-engine/
Argo Workflows does not live up to what they advertise; it is much more complex to set up and then build workflows for. Helm + Argo is pain (both use the same template delimiters...)
1. There is no central database to coordinate things. Rather, it tries to manage serialization of important bits to/from XML for a lot of things, across a lot of concurrent processes. If you ever think you can manage concurrency better than MySQL/Postgres, you should examine your assumptions.
2. In part because of the dance-of-the-XMLs, when a lot of things are running at the same time Jenkins slows to a crawl, so you are limited in the number of worker nodes. At my last company that used Jenkins, they instituted rules to keep below 100 worker nodes (and usually fewer than that) per Jenkins. This led to fleets of Jenkins servers (and even a Jenkins server to build Jenkins servers as a service), and lots of wasted time for worker nodes.
3. "Everything is a plugin" sounds great, but you wind up with lots of plugins that don't necessarily work with each other, often in subtle ways. In the community this led to blessed sets of plugins that most people used, and then you gambled with a few others you felt you needed. Part of this problem is the choice of XMLs-as-database, but it goes farther than that.
4. The way the server/client protocol works is to ship serialized Java processes to the client, which runs them and reserializes the process to ship back at the end, rather than using something like RPC. This winds up being very fragile (e.g., communication breaks were a constant problem), makes troubleshooting a pain, and prevents you from doing things like restarting the node in the middle of a job (so you usually have Jenkins work on a launchpad, and have a separate device-under-test).
Some of these could be worked on, but there seemed to be no desire in the community to make the large changes that would be required. In fact there seemed to be pride in all of these decisions, as if they were bold ideas that somehow made things better.
If you are talking about Jenkins-X, that is a different story: it's basically a rewrite for Kubernetes. I haven't talked to anyone actually using it; if you go k8s, you are far more likely to go Argo.
IMO, CI should be running the same commands humans would run (or could, in the case of production settings). Thus our Jenkins pipelines became a bunch of DSL boilerplate wrapped around make commands. The other nice thing about this is that it prepares you for easier migrations to a new CI system.
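The "thin wrapper around make" idea above can be sketched in a few lines of Python: the pipeline is just an ordered list of the same commands a human would type, and the runner stops at the first failure. The make targets here are illustrative, not from the article.

```python
import subprocess
import sys

# The pipeline is nothing more than the commands a developer runs locally;
# migrating to a new CI system only means re-wrapping this list.
PIPELINE = [
    ["make", "lint"],
    ["make", "test"],
    ["make", "build"],
]

def run_pipeline(stages) -> bool:
    """Run each stage in order; stop at the first failure (like `set -e`)."""
    for cmd in stages:
        print(f"--> {' '.join(cmd)}", file=sys.stderr)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return False  # stage failed; the caller decides what to do
    return True
```

The CI config then shrinks to a single call like `run_pipeline(PIPELINE)`, with all the real logic living in the Makefile where humans can run it too.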
It's one of the few CI tools where you can test your pipeline without committing it. You also have controls such as only pulling the pipeline from trunk, again, something that wasn't always available elsewhere.
However, it can also be a complete footgun if you're not fairly savvy. Pipeline security isn't something every developer groks.
> Build a dependency graph (which jobs need which other jobs)
> Execute jobs in topological order (respecting dependencies)
For what it’s worth, Python has graphlib.TopologicalSorter in the standard library that can do this, including grouping tasks that can be run in parallel:
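A short sketch of that stdlib suggestion (Python 3.9+): `graphlib.TopologicalSorter` takes a mapping of job -> dependencies and, via `get_ready()` / `done()`, yields "waves" of jobs whose dependencies are all satisfied, so each wave could be dispatched concurrently. The job names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on; "build" is added
# implicitly because it appears only as a dependency.
graph = {
    "test": {"build"},
    "lint": {"build"},
    "deploy": {"test", "lint"},
}

ts = TopologicalSorter(graph)
ts.prepare()  # also raises CycleError if the graph has a cycle

waves = []
while ts.is_active():
    ready = list(ts.get_ready())  # every job whose dependencies are done
    waves.append(sorted(ready))   # each wave could run in parallel
    ts.done(*ready)               # mark the wave finished, unlocking the next

print(waves)  # [['build'], ['lint', 'test'], ['deploy']]
```

`prepare()` also gives you cycle detection for free, which a hand-rolled dependency resolver would otherwise need to implement.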
Care to elaborate? If you already deploy in Docker, then wouldn't this be nice?