Welcome to Gas Town
Key topics
The provocative "Welcome to Gas Town" article sparked a lively debate about the future of AI-assisted development tools, with commenters weighing in on the merits and limitations of experimental projects like Gas Town. Some saw it as a fun, if flawed, exploration of new ideas, while others were skeptical of its potential, pointing out that it is dangerous, largely untested, and not production-ready. The discussion also veered into broader topics, such as the growing presence of AI in development workflows and the surprising number of projects that are contractually prohibited from using AI tools. As commenters like mccoyb and alexjurkiewicz noted, the line between playful experimentation and practical application is blurry, and the conversation highlighted the diverse perspectives on what is and isn't possible with AI-assisted development.
Snapshot generated from the HN discussion
Discussion Activity
- Activity: Very active discussion
- First comment: 48m after posting
- Peak period: 108 comments in 96-108h
- Avg / period: 13.3
- Based on 160 loaded comments
Key moments
- 01 Story posted: Jan 1, 2026 at 5:36 PM EST (10 days ago)
- 02 First comment: Jan 1, 2026 at 6:25 PM EST (48m after posting)
- 03 Peak activity: 108 comments in 96-108h (hottest window of the conversation)
- 04 Latest activity: Jan 9, 2026 at 6:53 PM EST (1d ago)
> Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Then:
> Working effectively in Gas Town involves committing to vibe coding. Work becomes fluid, an uncountable that you sling around freely, like slopping shiny fish into wooden barrels at the docks. Most work gets done; some work gets lost. Fish fall out of the barrel. Some escape back to sea, or get stepped on. More fish will come. The focus is throughput: creation and correction at the speed of thought.
I see -- so where exactly is my focus supposed to sit?
As someone who sits comfortably in the "Stage 8" category that this article defines, my concern has never been throughput. It has always been about retaining a high degree of quality while organizing work so that, when context switching occurs, it transitions me to near-orthogonal tasks which are easy to remember, so I can give high-quality feedback before switching again.
For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C.
On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
This works and actually gets shit done. I'm not convinced that 20 Claudes or massively parallel worktrees or whatever improves on quality, because, indeed, I always have to intervene at some point. The blocker for me is not throughput, it's me -- a human being -- my focus, and the random points of intervention which ... by definition ... occur stochastically (because agents).
Finally:
> Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it.
This is laughably not true, for anyone who has used Opus 4.5 for non-trivial tasks. Claude Code constantly gives up early, corrupts itself with self-bias, the list goes on and on. It's getting better, but it's not that good.
"What if Opus wrote the code, and GPT 5~ reviewed it?" I started evaluating this question, and started to get higher quality results and better control of complexity.
I could also trust this process to a greater degree than my previous process of trying to drive Opus, look at the code myself, try and drive Opus again, etc. Codex was catching bugs I would not catch with the same amount of time, including bugs in hard math, etc -- so I started having a great degree of trust in its reasoning capabilities.
I've codified this workflow into a plugin which I've started developing recently: https://github.com/evil-mind-evil-sword/idle
It's a Claude Code plugin -- it combines the "don't let Claude stop until condition" (Stop hook) with a few CLI tools to induce (what the article calls) review gates: Claude will work indefinitely until the reviewer is satisfied.
In this case, the reviewer is a fresh Opus subagent which can invoke and discuss with Codex and Gemini.
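(For illustration only: a minimal sketch of what such a Stop-hook review gate might look like, assuming Claude Code passes hook input as JSON on stdin and keeps the session working when the hook prints a "block" decision. The `run_reviewer` command is a hypothetical stand-in for invoking the external reviewer, not part of the idle plugin.)

```python
#!/usr/bin/env python3
# Sketch of a Stop-hook review gate (see assumptions in the note above).
import json
import subprocess
import sys

hook_input = json.load(sys.stdin)  # event payload from Claude Code (assumed JSON on stdin)

# Hypothetical reviewer command: exits non-zero while it still wants changes.
review = subprocess.run(
    ["run_reviewer", "--transcript", hook_input.get("transcript_path", "")]
)

if review.returncode != 0:
    # Block the stop so the agent keeps iterating until the reviewer is satisfied.
    print(json.dumps({
        "decision": "block",
        "reason": "Reviewer requested changes; keep iterating.",
    }))
else:
    sys.exit(0)  # nothing left to fix; allow the agent to stop
```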
One perspective I have which relates to this article is that the thing one wants to optimize for is minimizing the error per unit of work. If you have a dynamic programming style orchestration pattern for agents, you want the thing that solves the small unit of work (a task) to have as low error as possible, or else I suspect the error compounds quickly with these stochastic systems.
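(To make the compounding intuition concrete, a rough back-of-the-envelope calculation, assuming for illustration that each task fails independently with probability p:)

```python
# If each delegated task fails independently with probability p, a chain of
# n tasks completes with zero errors with probability (1 - p) ** n.
# Even modest per-task error rates collapse quickly as n grows.
def chain_success(p: float, n: int) -> float:
    return (1 - p) ** n

for p in (0.01, 0.05, 0.10):
    print(f"per-task error {p:.0%}: 20-task chain succeeds with prob {chain_success(p, 20):.2f}")
# 1%  -> ~0.82
# 5%  -> ~0.36
# 10% -> ~0.12
```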
I'm trying this stuff for fairly advanced work (in a PhD), so I'm dogfooding ideas (like the ones presented in this article) in complex settings. I think there is still a lot of room to learn here.
It's cool to see others thinking the same thing!
this is the equivalent of some crazy inventor in the 19th century strapping a steam engine onto a unicycle and telling you that some day you'll be able to go 100mph on a bike. He was right in the end, but no one is actually going to build something usable with current technology.
Opus 4.5 isn't there. But will there be a model in 3-5 years that's smart enough, fast enough, and cheap enough for a refined vision of this to be possible? I'm going to bet on yes to that question.
https://www.wired.com/story/london-bitcoin-pub/
The POS software's on GitHub: https://github.com/sde1000/quicktill
> something like gas town is clearly not attempting to be a production grade tool.
Compare to the first two sentences:
> Gas Town is a new take on the IDE for 2026. Gas Town helps you with the tedium of running lots of Claude Code instances. Stuff gets lost, it’s hard to track who’s doing what, etc. Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Compared to your read, my read is confused: is it or is it not intending to be a useful tool (we can debate "production" quality, here I'm just thinking something I'd actually use meaningfully -- like Claude Code)?
I think the author wants us to take this post seriously, so I'm taking it seriously, and my critique in the original post was a serious reaction.
This tool is dangerous, largely untested, and yet may be of interest if you are already doing similar things in production.
Gas Town is clearly the same thing multiplied by ten thousand. The number of overlapping and adhoc concepts in this design is overwhelming. Steve is ahead of his time but we aren't going to end up using this stuff. Instead a few of the core insights will get incorporated into other agents in a simpler but no less effective way.
And anyway the big problem is accountability. The reason everyone makes a face when Steve preaches agent orchestration is that he must be in an unusual social situation. Gas Town sounds fun if you are accountable to nobody: not for code quality, design coherence or inferencing costs. The rest of us are accountable for at least the first two and even in corporate scenarios where there is a blank check for tokens, that can't last. So the bottleneck is going to be how fast humans can review code and agree to take responsibility for it. Meaning, if it's crap code with embarrassing bugs then that goes on your EOY perf review. Lots of parallel agents can't solve that fundamental bottleneck.
Yeah this describes my feeling on beads too. I actually really like the idea - a lightweight task/issue tracker integrated with a coding agent does seem more useful than a pile of markdown todos/plans/etc. But it just doesn't work that well. It's really buggy, and the bugs seem to confuse the agent, since it was given instructions to do things a certain way that don't work consistently.
And also auditable, trackable, reportable, etc..
I was sort of kidding with "JIRA for Agents", obviously using the API and existing tool you can make agents use it.
We use Github at my current job and similarly have Claude Code update issues and PRs when it does work.
I'm looking for "the Emacs" of whatever this is, and I haven't read a blog post which isolates the design yet.
Or did you find one that's good?
But yeah, I'm only running one code agent at a time, so that's not a problem I have. I should probably start with just a todo list as plain text.
It unlocks a (still) hidden multiagent orchestration function in Claude code. The person making it unminified the code and figured out how to unlock it.
I find it quite well done - I started an orchestrator project a few days ago and scrapped it because it'll be fully integrated soon, it seems.
Despite its quirks, I think beads is going to go down as one of the first pieces of software that got some adoption where the end user is an agent.
[1]: https://github.com/nikvdp/linear-beads
What do you like about Linear? Is it suitable for hobby projects?
Linear is great, it's what JIRA should've been. Basically task management for people who don't want to deal with task management. It's also full featured, fast (they were famously one of the earlier apps to use a local-first sync-engine style architecture), and keyboard-centric.
Definitely suitable for hobby projects, but can also scale to large teams and massive codebases.
There's a lot of strange things going on in that project.
try to add some common sense, and you'll get shouted out.
which is fine, I'll just make my own version without the slop.
> Course, I’ve never looked at Beads either, and it’s 225k lines of Go code that tens of thousands of people are using every day. I just created it in October. If that makes you uncomfortable, get out now.
It's 2025, accountability is a thing of the past. The future belongs to the unaccountable and their AI swarm.
Facebook burned something like $70bn on "metaverse" with seemingly zero results. There's a lot more capital (and biosphere) to burn on AI agents.
Show HN: I replaced Beads with a faster, simpler Markdown-based task tracker - https://news.ycombinator.com/item?id=46487580 - Jan 2026 (2 comments) (<-- I've put this one in the SCP - see https://news.ycombinator.com/item?id=26998308 for explanation)
Solving Agent Context Loss: A Beads and Claude Code Workflow for Large Features - https://news.ycombinator.com/item?id=46471286 - Jan 2026 (1 comment)
Beads – A memory upgrade for your coding agent - https://news.ycombinator.com/item?id=46075616 - Nov 2025 (68 comments)
Beads: A coding agent memory system - https://news.ycombinator.com/item?id=45566864 - Oct 2025 (1 comment)
This explains why some of the comments have timestamps that appear older than the post itself. I got tired of trying to make them line up, sorry!)
IMHO, it's less disorienting to have the post dated after the comments than it is to see a comment you thought you wrote a couple days ago but is dated today. So you're welcome to stop trying to line up timestamps.
Status quo sucks also, it just sucks less. Haven't yet figured out an actually good solution. Sorry!
The most I imagine most folks saying is "Didn't I see this post on the front page days ago?". For many other discussion fora, it's not uncommon for posts to be at the top of the pile for many days... so a days-old post date should be nothing unusual.
Re artificial uplifting a.k.a. re-upping, see https://news.ycombinator.com/item?id=26998308 and https://news.ycombinator.com/pool
> WARNING DANGER CAUTION GET THE F** OUT YOU WILL DIE
I have never met Steve, but this warning alone is :chefskiss:
Gas Town is from the creator of beads.
Outside of that it's trial and error, but I've learned you don't need to kick off a new chat instance very much, if at all. I also like Beads because if I have to "run" or go offline, I can tell it to pause and log where it left off / where it's at.
For some projects I tell claude not to close tickets without my direct approval because sometimes it closes them without testing, my baseline across all projects is that it compiles and runs without major errors.
But to keep things tractable, i've kept the orchestration within a collection of subagents in a single Claude code session. The orchestration system is called Pied-Piper and you can find the code here - https://github.com/sathish316/pied-piper
It is only 1.6k lines of Go code.
Think of it as an extended bipolar-optimism-fueled glimpse into the future. Steve's MO is laid out in the Medium post - but basically, it's okay to lose things, rewrite whole subsystems, whatever; this is the future. It's really fun and interesting to watch the speed of development.
I've made a few multi agent coding setups in the last year, and I think gas town has the team side about right: big boss (mayor), operations boss (deacon), relatively linear keeper of truth (witness), single point for merges (refiner), lots of coders with their code held lightly.
I love the idea of formulas - a lot of what makes gas town work and informs how well it ultimately will work is the formulas. They're close conceptually to skills.
I don't love the mad max branding, but meh, whatever, it's fun, and a perk of the brave new world where you can make stuff like this for a few hundred bucks a month sent to Anthropic - software can have personality again, yay.
Conceptually I think there is a product team element to this still missing - deploy engineers, product managers, visual testing. Everything is sort of out there, janky in parts, but workable to glue together right now, and will only improve. That said, the mad max town analogy is going to get overstretched at some point; we already have pretty good names for all the parts that are needed, and as coordination improves, we're going to want to add more stuff into the coordination. So, I'd like to see a version of this with normal names and expanded.
Upshot - worth a look - if beads is any indication, give it a month or two or four to settle down unless you like living on the bleeding bleeding edge.
I pointed it at a Postgres time series project I was working on, and it deployed a much better UI and (with some nudging) fixed docker errors on a remote server, which involved logging in to the server to check logs. It probably opened and fixed 50 or so beads in total.
I'd reach for it first to do something complicated ("convoy" or epic) over Claude Code even as is -- like, e.g. "copy this data ingestion we do for site x, and implement it for sites y,z,,a,b,c,d. start with a formal architecture that respects our current one and remains extensible for all these sites" is something I think it would do a fair job at.
As to cost - I did not run out of my claude pro max subscription poking around with it. It infers ... a lot ... though. I pulled together a PR that would let you point some or all of the agent types at local or other endpoints, but it's a little early, I think for the codebase. I'd definitely reach for some cheaper and/or faster inference for some of the use cases.
The article was pretty OK. Kubernetes has its own share of obnoxious terminology that often comes up as "we name it different so that it doesn't sound like AWS". At some point you just accept the terminology in relation to the tool you use and move on.
Assuming this isn't a parody project, maybe this just isn't for me, and that's fine. I'm struggling to understand a production use case where I'd be comfortable letting this thing loose.
Who is the intended audience for this design?
I promptly gave Claude the text to the articles and had him rewrite using idiomatic distributed systems naming.
Fun times!
Update: I was hoping it'd at least be smart enough to automatically test that the project still builds, but it did not. It also didn't commit its work.
P.s. the choice of nomenclature is a bit odd, and makes it hard to follow what is what. Movie characters, dogs and raccoons, huh? How about striving for descriptive SWE clarity?
that's what got us CQRS, "command query responsibility segregation", which is technically correct wording but absolutely fucking meaningless to anyone that doesn't know what it means already.
It should have been called "read here, write there" but noooooooOOOOOooooo we need descriptive SWE clarity so only people with CS degrees that know all the acronyms already can understand wtf is being said.
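(For what it's worth, the "read here, write there" idea fits in a few lines; this toy Python sketch is purely illustrative and not taken from any CQRS framework.)

```python
# "Read here, write there" in miniature: commands append to a write store and
# update a projection; queries only ever read the projection.
write_store: list[dict] = []      # command/event log (the "write there")
read_model: dict[str, int] = {}   # denormalized view (the "read here")

def handle_command(event: dict) -> None:
    """Write path: record the event, then project it into the read model."""
    write_store.append(event)
    account = event["account"]
    read_model[account] = read_model.get(account, 0) + event["amount"]

def handle_query(account: str) -> int:
    """Read path: never touches the write store."""
    return read_model.get(account, 0)

handle_command({"account": "alice", "amount": 50})
print(handle_query("alice"))  # 50
```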
> Gas Town is also expensive as hell. You won't like Gas Town if you ever have to think, even for a moment, about where money comes from. I had to get my second Claude Code account, finally; they don't let you siphon unlimited dollars from a single account, so you need multiple emails and siphons, it's all very silly. My calculations show that now that Gas Town has finally achieved liftoff, I will need a third Claude Code account by the end of next week. It is a cash guzzler.
Since I am quite capable of shitting my own code up for free, and I've got zero interest in this stupid AI shit anyway, I'm vanishingly unlikely to actually use this. But, still: I like to keep half an eye on what is going on, even if I hate it. And I am more than somewhat intrigued about what the numbers actually look like.
We're trying to orchestrate a horde of agents. The workers (polecats?) are the main problem solvers. Now you need a top level agent (mayor) to break down the problem and delegate work, and then a merger to resolve conflicts in the resulting code (refinery). Sometimes agents get stuck and need encouragement.
The molecules stuff confused me, but I think they're just "policy docs," checklists to do common tasks.
But this is baby stuff. Only one level of hierarchy? Show me a design for your VP agent and I'll be impressed for real.
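(A toy mapping of the roles described above onto plain names, purely illustrative and not how Gas Town actually implements them:)

```python
# Illustrative role mapping only -- not Gas Town's actual code.
from dataclasses import dataclass, field

@dataclass
class Worker:                      # "polecat": solves one delegated task
    name: str
    def solve(self, task: str) -> str:
        return f"{self.name}: patch for '{task}'"

@dataclass
class Merger:                      # "refinery": single point where patches meet
    def merge(self, patches: list[str]) -> str:
        return " + ".join(patches)

@dataclass
class Orchestrator:                # "mayor": breaks the problem down, delegates
    workers: list[Worker] = field(default_factory=list)
    merger: Merger = field(default_factory=Merger)

    def run(self, problem: str) -> str:
        tasks = [f"{problem} (part {i + 1})" for i in range(len(self.workers))]
        patches = [w.solve(t) for w, t in zip(self.workers, tasks)]
        return self.merger.merge(patches)

print(Orchestrator(workers=[Worker("w1"), Worker("w2")]).run("add ingestion for site y"))
```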
Has to be close for the shortest time from first commit to HN front page.
...no, I haven't lost the plot. I'm seeing another fad of the intoxicated parting with their money bending a useful tool into a golden hammer of a caricature. I dread seeing the eventual wreckage and self-realization from the inevitable hangover.
I've never understood this argument. Do you ever work with other humans? They are very much not deterministic, yet they can often produce useful code that helps you achieve more than you could by yourself.
I'll add a personal anecdote - 2 years ago, I wrote a SwiftUI app by myself (mind you, I'm mostly an infrastructure/backend guy with some expertise in front end, where I get the general stuff but never really made anything big out of it other than stuff on LAMPP back in the 2000s) and it took me a few weeks to get it to do what I wanted, with the bare minimum of features. As I was playtesting my app, I kept writing a wishlist of features for myself, and later when I put it on the App Store, people around the world would email me asking for other features. But life, work, etc. would get in the way, and I would have no time to actually do them, as some of the features would take me days or weeks.
Fast forward to 2 weeks ago: at this point I'm very familiar with Claude Code, how to steer multiple agents at a time, quickly review their outputs, stitch things together in my head, and ask for the right things. I've completed almost all of the features, rewrote the app, and it's already been submitted to the App Store. The code isn't perfect, but it's also not that bad. Honestly, it's probably better than what I would've written myself. It's an app that can be memory intensive in some parts, and it's been doing well in my testing. On top of that, since I've been steering 2-3 agents actively myself, I have the entire codebase in my mind. I also have an overwhelming amount of notes on what I would do better, etc.
My point is, if you have enough expertise and experience, you'll be able to "stitch things together" cleaner than others with no expertise. This also means user acquisition, marketing and data will be more valuable than the product itself, since it'll be easier to develop competing products. Finding users for your product will be the hard part. Which kinda sucks, if I'm honest, but it is what it is.
I've had the same experience as you. I've applied it to old projects which I have some frame of reference for and it's like a 200x speed boost. Just absolutely insane - that sort of speed can overcome a lot of other shortcomings.
I'm a full stack dev, and solo, so I write data schema, backends and frontends at the same time, usually flipping between them to test parts of new features. As far as AI use, I'm really just at the level of using a single Claude agent in an IDE - and only occasionally, because it writes a lot of nonsense. So maybe I'm missing out on the benefits of multiple agents. But where I currently see value in it is in writing (a) boilerplate and (b) sugar - where it has full access to a large and stable codebase. Where I think it fails is in writing overarching logical structures, especially early on in a project. It isn't good at writing elegant code with a clear view of how data, back and front should work together. When I've tried to start projects from scratch with Claude, it feels like I'm fighting against its micro-view of each piece of code, where it's unable to gain a macro-view of how to orchestrate the whole system.
So like, maybe a bottomless wallet and a dozen agents would help with that, but there isn't so much room for errors or bugs in my work code as there is in my fun/play/casual game code. As a result I'm not really seeing that much value in it for paid work.
If your end goal is to produce some usable product, then the implementation details matter less. Does it work? Yes? OK, then maybe don't wrestle with the agent over specific libraries or coding patterns.
I don’t see how we get there, though, at least in the short term. We’re still living in the heavily-corporate-subsidized AI world with usage-based pricing shenanigans abound. Even if frontier models providers find a path to profitability (which is a big “if”), there’s no way the price is gonna go anywhere but up. It’s moviepass on steroids.
Consumer hardware capable of running open models that compete with frontier models is still a long ways away.
Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Billions are being invested with the expectation that it will fetch much more revenue than it’s generating today.
There's little evidence this is true. Even OpenAI who is spending more than anyone is only losing money because of the free version of ChatGPT. Anthropic says they will be profitable next year.
> Plus, and maybe it’s just my personal cynicism showing, but when did tech ever reduce pricing while maintaining quality on a provided service in the long run? In an industry laser focused on profit, I just don’t see how something so many believe to be a revolutionary force in the market will be given away for less than it is today.
Really?
I mean, I guess I'm showing my age, but the idea that I can get a VM for a couple of dollars a month and expect it to be reliable makes me love the world I live in. But I guess when I started working there was no cloud, and getting root on a server meant investing thousands of dollars.
According to Ed Zitron, Anthropic spent more than its total revenue in the first 9 months of 2025 on AWS alone: $2.66 billion on AWS compute against an estimated $2.55 billion in revenue. That's just AWS, not payroll, not other software or hardware spend. He's regularly reporting concrete numbers that look horrible for the industry, while hyperscalers and foundation model companies continue to make general statements and refuse to get specific or release real revenue figures. If you only listen to what the CEOs are saying, then sure, it sounds great.
Anthropic also said that AI would be writing 95% of code in 3 months or something, however many months ago that was.
Yes, but it's unclear how much of that is training costs vs operational costs. They are very different things.
But how many of those providers are too subsidizing their offering through investment capital? I don't know offhand of anyone in this space that is running at or close to breakeven.
It feels very much like the early days of streaming when you could watch everything with a single Netflix account. Those days are long gone and never coming back.
We're also seeing significant price reductions every year for LLMs. Not for frontier models, but you can get the equivalent of last year's model for cheaper. Hard to tell from the outside, but I don't think it's all subsidized?
I think maybe people over-updated on Bitcoin mining. Most tech is not inherently expensive.
That's an old world that we experienced in the 2000s, and maybe the early 2010s, where we cared about the quality of a provided service in the long run. For general web-app stuff, that's long gone, as (mostly) everyone has a very short attention span, and what matters is "can the thing I desire be done right now". In the long run? Who cares. I keep seeing this in everyday life, at work, in discussions with my previous clients, etc.
Once again, I wish it weren't true, but nothing suggests that it isn't.
Or, if it does _now_, how long will it be before it works well using downloadable models that'll run on, say, a new car's worth of Mac Studios with a bunch of RAM in them, allowing a small fleet of 70B and 120B models (or larger) to run locally? Perhaps even specialised models for each of the roles this uses?
If training of new models ceased, and hardware was just dedicated to inference, what would that do to prices and speed? It's not clear to me how much inference is actually being subsidized over the actual cost to run the hardware to do it. If there's good data on that I'd love to learn more though.
Since we have version control, you can restart anywhere if you think it's a good place to fork from. I like greenfield development, but I suspect that there are going to be a lot more forks from now on, much like the game modding scene.
Companies with money-making businesses are gonna find themselves in an interesting spot when the "vibe juniors" are the vast majority of the people they can find to hire. New ways will be needed to reduce the risk.
...go to jail?
I have enjoyed Steve's rants since "Execution in the Kingdom of Nouns" and the Google "Platform rant", but he may need someone to talk to him about bamboo and what a terrible life choice it is. Unless you can keep it the hell away from you and your neighbours it is bad, very bad. I'm talking about clumping varieties, the runners are a whole other level.
There is a repo and I am not sure; the only way to resolve it probably is to spend some of that money he’s talking about.
In the past, a large codebase indicated that maybe you might take the project seriously, as some human effort was expended in its creation. There were still some outliers like Urbit and its 144 KLOC of Hoon code, perverse loobeans and all.
Now if I get so much as a whiff of AI scent on a project, I lose all interest. It indicates that the author didn't invest even a modicum of their own time in the project, so why should I waste mine on it?
(I use LLM-based coding tools in some of my projects, but I have the self-respect to review the generated code before publishing it.)
Of course as a developer you still have to take responsibility for your code, minimally including a disclaimer, and not dumping this code in to someone else’s code base. For example at work when submitting MRs I do generally read the code and keep MRs concise.
I’ve found that there is a certain kind of coder that hears of someone not reading the code and this sounds like some kind of moral violation to them. It’s not. It’s some weird new kind of coding where I’m more creating a detailed description of the functionality I want and incrementally refining it and iterating on it by describing in text how I want it to change. For example I use it to write GUI programs for Ubuntu using GTK and python. I’m not familiar with python-gtk library syntax or GTK GUI methods so there’s not really much of a point in reading the code - I ask the machine to write that precisely because I’m unfamiliar with it. When I need to verify things I have to come up with ways for the machine to test the code on its own.
Point is I think it’s honestly one new legitimate way of using these tools, with a lot of caveats around how such generated code can be responsibly used. If someone vibe coded something and didn’t read it and I’m worried it contains something dangerous, I can ask Claude to analyze it and then run it in a docker container. I treat the code the same way the author does - as a slightly unknown pile of functions which seem to perform a function but may need further verification.
I’m not sure what this means for the software world. On the face of it it seems like it’s probably some kind of problem, but I think at the same time we will find durable use cases for this new mode of interacting with code. Much the same as when compilers abstracted away the assembly code.
This is not exactly that, but it is one step up. Having agents output code that then gets compiled/interpreted/whatever, based upon contextual instruction, feels very, very familiar to engineers who have ever worked close to the metal.
"Old fashioned", in this aspect, would be putting guardrails in place so that you knew that what the agent/compiler was creating was what you wanted. Many years ago, that was binaries or bytecode packaged with lots of symbols for debugging. Today, that's more automated testing.
I started "fully vibecoding" 6 months ago, on a side-project, just to see if it was possible.
It was painful. The models kept breaking existing functionality, overcomplicating things, and generally just making spaghetti ("You're absolutely right! There are 4 helpers across 3 files that have overlapping logic").
A combination of adjusting my process (read: context management) and the models getting better, has led me to prefer "fully vibecoding" for all new side-projects.
Note: I still read the code that gets merged for my "real" work, but it's no longer difficult for me to imagine a future where that's not the case.
[0] https://github.com/kucherenko/jscpd
https://github.com/shepherdjerred/scout-for-lol/blob/main/es...
2 years sounds more likely than 2 months since the established norms and practices need to mature a lot more than this to be worthy of the serious consideration of the considerably serious.
On my personal project I do sometimes chat with ChatGPT and it works as a rubber duck. I explain, put my thoughts into words and typically I already solve it when I'm thinking it through while expressing it in words. But I must also admit that ChatGPT is very good at producing prose and I often use it for recommending names of modules, functions, enums etc. So there's some value there.
But when it comes to code, I want to understand everything that goes into my project. So at the end of the day I'm always going to be the "bottleneck", whether I think through the problem myself and write the code, or I review and try to understand the AI-generated code slop.
It seems to me that the AI slop generation workflow is a great fit for the industry though: more quantity rather than quality, and continuous churn. Make it cheaper to replace code so that the replacement can be replaced a week later with more vibe-coded slop. Quality might drop, bugs might proliferate, but who cares?
LLMs are far from being as trustworthy as compilers.
Now I've got tools and functionality that I would have paid for before as separate apps that are running "for free" locally.
I can't help but think this is the way forward and we'll just have to deal with the landmine as/when it comes, or hope that the tooling gets drastically better so the landmine isn't as powerful as we fear.
Most likely, tens of other bugs are being introduced at each step, etc etc, right?
I don't know the details, but I was wondering why people aren't "just" writing chat venues and comms protocols for the chats. So the fundamental unit is a chat that humans and agents can be members of.
You can also have DMs etc to avoid chattiness.
But fundamentally, if you start with this kind of madness you don't have a strict hierarchy, and it might also be fun to see how it goes.
I briefly started building this but just spun out and am stuck using PAL MCP for now and some dumb scripts. Not super content with any of it yet.
He as a dev should know that adding a layer of names on top of already named entities is not a good practice. But he just had fun and this came up. Which is fantastic. But I don't want to have to translate names in my head all the time.
Just not useful. Beads also... really sorry to say this, but it is a task runner with labels that has zero awareness of the actual tasks.
I don't know, maybe I am wrong, but this just doesn't seem like a thing that will work. Which is why I think it will be popular, nobody will be able to make it work, but they will not want to look dumb and will say it is awesome and amazing. Like another AI thingy I could name but will not that everyone is using.
But I love Yegge and hope he does well. Amp, for the little bit that I used it, is a really solid agent and delivered much better results than many others.
5 more comments available on Hacker News