We Put a Coding Agent in a While Loop
Key topics
Regulars are buzzing about a coding agent trapped in a while loop, sparking both fascination and unease as commenters riff on the project's implications. Some, like ghuntley, are shaken by the agent's capabilities, dubbed "Ralph," while others joke about its quirks, like terminating its own process. The discussion reveals a mix of awe and trepidation as participants ponder the potential of such agents and the simplicity of the code that powers them. As the conversation unfolds, it becomes clear that this experiment is tapping into deeper questions about AI's potential and our readiness for it.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 7m after posting
- Peak period: 64 comments in 12-24h
- Avg / period: 17.8
- Based on 160 loaded comments
Key moments
1. Story posted: Aug 24, 2025 at 12:18 PM EDT (5 months ago)
2. First comment: Aug 24, 2025 at 12:25 PM EDT (7m after posting)
3. Peak activity: 64 comments in 12-24h, the hottest window of the conversation
4. Latest activity: Aug 29, 2025 at 7:29 PM EDT (4 months ago)
The language is called Cursed.
We were curious to see if we can do away with IMPLEMENTATION_PLAN.md for this kind of task
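The technique under discussion is, at its core, just an agent call inside a loop that keeps running until the work checks out. A minimal sketch, with the agent invocation stubbed out (a real setup would pipe a prompt file into a coding-agent CLI; all names here are illustrative):

```python
# Minimal sketch of the "coding agent in a while loop" idea.
# run_agent is a stub; in a real setup it would shell out to an
# agent CLI and the exit condition would be something external,
# like the test suite going green.

def run_agent(prompt: str, state: dict) -> str:
    """Stub for one agent invocation; reports 'done' once tests pass."""
    state["iterations"] += 1
    return "done" if state["iterations"] >= 3 else "keep going"

def loop(prompt: str, max_iters: int = 100) -> int:
    state = {"iterations": 0}
    while state["iterations"] < max_iters:      # the while loop itself
        if run_agent(prompt, state) == "done":  # e.g. tests are green
            break
    return state["iterations"]

print(loop("Implement the spec in PROMPT.md"))  # → 3
```

The whole trick is that the loop, not the model, supplies the persistence; each iteration starts fresh from the same prompt.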
If we actually want stuff that works, we need to come up with a new process. If we get "almost" good code from a single invocation, you're just going to get a lot of almost-good code from a loop. What we likely need is a Cucumber-esque format with example tables for requirements that we can train an AI to use. It will build the tests and then build the code to pass the tests.
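The "example tables for requirements" idea can be sketched as plain assertions: the table rows are the requirement, and the agent's only job is to make them pass. The function and table below are entirely hypothetical:

```python
# Sketch: a Cucumber-style example table distilled into plain assertions.
# Both the table and the function under test are made-up illustrations.

def shipping_cost(weight_kg: float) -> float:
    """Toy implementation the agent would be asked to write."""
    return 5.0 if weight_kg <= 1.0 else 5.0 + 2.0 * (weight_kg - 1.0)

# | weight_kg | expected_cost |   <- requirements as an example table
EXAMPLES = [
    (0.5, 5.0),
    (1.0, 5.0),
    (3.0, 9.0),
]

# The loop-driven agent iterates until every row passes.
for weight, expected in EXAMPLES:
    assert shipping_cost(weight) == expected, (weight, expected)
print("all examples pass")
```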
Like back in the day, being brought in to “just fix” an amalgam of a FoxPro-, Excel-, and Access-based ERP that “mostly works” and only “occasionally corrupts all our data,” which ambitious salespeople put together over the last 5 years.
But worse: because “ambitious salespeople” will no longer be constrained by the sandboxes of Excel or Access, they will ship multi-cloud, edge-deployed Kubernetes microservices wired with Kafka, and it will be harder to find someone to talk to in order to understand what they were trying to do at the time.
I don’t recall the last time Claude suggested anything about version control :-)
And how many know they need to ask for version control?
Does it? Did you forget the prompts? MCP is just a protocol for tool/function calling, which in turn is part of the prompt, and quite an important part, actually.
Did you think AI works by prompts like "make magic happen" and it... just happens? Anyone who makes dumb arguments like this doesn't deserve a job in tech.
And don't even get me started on giving the AI your entire system in one tool; that's only good for toying around.
That's only really relevant if you're leaving it unattended, though.
Plus, I'm not convinced that generating "kubectl"...json..."get"...json..."pod" is easier for most models than "bash"...json..."kubectl get pod"...
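For comparison, here is roughly what the two shapes of tool call look like as JSON payloads. Both schemas below are made up for illustration; real MCP and bash-tool schemas differ:

```python
import json

# Illustrative tool-call payloads (schemas are invented for comparison):
# a dedicated kubectl tool forces the model to emit structured arguments,
# while a generic bash tool just wraps the familiar command string.
kubectl_tool_call = {
    "tool": "kubectl",
    "arguments": {"verb": "get", "resource": "pod", "namespace": "default"},
}
bash_tool_call = {
    "tool": "bash",
    "arguments": {"command": "kubectl get pod -n default"},
}

# Both round-trip as JSON; the open question is which one a model emits
# more reliably, given how much raw shell usage is in its training data.
for call in (kubectl_tool_call, bash_tool_call):
    print(json.dumps(call))
```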
Not really the same since Claude didn’t deploy anything — but I WAS surprised at how well it tracked down the ingress issue to a cron job accidentally labeled as a web pod (and attempting to service http requests).
It actually prompted me to patch the cron itself but I don’t think I’m that bullish yet to let CC patch my cluster.
I have seen one Kafka install that was really the best tool for the job.
More than a handful of them could have been replaced by Redis, and in the worst cases could have been a table in Postgres.
If Claude thinks it's fine, remember it's only a reflection of the dumb shit it finds in its training data.
Regardless this just made me shudder thinking about the weird little ocean of (now maybe dwindling) random underpaid contract jobs for a few hours a month maintaining ancient Wordpress sites...
Surely that can't be our fate...
Not at that speed. Scale remains to be seen, so far I'm aware only of hobby-project wreck anecdotes.
New? New!?
This is my job now!
I call it software archeology — digging through Windows Server 2012 R2 IIS configuration files with a “last modified date” about a decade ago serving money-handling web apps to the public.
It’s “fun” in the sense of piecing together history from subtle clues such as file owners, files on desktops of other admins’ profiles, etc…
I feel like this is what it must be like to open a pharaoh’s tomb. You get to step into someone else’s life from long ago, walk in their shoes for a bit, see the world through their eyes.
“What horrors did you witness, brother sysadmin, that made you abandon this place with an uneaten takeaway lunch still on your desk, next to the desiccated powder that was once a half-drunk Red Bull?”
This will be the big counter to AI-generated tools; at some point they become black boxes, and the only thing people can do is try to fix them or replace them altogether.
Of course, in theory, AI tooling will only improve; today's vibe-coded software, which in some cases generates revenue, can be fed into the models of the future and improved upon. In theory.
Personally, I hate it; I don't like magic or black boxes.
Problem is, in everyone's experience, this almost never happens. The prototype is declared "good enough, just needs a few small adjustments"; the rewrite is declared too expensive, too time-consuming. And crap goes to production.
AI is emerging as a possible solution to this decades old problem.
It's better than houses, IMO - no one moves into the bedroom once it's finished while waiting for the kitchen.
We were deploying new changes every 2 weeks and it was too fast. End users need training and communication, pushback was quite a thing.
We also just pushed back the aggressive timeline we had for migration to the new tech. A much faster interface with shorter paths, but users went all pitchforks-and-torches just because it was new.
But with AI fortunately we will get rid of those pesky users right?
Well, maybe they were happy, but the software needed to be updated to the new business processes their company was rolling out.
Managers wanted the changes ASAP; their employees, not so much, but they had to learn that the hard way.
The not-so-fun part was that we got the blame. Just like I got a downvote :), not my first rodeo.
How much of a problem is it, really?
I mean, what are the alternatives?
How much of a problem it is can be seen in the tons of products that are crap on release and only slowly get patched to a half-working state when the complaints start pouring in. But of course, this is the status quo in software, so the perception of this as a problem among software people isn't universal, I guess.
How about the tons of products we don't even see? Those that tried to do it right on the first try, then never delivered anything because they were too slow and expensive. Or those that delivered something useless because they did not understand the users' needs.
If "complaints start pouring in", that means the product is used. This in turn can mean two things: 1/ the product is actually useful despite its flaws, or 2/ the users have no choice, which is sad.
I would welcome seeing fewer new crappy products.
That dynamic leads to a spiral of ever crappier software: You need to be first, and quicker than your competitors. If you are first, you do have a huge advantage, because there are no other products and there is no alternative to your crapware. Coming out with a superior product second or third sometimes works, but very often doesn't, you'll be an also-ran with 0.5% market share, if you survive at all. So everyone always tries to be as crappy and as quick as possible, quality be damned. You can always fix it later, or so they say.
But this view excludes the users and the general public: Crapware is usually full of security problems, data leaks, harmful bugs that endanger peoples' data, safety, security and livelihood. Even if the product is actually useful, at first, in the long term the harm might outweigh the good. And overall, by the aforementioned spiral, every product that wins this way damages all other software products by being a bad example.
Therefore I think that software quality needs some standards that programmers should uphold, that legislators should regulate and that auditors should thoroughly check. Of course that isn't a simple proposition...
Not saying this happens always, but that's what people want to avoid when they say they are okay with a quick hack if it works.
I think we'll need to see some major f-ups before this current wave matures.
But almost no one really works like that, and those three separate steps are often done ad hoc, by the same person, right when the fingers hit the keys.
So we went full circle, again.
Just having requirements and a specification isn't necessarily waterfall. Almost all agile processes at least have requirements, the more formal ones also do have specifications. You just do it more than once in a project, like once per sprint, story or whatever.
Now that agile practitioners have learned that requirements and upfront design actually are helpful, the only difference seems to be that the loops are tighter. That might not have been possible earlier without proper version control, without automated tests, and with the software being delivered on physical media. A tight feedback loop is harder when someone has to travel to your customer and sit down at their machines to do any updates.
The promise of coding AI is that it can maybe automate that last step so more intelligent humans can actually have time to focus on the more important first parts.
My feeling is that software developers will end up working in this type of technical-consultant role once LLM dominance has been universally accepted.
So, no compilers for you either?
(To be fair: I'm not loving the whole vibe-coding thing. But I'm trying to approach this wave with an open mind, looking for the good arguments on both sides. This is not one of them.)
Actual randomness is used in FPGA and ASIC compilers which use simulated annealing for layout. Sometimes the tools let you set the seed.
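A toy illustration of why a settable seed matters in an otherwise randomized tool: with the seed fixed, a simulated-annealing run is fully reproducible. The one-dimensional cost function and cooling schedule below are invented for the sketch and are nothing like real EDA place-and-route:

```python
import random

# Toy seeded simulated annealing on a 1-D "placement" problem,
# minimizing (x - 3)^2. Fixing the seed makes the run deterministic,
# which is why layout tools expose the seed to users.

def anneal(seed: int, steps: int = 200) -> float:
    rng = random.Random(seed)          # fixed seed => reproducible result
    x = rng.uniform(-10, 10)
    cost = (x - 3.0) ** 2
    temp = 5.0
    for _ in range(steps):
        cand = x + rng.uniform(-1, 1)
        cand_cost = (cand - 3.0) ** 2
        # accept improvements, or worse moves with temperature-scaled odds
        if cand_cost < cost or rng.random() < temp / 100.0:
            x, cost = cand, cand_cost
        temp *= 0.98                   # cool down over time
    return x

assert anneal(42) == anneal(42)        # same seed, same "layout"
assert anneal(1) != anneal(2)          # different seeds wander differently
```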
The 'black-boxes' are the theoretical systems non-technical users are building via 'vibe-coding'. When your LLM says we need to spin up an EC2 instance, users will spin one up. Is it configured? Why is it configured that way? Do you really need a VPS instead of a Pi? These are questions the users, who are building these systems, won't have answers to.
When people do interpretability work on some NN, they often learn something. What is it that they learn, if not something about how the network works?
Of course, we(meaning, humanity) understand the architecture of the NNs we make, and we understand the training methods.
Similarly, if we have the output of an indistinguishability obfuscation method applied to a program, we understand what the individual logic gates do, and we understand that the obfuscated program was a result of applying an indistinguishability obfuscation method to some other program (analogous to understanding the training methods).
So, like, yeah, there are definitely senses in which we understand some of "how it works", and some of "what it does", but I wouldn't say of the obfuscated program "We understand how it works and what it does.".
(It is apparently unknown whether there are any secure indistinguishability obfuscation methods, so maybe you believe that there are none, and in that case maybe you could argue that the hypothetical is impossible, and therefore the argument is unconvincing? I don't think that would make sense though, because I think the argument still makes sense as a counterfactual even if there are no cryptographically secure indistinguishability obfuscation methods. [EDIT: Apparently it has in the last ~5 years been shown, under relatively standard cryptographic assumptions, that there are indistinguishability obfuscation methods after all.])
Any worthwhile AI is non-linear, and its output cannot be predicted (if it could, we'd just use the predictor).
Before AI, companies were usually very reticent to do a rewrite or major refactoring of software because of the cost, but that calculus may change with AI. A lot of physical products have ended up in this space, where it's cheaper to buy a new product and throw out the old broken one than to try and fix it. If AI lowers the cost of creating software, then I'm not sure why it wouldn't go down the same path as physical goods.
There are still so many businesses running on pen and paper or excel spreadsheets or off the shelf software that doesn't do what they need.
Hard to say what the future holds but I'm beginning to see the happy path get closer than it looked a year or two ago.
Of course, on an individual basis it will be possible to end up in a spot where your hard earned skills are no longer in demand in your physical location, but that was always a possibility.
When I hit your comment:
1. I thought, "YES! Indeed!"
2. Then, "For Sale: Baby Shoes."
3. The similar feel caused me to do a rethink on all this. We are moving REALLY fast!
Nice comment
The hook aspect of these appears similarly suggestive and brief, and I found that intriguing and thought-provoking given the overall subject matter.
And that just gave me some reference to the speed this whole tech branch has.
Unless a business allows any old employee to spin up cloud services on a whim we’re not going to see sales people spinning up containers and pipelines, AI or not.
>I'm creating an app for dog walkers to optimize their routes. It should take all client locations and then look for dog-friendly cafes for the walker to get lunch and then find the best route. I'm vibe coding this on GCP. Please generate a Terraform file to allocate the necessary resources.
And then over time these Excel spreadsheets become a core system that runs stuff.
I used to live in fear of one of these business analyst folks overwriting a cell or sorting by just the column and not doing the rows at the same time.
Also, VLOOKUPs are the devil.
IMHO, there's a strong case for the opposite. My vibe coding prompts are along the lines of "Please implement the plan described in `phase1-epic.md` using `specification.prd` as a guide." The specification and epics are version controlled and a part of the project. My vibe coded software has better design documentation than most software projects I've been involved in.
But we also didn't have an AI tool to do the modifying of that bad code. We just had our own limited-capacity-brain, mistake-making, relatively slow-typing selves to depend on.
Declarative languages and AI go hand in hand. SQL was intended to be a ‘natural’ language that the query engine (an old-school AI) would use to write code.
Writing natural language prompts to produce code is not that different, but we’re using “stochastic” AI, and stochastic means random, which means mistakes and other non-ideal outputs.
I watched him text people and say "set up a lovable account, put in your credit card info then send me the login". Then he would just write some prompts for them on lovable to build their websites for them. Then text them back on discord and be like "done".
He said he had multiple tiers: $50/month got you into the Discord, where he would reply to your questions and whatever, but for $500/month he would do everything you wanted and just chat with you about what you wanted for your incredible Facebook-replacement app or whatever. But I mean, most of the stuff seemed like it was just some small business trying to figure out a way to use the internet in 2025.
All this gave me anxiety because I'm here as an academic scientist NOT making $50/month x 1000 signups to vibe code for people who can't vibe code, when I definitely know how to vibe code at least. Haha. Maybe I should listen to all my startup friends and go work at a startup instead.
I hope you can meet him on a plane too.
Former web dev here, and I still do some SEO, and for the most part, he's correct. I've posted on here multiple times over the last two to three years about how easy it is now to manipulate search engines.
Back in the day, when you needed content for SEO and needed it to be optimized, you had to find a content writer who knew how to do this, or write it yourself and hope that Google doesn't bury your site for stuffing your content with keywords.
Now? Any LLM can spin out optimized content in a few seconds. Any LLM can review your site, compare it to a competitor, and tell you what you should do to rank better. All the stuff SEO people used to do? You can do it now in the span of a few minutes with any LLM. This is lower-hanging fruit than vibe coding, and Google has yet to adjust its algorithm to deal with it.
A few years ago, I cranked out an entire services-area page for a client. I had AI write all the content. Granted, it was pretty clunky and I had to clean some of it up, but it saved me hours of trying to write it myself. We're talking some 20-30 pages that I gradually posted over the course of several months. Within days, every new page was ranking on page 1, within the top ten results.
These are my favorite types of code bases to work on. The source of truth is the code. You have to read it and debug it to figure it out, and reconcile the actual behaviors with the desired or expected behaviors through your own product-oriented thinking.
[0] https://x.com/PovilasKorop/status/1959590015018652141
I'm really curious about what other jobs will pop up. As long as there is an element of probability associated with AI, there will need to be manual supervision for certain tasks/jobs.
Ok, now that is funny! On so many levels.
Now, for the project itself, a few thoughts:
- this was tried before: about 1.5 years ago there was a project set up to spam GitHub with lots of "paper implementations", but it was based on GPT-3.5 or 4 or something, and almost nothing worked. Their results are much better.
- surprised it worked as well as it did with simple prompts. "Probably we're overcomplicating stuff". Yeah, probably.
- weird copyright / IP questions all around. This will be a minefield.
- Lots of SaaS products are screwed. Not from this, but from this + 10 engineers in every midsized company. NIH is now justified.
Yeah, we're in weird territory because you can drive an LLM as a Bitcoin mixer over intellectual property. That's the entire point/meaning behind https://ghuntley.com/z80.
You can take something that exists, distill it back to specs, and then you've got your own IP. Throw away the tainted IP, and then just run Ralph over a loop. You are able to clone things (not 100%, but it's better than hiring humans).
AI output isn't copyrighted in the US.
Basically, to avoid the ambiguity of training an LLM on unlicensed code, I use it to generate a description of the code for another LLM trained on permissively licensed code. (There aren't any usable public-domain models I've found.)
I use it in the real world, and it seems that the codegen model works 10-20% of the time (the description is not detailed enough, which is good for "clean room", but a base model couldn't follow it). All models can review the code, retry, and write their own implementation based on the codegen result, though.
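The describe-then-regenerate pipeline described above can be sketched with stubbed model calls. No real LLM is invoked and both functions are stand-ins; the point is only the information barrier: the describing model sees the original code, and the codegen model sees only the natural-language spec.

```python
# Stubbed sketch of a two-model "clean room" pipeline: model A (trained
# on anything) only ever sees the original code and emits a spec; model B
# (trained on permissively licensed code) only ever sees the spec.

def describe(original_code: str) -> str:
    """Stand-in for the describing model: code in, natural-language spec out."""
    return "a function that doubles an integer"

def generate(spec: str) -> str:
    """Stand-in for the permissively trained codegen model: spec in, code out."""
    return "def double(n: int) -> int:\n    return n * 2"

def clean_room(original_code: str) -> str:
    spec = describe(original_code)
    # The spec must not leak the original source across the barrier.
    assert original_code not in spec
    return generate(spec)

new_code = clean_room("def dbl(x): return x + x")
print(new_code)
```

The 10-20% success rate mentioned above would show up in the `generate` step; the review-and-retry loop wraps around it.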
Except you don't.
Is Unix “small sharp tools” going away? Is that a relic of having to write everything in x86 and we’re now just finally hitting the end of the arc?
I have long held that high software salaries withhold the power of boutique software from its potential applications in small businesses.
It's possible we're about to see what unleashing software in small businesses might have looked like, to some degree, just with much less expert guidance and wisdom.
I am a developer so my point of view on salaries is not out of bitterness.
Now I do a calculus with dependencies. Do I want to track the upstream, is the rigging around the core I want valuable, is it well maintained? If not, just port and move on.
Exactly the point behind this post https://ghuntley.com/libraries/
I would say, it is better maintain your own AI improved forks of the libraries and I am hoping that pattern will be more common and will also benefit upstream libraries as well.
Is that... the first recorded instance of an AI committing suicide?
One of the providers (I think it was Anthropic) added some kind of token (or MCP tool?) for the AI to bail on the whole conversation as a safety measure. And the models use it freely, so they're clearly not trying to self-preserve.
Pretty sure even that is still over-anthropomorphising. The LLM just generates tokens, doesn't matter whether the next token is "strawberry" or "\STOP".
Even talking about "goals" is a bit ehhh, it's the machine's "goal" to generate tokens the same way it's the Sun's "goal" to shine.
Then again, if we're deconstructing it that far, I'd "de-anthropomorphise" humans in much the same way, so...
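The "it's just tokens" point can be made concrete: to a generation loop, a stop token is handled exactly like any other token. The sampler below is a scripted stub standing in for a real model:

```python
# Stub sampler: to the loop, "\STOP" is just another token with some
# probability mass; nothing in the mechanics distinguishes "quitting"
# from emitting "strawberry".

SCRIPTED = ["I", "like", "strawberry", "\\STOP", "unreachable"]

def sample_next(context: list) -> str:
    """Stand-in for model sampling: returns the next scripted token."""
    return SCRIPTED[len(context)]

def generate(stop_token: str = "\\STOP") -> list:
    context = []
    while True:
        token = sample_next(context)
        if token == stop_token:   # "bailing out" is just this branch
            break
        context.append(token)
    return context

print(generate())  # → ['I', 'like', 'strawberry']
```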
https://www.apolloresearch.ai/research/scheming-reasoning-ev...
https://www.youtube.com/watch?app=desktop&t=10&v=xOCurBYI_gY
(Background: Someone training an algorithm to win NES games based on memory state)
Did it just solve The Halting Problem? ;)
"This business will get out of control. It will get out of control and we'll be lucky to live through it."
https://www.youtube.com/watch?v=YZuMe5RvxPQ&t=22s
148 more comments available on Hacker News