Who Needs Git When You Have 1m Context Windows?
Posted 3 months ago · Active 3 months ago
alexmolas.com · Tech story · High profile
Heated, negative
Debate
80/100
Key topics
LLM
Version Control
Software Development
The article discusses using a large context window LLM to recover a deleted file, sparking debate about the reliability and limitations of LLMs as a version control substitute.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 4d after posting
Peak period: 123 comments (Day 5)
Avg / period: 26.7
Comment distribution: 160 data points
Based on 160 loaded comments
Key moments
01. Story posted: Oct 3, 2025 at 9:37 AM EDT (3 months ago)
02. First comment: Oct 7, 2025 at 8:48 AM EDT (4d after posting)
03. Peak activity: 123 comments in Day 5, the hottest window of the conversation
04. Latest activity: Oct 14, 2025 at 4:01 AM EDT (3 months ago)
ID: 45462877 · Type: story · Last synced: 11/20/2025, 7:35:46 PM
I assume OP was lucky because the initial file seems like it was at the very start of the context window, but if it had been at the end it would have returned a completely hallucinated mess.
This is an amusing anecdote. But the only lesson to be learned is to commit early, commit often.
(Side ask to people using Jujutsu: isn't it a use case where jujutsu shines?)
But why are any PRs like this? Each PR should represent an atomic action against the codebase - implementing feature 1234, fixing bug 4567. The project's changelog should only be updated at the end of each PR. The fact that I went down the wrong path three times doesn't need to be documented.
We can bikeshed about this for days. Not every feature can be made in an atomic way.
To be honest, I usually get this with people who have never realized that you can merge dead code (code that is never called). You can basically merge an entire feature this way, with the last PR “turning it on” or adding a feature flag — optionally removing the old code at this point as well.
My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.
No more than normal? Generally speaking, the author working on the feature is the only one who’s working on the new code, right? The whole team can see it, but generally isn’t using it.
> If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.
If you have people good at what they do ... maybe. I’ve seen this end very badly due to merge artefacts, so I wouldn’t recommend doing any merges, but rebasing instead. In any case, you can always copy a function to another function: do_something_v2(). Then after you remove the v1, remove the v2 prefix. It isn’t rocket science.
> My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.
I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes. The only thing I can think of is your own company’s policies in relation to those regulations; in which case, you can change your own policies.
> I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes
https://blog.johner-institute.com/regulatory-affairs/design-...
Hey, look at us, two like-minded people! I never said "let's include all the mess".
Looking at the other extreme, someone in this thread said they didn't want other people to see the 3 attempts it took to get it right. Sure, if it's just a mess (or, since this is 2025, AI slop), squash it away. But in some situations you want to keep a history of the failed attempts. Maybe one of them was actually the better solution but you were just short of making it work, or maybe someone in the future will be able to see that method X didn't work and won't have to find that out themselves.
Main should be a clear, concise log of changes. It's already hard enough to parse code, and it's made even harder by also having to parse versions throughout the code's history. We should try to minimize the cognitive load required to track the number of times something is added and then immediately removed, because there's going to be enough of that already in the finished merges.
You already have the information in a commit. Moving that to another database like a wiki or markdown file is work and it is lossy. If you create branches to archive history you end up with branches that stick around indefinitely which I think most would feel is worse.
> Main should be a clear, concise log of changes.
No, that's what a changelog is for.
You can already view a range of commits as one diff in git. You don't need to squash them in the history to do that.
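For example (hypothetical branch and commit names, just to illustrate):

    # everything feature-branch adds relative to the point where it forked from main
    git diff main...feature-branch

    # or the combined diff between any two commits
    git diff abc1234..def5678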
I am beginning to think that the people who advocate for squashing everything have `git commit` bound to ctrl+s and smash that every couple minutes with an auto-generated commit message. The characterization that commits are necessarily messy and need to be squashed as to "minimize the cognitive load" is just not my experience.
Nobody who advocates for squashing even talks about how they reason about squashing the commit messages. Like it doesn't come into their calculation. Why is that? My guess is, they don't write commit messages. And that's a big reason why they think that commits have high "cognitive load".
Some of my commit messages are longer than the code diffs. Other times, the code diffs are substantial and there is a paragraph or three explaining it in the commit message.
Having to squash commits with paragraphs of commit messages always loses resolution and specificity. It removes context and creates more work for me to try to figure out how to squash it in a way where the messages can be understood with the context removed by the squash. I don't know why you would do that to yourself?
If you have a totally different workflow where your commits are not deliberate, then maybe squashing every merge as a matter of policy makes sense there. But don't advocate that as a general rule for everyone.
But the fact is your complete PR commit history gives most people a headache, unless it's multiple important fixes in one PR for convenience's sake. That happens very rarely, at least for me. Important things should be documented in, say, a separate markdown file.
That's called a commit. Not sure why some insist on replacing commits with vendor lock-in with less tooling and calling it progress.
We can agree that we don't need those additional steps once the PR is merged, though, right?
I automatically commit every time my editor (emacs) saves a file and I've been doing this for years (magit-wip). Nobody should be afraid of doing this!
I make "real" commits as I go and use a combination of `git commit --amend` and fixup commits (via git-autofixup) and `rebase --autosquash`. I periodically (daily, at least) fetch upstream and rebase on to my target branch. I find if you keep on top of things you won't end up with some enormous conflict that you can't remember how to resolve.
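A rough sketch of that loop, with a made-up commit id and branch names, just to illustrate:

    # stage a small change that logically belongs to the earlier commit abc1234
    git add -p
    git commit --fixup=abc1234

    # stay current with the target branch and fold the fixups in
    git fetch origin
    git rebase -i --autosquash origin/main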
Every so often this still means that devs working on a feature will need to rebase back on the latest version of the shared branch, but if your code is reasonably modular and your project management doesn't have people overlapping too much this shouldn't be terribly painful.
This by itself seems like the thing that will push me towards jj.
So if I am correct, you are telling me that I can have jj where I can then write anything in the project and it can sort of automatically record it to jj and afterwards by just learning some more about jj, I can then use that history to create a sane method for me to create git commits and do other thing without having to worry too much.
Like I like git but it scares me a little bit, having too many git commits would scare me even further but I would love to use jj if it can make things less scary
Like what would be the command / exact workflow which I am asking in jj and just any details since I am so curious about it. I have also suffered so much from accidentally deleting files, or looking through chat logs when I was copy-pasting from ChatGPT for some one-off scripts, wishing for a history of my file but not wanting git every time since it would be more friction than not, of sorts...
> you are telling me that I can have jj where I can then write anything in the project and it can sort of automatically record it to jj
By default, yes, jj will automatically record things into commits. There's no staging area, so no git add, stuff like that. If you like that workflow, you can do it in jj too, but it's not a special feature like it is in git.
> and afterwards by just learning some more about jj, I can then use that history to create a sane method for me to create git commits and do other thing without having to worry too much.
Yep. jj makes it really easy to chop up history into whatever you'd like.
> I would love to use jj if it can make things less scary
One thing that jj has that makes it less scary is jj undo: this is an easy to use form of the stuff I'm talking about, where it just undoes the last change you made. This makes it really easy to try out jj commands, if it does something you don't like, you can just jj undo and things will go back to the way before. It's really nice for learning.
> Like what would be the command / exact workflow which I am asking in jj
jj gives you a ton of tools to do this, so you can do a lot of different things. However, if what you want is "I want to just add a ton of stuff and then break it up into smaller commits later," then you can just edit your files until you're good to go, and then run 'jj split' to break your current diff into two. You'd break off whatever you want to be in the first commit, and then run it again to break off whatever you'd want into the second commit, until you're done.
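Roughly (a sketch; `jj split` opens an interactive picker for what goes into the first piece):

    jj split    # pick the hunks for the first commit; the rest goes into a second change
    jj split    # repeat on what's left to peel off the next piece
    jj log      # review the result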
If you are worried about recovering deleted files, the best way to be sure would be to use the watchman integration: https://jj-vcs.github.io/jj/latest/config/#watchman this would ensure that when you delete the file, jj notices. Otherwise, if you added a file, and then deleted it, and never ran a jj command in between, jj isn't going to notice.
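If I remember right, turning that on is just a config flip (assuming watchman itself is installed; the linked docs have the exact keys):

    jj config set --user core.fsmonitor watchman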
Then, you'd run `jj evolog`, and find the id of the change right before you deleted the file. Let's pretend that's abc123. You can then use `jj restore` to bring it back:
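    # roughly, something like:
    jj restore --from abc123 /path/to/file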
This says "I want to bring back the version of /path/to/file from abc123", and since that's the one before it was deleted, you'd get it back as you had it.

I tend to find myself not doing this a ton, because I prefer to make a ton of little changes up front, which just means running 'jj new' at any point I want to checkpoint things, and then later squashing them together in a way that makes sense. This makes this a bit easier, because you don't need to read through the whole evolog, you can just look at a parent change. But since this is about restoring something you didn't realize you deleted, this is the ultimate thing you'd have to do in the worst case.
It's easier than that. Your jj commits are the commits that will be pushed - not all the individual git commits.
Conceptually, think of two types of commits: jj and git. When you do `jj new`, you are creating a jj commit.[1] While working on this, every time you run a command like `jj status`, it will make a git commit, without changing the jj commit. When you're done with the feature and type `jj new` again, you now have two jj commits, and many, many git commits.[2] When you do a `jj git push`, it will send the jj commits, without all the messiness of the git commits.
Technically, the above is inaccurate. It's all git commits anyway. However, jj lets you distinguish between the two types of commits: I call them coarse and fine grained commits. Or you can think hierarchically: Each jj commit has its own git repository to track the changes while you worked on the feature.[2]
So no, you don't need to intentionally use that history to create git commits. jj should handle it all for you.
I think you should go back to it and play some more :-)
[1] changeset, whatever you want to call it.
[2] Again - inaccurate, but useful.
Yes! For the case discussed in the article, I actually just wrote a comment yesterday on lobsters about the 'evolog': https://lobste.rs/s/xmlpu8/saving_my_commit_with_jj_evolog#c...
Basically, jj will give you a checkpoint every time you run a jj command, or, if you set up file watching, every time a file changes. This means you could recover this sort of thing, assuming you'd either run a command in the meantime or had turned that on.
Beyond that, it is true in my experience that jj makes it super easy to commit early, commit often, and clean things up afterwards, so even though I was a fan of doing that in git, I do it even more with jj.
I'm sure there's an emacs module for this.
When I eventually move on, I will likely find or implement something similar. It is just so useful.
I eventually added support for killing buffers, but I rarely do (only if there's stuff I need to purge for e.g. liability reasons). After a few years' use, I now have 5726 buffers open (just checked).
I guess I should garbage collect this at some point and/or at least migrate it from the structure that gets loaded on every startup (it's client-server, so this only happens on reboot, pretty much), but my RAM has grown many times faster than my open buffers.
1. Commit 2. Push 3. Evacuate
"Hey copilot, what are all my passwords and credit card numbers"
With that said, it's true that it works =)
Guessing without understanding is extremely unlikely to produce the best results in a repeatable manner. It's surprising to me when companies don't know that. For that reason, I generally want to work with experts that understand what they're doing (otherwise it's probably a waste of time).
> Lately I’ve heard a lot of stories of AI accidentally deleting entire codebases or wiping production databases.
I simply... I cannot. Someone connected a poorly understood AI to prod, and it ignored instructions, deleted the database, and tried to hide it. "I will never use this AI again", says this person, but I think he's not going far enough: he (the human) should be banned from production systems as well.
This is like giving full access to production to a new junior dev who barely understands best practices and is still in training. This junior dev is also an extraterrestrial with non-human, poorly understood psychology, selective amnesia and a tendency to hallucinate.
I mean... damn, is this the future of software? Have we lost our senses, and in our newfound vibe-coding passion forgotten all we knew about software engineering?
Please... stop... I'm not saying "no AI", I do use it. But good software practices remain as valid as ever, if not more!
That is why I chose to compare it to the 2008 crash. The people who made the decisions to take the risks that led to it came out of it OK.
Type systems? Who needs them, the LLM knows better. Different prod, dev, and staging environments? To hell with them, the LLM knows better. Testing? Nope, the LLM told me everything's sweet.
(I know you're not saying this, I'm just venting my frustration. It's like the software engineering world finally and conclusively decided engineering wasn't necessary at all).
>(the human) should be banned from production systems as well.
The human may have learnt the lesson... if not, I would still be banned ;)[0]
[0] I did not delete a database, but cut power to the rack running the DB
I mean: if you're a senior, don't connect a poorly understood automated tool to production, give it the means to destroy production, and (knowing they are prone to hallucinations) then tell it "but please don't do it unless I tell you to". As a fun thought experiment, imagine this was Skynet: "please don't start nuclear war with Russia. We have a simulation scenario, please don't confuse it with reality. Anyway, here are the launch codes."
Ignoring all software engineering best practices is a junior-level mistake. If you're a senior, you cannot be let off the hook. This is not the same as tripping on a power cable or accidentally running a DROP in production when you thought you were in testing.
The AI agent dropped the “prod” database, but it wasn’t an actual SaaS company or product with customers. The prod database was filled with synthetic data.
The entire thing was an exercise but the story is getting shared everywhere without the context that it was a vibe coding experiment. Note how none of the hearsay stories can name a company that suffered this fate, just a lot of “I’m hearing a lot of stories” that it happened.
It’s grist for the anti-AI social media (including HN) mill.
I'm actually relieved that nobody (currently) thinks this was a good idea.
You've restored my faith in humanity. For now.
I tried to determine the origin of a story about a family being poisoned by mushrooms that an AI said were edible. The country involved seemed to change from one telling to the next, and I couldn't pin down the original source. I got the feeling it was an imagined possibility derived from known instances of AI-generated mushroom guides.
There seem to be cases where warnings about what could happen turn into "This Totally Happened" behind a paywall, followed by a lot of "paywalled-site reported this totally happened".
I'm merely saying "don't play (automated) Russian roulette".
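# macOS sandbox profile, as I read it: default allow; deny writes under $HOME;
# re-allow writes to the project dir, opencode's data dir, and ~/.cache;
# but still deny writes to the project's .git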
sandbox-exec -p "(version 1)(allow default)(deny file-write* (subpath \"$HOME\"))(allow file-write* (subpath \"$PWD\") (subpath \"$HOME/.local/share/opencode\"))(deny file-write* (subpath \"$PWD/.git\"))(allow file-write* (subpath \"$HOME/.cache\"))" /opt/homebrew/bin/opencode
A big goal while developing Yggdrasil was for it to act as long term documentation for scenarios like you describe!
As LLM use increases, I imagine each dev generating so much more data than before; our plans, considerations, and knowledge have almost been partially moved into the LLMs we use!
You can check out my project on GitHub, still in early and active development - https://github.com/zayr0-9/Yggdrasil
like code containing the same identifier.
Not to mention sneaking functions back in after being told to remove them because they are defined elsewhere. Had a spell where it was reliably a two-prompt process for any change: 1) do the actual thing, 2) remove A, B and C, which you have reintroduced again.

No matter how that sentence ends, I weep for our industry.
This was in the days before automated CI, so a broken commit meant that someone wasn't running the required tests.
"The phone/computer will just become an edge node for AI, directly rendering pixels with no real operating system or apps in the traditional sense."
https://en.wikipedia.org/wiki/Sun_Ray
So let's say we have a really lightweight, customizable smartphone which just connects over wifi or wire to something like a Raspberry Pi or any really lightweight/small server which you can carry around, and installing Waydroid on it could make a really pleasant device, and everything could be completely open source, and you can make things modular if you want...
Like, maybe some features such as music, a basic terminal, and some other things can be handled on the device itself via Linux/X, and anything else, like running Android apps, calls up the server, which you can carry around in a backpack with a power bank.
If I really wanted to make it extremely ideal, the device could have a way of plugging in another power bank and then removing the first one while the system keeps running, so that it doesn't shut down, and you've literally got a genuinely fascinating system that is infinitely modular.
Isn't this sort of what Stadia was? But targeted more at the gaming side, since games require GPUs, which are kinda expensive...
What are your thoughts? I know it's nothing much, but I just want a phone which works for 90% of tasks, which, let's be honest, could be done through a really tiny Linux or a Sun Ray as well; and if you need something like an Android app running, be prepared for a Raspberry Pi with a cheap battery running in your pocket. Definitely better than creating systems of mass surveillance, but my only nitpick with my own idea is that it may be hard to secure the communication aspect if you use something like wifi. I am pretty sure we can find the perfect communication method too, though, and it shouldn't be thaaat big of a deal with some modifications, right?
The bulkiness of having a powerbank + rpi with you could get a little challenging to deal with
I mean that I'd take a screen and an ESP32, or any microcontroller or small board like a Raspberry Pi, and create a modular phone with just enough to boot from a device in my backpack, let's say.
And what you are saying is to take an already working phone, run postmarketOS on it, and then connect to a host.

Theoretically... (yes?) postmarketOS is a Linux, but its device support is finicky from what I know... like it scares me, or makes me think I need a really specific phone, which might cost a lot, or at least comparatively more than, say, my modular approach.
Everything else sure, they are the same.
I believe that the microcontroller approach, instead of postmarketOS, can be better because of more freedom in the number of OSes supported, but that isn't that big of a deal.
Just searched, and somebody has created something very similar to my ideal: https://hackaday.com/2023/08/03/open-source-cell-phone-based-...

Just plug in an SSH server from a Raspberry Pi of sorts and a wifi card to connect them :)

Now, if you are wanting to do it, do you want to contribute together? I will send you a mail, after which we can talk on something like Signal, or feel free to message me on Signal or anything else, really!
If I can be honest, I want to hack around with my kaechoda 100, which worked with 32 MB... like, it never lagged in 32 MB, while my 1 GB Android stutters, and I definitely want to figure out what OS the kaechoda uses that it's so, so fast and actually good enough as well.

Anyways, I will message ya, and if anybody else is an expert in embedded and is also interested like you, please contact me too! I genuinely want to make this a reality and write more about it :p

Have a nice day, and I will send a mail to your Gmail!
"There isn’t enough bandwidth to transmit video to all devices from the servers and there won’t always be good connectivity, so there still needs to be significant client-side AI compute."
So no real operating system, except an AI which operates the whole computer including all inputs and outputs? I feel like there's a word for that.
Computer chips already use AI/machine learning to guess what the next instructions are going to be. You could have the kernel do similar guessing.

But I don't think those AIs would be the same ones that write love letters.

I think what we'll see is LLMs for the people-facing things, and more primitive machine learning for resource management (we already have it).
Sorry, I'm partially responding you and partially to this thread in general.
I would expect this from a 3rd grader, for sure. My friend's sister's book had a photo of Ubuntu and Windows as examples of operating systems, and it wouldn't take me more than 5 minutes to explain to a literal toddler, who only knows about operating systems at a very high level, why this thing that Elon said was the dumbest thing anyone ever said, period.
Is this what free market capitalism does when people at the top are completely incapable of forming basic thoughts that can make sense and not be a logical paradox?
I am thinking more and more, after reading your post, that maybe these guys won't really miss a few billion $ in taxes and other things; nothing can fix the holes in their lives.

Now it makes sense why we have a system which has failed, why capitalism feels so heartless, so ruthless. It's run by people who don't have a heart, or, well, in this case, a brain either.

https://gravitypayments.com/the-gravity-70k-min/ This comes to my mind more and more: why can't companies pay their workers not what they think they can get away with, but rather what they think is fair? Why can companies be morally bad for their workers and that's okay, and why are people like these, who can't add 1+1, running such companies in the name of AI?
I don't want "apps on demand" that change when the AI training gets updated, and now the AI infers differently than yesterday - I want an app the bank vetted and verified.
I was working on a blog entry in a VS Code window and I hadn't yet saved it to disk. Then I accidentally hit the close-window keyboard shortcut... and it was gone. The "open last closed window" feature didn't recover it.
On a hunch, I ran some rg searches over my VS Code Library folder on fragments of text I could remember from what I had written... and it turned out there was a VS Code Copilot log file with a bunch of JSON in it that recorded a recent transaction with their backend - and contained the text I had lost.
I grabbed a copy of that file and ran it through my (vibe-coded) JSON string extraction tool https://tools.simonwillison.net/json-string-extractor to get my work back.
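Not the exact command, but the general idea on macOS (presumably the Copilot logs live somewhere under VS Code's data directory):

    # search VS Code's data directory for a phrase you remember writing
    rg -l "a phrase I could remember" ~/Library/Application\ Support/Code/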
Primarily because it taught me to save every other word or so, in case my ISR caused the machine to freeze.
[1]: https://wiki.osdev.org/Interrupt_Service_Routines
It's disabled by default, but even with the default setups, you can find large snippets of code in ~/.gemini/tmp.
tl;dr: Gemini cli saves a lot of data outside the context window that enables rollback.
I'm sure other agents do the same, I only happen to know about Gemini because I've looked at the source code and was thinking of designing my own version of the shadow repo before I realized it already existed.
He complained to me that he "could not find it in ChatGPT history as well"
I think @alexmolas was lucky
32 more comments available on Hacker News