Using LLMs at Oxide
Key topics
The Oxide company's exploration of using Large Language Models (LLMs) in their workflow has sparked a lively debate about the benefits and drawbacks of AI-assisted coding. While some commenters, like gghffguhvc, argue that LLMs are a rational choice if they reduce overall costs, others, such as monkaiju, are puzzled by Oxide's encouragement of LLM use despite acknowledging significant caveats. The discussion highlights the importance of human oversight, with devmor noting that the onus is on the user to ensure LLMs perform correctly, and zihotki pointing out that seniority and experience play a crucial role in effectively utilizing LLMs. As ahepp and sunshowers share their personal experiences with AI-assisted coding, the conversation turns to the need for research on the impact of LLMs on code quality, with Yeask provocatively asking why large companies aren't already investigating this.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 22m after posting
Peak period: 56 comments in 0-3h
Avg / period: 14.5
Based on 160 loaded comments
Key moments
- Story posted: Dec 6, 2025 at 8:17 PM EST (about 1 month ago)
- First comment: Dec 6, 2025 at 8:39 PM EST (22m after posting)
- Peak activity: 56 comments in 0-3h (hottest window of the conversation)
- Latest activity: Dec 8, 2025 at 9:45 AM EST (about 1 month ago)
Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.
I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't quite do what I want. And many cases of "hmmm... that would work, but it would read the entire file twice for no reason".
My guess, however, is that it's a net gain for quality and productivity. Humans make bugs too and there need to be processes in place to discover and remediate those regardless.
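As a purely hypothetical illustration of the "reads the entire file twice" pattern mentioned above (a sketch only, not code from the thread; the AoC-style names and logic are invented):

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

// Parse the puzzle input into lines.
std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line); ) {
        lines.push_back(line);
    }
    return lines;
}

// Each part re-opens and re-parses the same file: the kind of needless
// second read described above. Parsing once and passing the lines to both
// parts would avoid the repeated I/O.
int part_one(const std::string& path) {
    return static_cast<int>(read_lines(path).size());
}

int part_two(const std::string& path) {
    int longest = 0;
    for (const auto& line : read_lines(path)) {  // reads the whole file again
        longest = std::max(longest, static_cast<int>(line.size()));
    }
    return longest;
}
```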
I'm currently trying out using Opus 4.5 to take care of a gnarly code reorganization that would take a human most of a week to do -- I spent a day writing a spec (by hand, with some editing advice from Claude Code), having it reviewed as a document for humans by humans, and feeding it into Opus 4.5 on some test cases. It seems to work well. The spec is, of course, in the form of an RFD, which I hope to make public soon.
I like to think of the spec as basically an extremely advanced sed script described in ~1000 English words.
I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.
The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".
The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.
Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions of what LLMs are useful for.
There are things in life that have high risks of harm if misused yet people still use them because there are great benefits when carefully used. Being aware of the risks is the key to using something that can be harmful, safely.
> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.
I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization that contains junior engineers I'd add something specific to help junior engineers understand how they should approach LLM use.
Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.
The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so. That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
Years ago I had to spend many months building nothing but Models (as in MVC) for a huge data import / ingest the company I worked at was rewriting. It was just messy enough that it couldn't be automated. I almost lost my mind from the dull monotony and even started having attendance issues. I know today that could have been done with an LLM in minutes. It's almost crazy how much time I put into that project compared to if I did it today.
But a junior engineer would never find/anticipate those issues.
I am a bit concerned, because the kind of software I am making, an LLM would never prompt on its own. A junior cannot make it; it requires research and programming experience that they do not have. But I know that if I were a junior today, I would probably try to use LLMs as much as possible and would probably know less programming over time.
So it seems to me that we are likely to have worse software over time. Perhaps a boon for senior engineers, but how do we train junior devs in that environment? Force them to build slowly, without LLMs? Is it aligned with business incentives?
Do we create APIs expecting the code to be generated by LLMs or written by hand? Because the impact of verbosity is not necessarily the same. LLMs don't get tired as fast as humans.
Obviously if it's anything even minorly complex you can't trust the LLM hasn't found a new way to fool you.
For a junior in the learning phase, that can be useful time spent. Then again, I agree that at times certain menial code tasks are not worth doing and LLMs are helpful.
It's a bit like a kid not spending time memorizing their times tables since they can use a calculator. They are less likely to become a great mathematician.
So of course it’s going to generate code that has non-obvious bugs in it.
Ever play the Undefined Behaviour Game? Humans are bad at being compilers and catching mistakes.
I’d hoped… maybe still do, that the future of programming isn’t a shrug and, “good enough.” I hope we’ll keep developing languages and tools that let us better specify programs and optimize them.
IMO, it's already happening. I had to change some personal information on a bunch of online services recently, and two out of seven of them were down. One of them is still down, a week later. This is the website of a major utilities company. When I call them, they acknowledge that it's down, but say my timing is just bad. That combined with all the recent outages has left me with the impression that software has been getting (even more) unreliable, recently.
> just messy enough that it couldn't be automated.
> I know today that could have been done with an LLM in minutes.
LLMs are amazing technology, but this is a terrible task for them.
This is a task where exactness is the whole effort, even though it's mind-numbingly boring, and LLMs are the worst of all computational tools you could leverage against "exacting but exhausting".
You'd think there's some technology that could have helped you. There probably is. LLMs, almost by definition, are not that technology.
This is very close to "count the r's in strawberry" and is nearly the worst thing you could task one to do.
It was only then that she introduced us to the glory that was Adobe Dreamweaver, which (obviously) increased our productivity tenfold.
My first PHP scripts and games were written using nothing more than Notepad too, funnily enough.
I'm not saying they don't have their place, but without us they would still be making the world go round. Only backwards.
It’s lovely to have the time to do that. This time comes once the other type of engineer has shipped the product and turned the money flow on. Both types have their place.
I think what craftsmen miss is the difference in goals. Projects fall on a spectrum from a long-lived app that constantly evolves with a huge team working on it, to something never opened again after release. In the latter, like movie or music production (or most video games), only the end result matters; the how is not part of the final product. Working for years with designers and artists really gave me perspective on process vs end result and what matters.
That doesn't mean the end result is messy or lacks craftsmanship. If you call a general contractor or carpenter for a specific job, you care that the end result is well made, but if they tell you that they built a whole factory for your little custom-made project (the equivalent of a nice codebase), not only does it not matter to you, it'll also be wildly overpriced and delayed. In my agency that means the website is good-looking and bug-free after being built, no matter how messy the temporary construction site is.
In contrast if you work on a SaaS or a long lived project (e.g. an OS) the factory (the code) is the product.
So to me, when people say they are into code craftsmanship, I think what they really mean is that they are more interested in factory building than in end-product crafting.
Say it takes 2 hours to implement a feature, and another hour making it logically/architecturally correct. You bill $600 and eat $200 for goodwill and your own personal/organizational development. You're still making $200/hr and you never find yourself in meetings with normie clients about why refactoring, cohesiveness, or quality was necessary.
I think the sweet spot is to strive for code that is easy to read and understand, easy to change, and easy to eventually replace or throw out. Obviously performant enough but yadda yadda premature optimization, depends on the domain and so on...
I don’t particularly remember why, but “hand writing” fancy HTML and CSS used to be a flex in some circles in the 90s. A bunch of junk and stuff like fixed positioning in the source was the telltale sign they “cheated” with FrontPage or Dreamweaver lol
The _vti_cnf dir left /etc/passwd downloadable, so I grabbed it from my school website. One John the Ripper run later and the password was found.
I told the teacher responsible for IT that it was insecure, and that ended up getting me some work experience. I ended up working the summer (waiting for my GCSE results) for ICL, which immeasurably helped me when it was time to properly start working.
I did think about defacing it; I often wonder whether things could have turned out very differently!
Dreamweaver was to web development what ...
I just sat here for 5 minutes and I wasn't able to finish that sentence. So I think that's a statement in itself.
People with very little competence could and did get things done, but it was a mess underneath.
https://developer.adobe.com/dreamweaver/
And yes, as you can imagine given the kind of comments I make about high-level productive tooling and languages, I was a big Dreamweaver fan back in the 2000s.
This gives me somewhat of a knee jerk reaction.
When I started programming professionally in the 90s, the internet came of age, and I remember being told "in my days, we had books and we remembered things", which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be a software engineer, given the sheer amount of knowledge required to produce a meaningful product. It's too big and it moves too fast.
There was this long argument that you should know things and not have to look it up all the time. Altavista was a joke, and Google was cheating.
Then syntax highlighting came around, and there'd always be a guy going "yeah nah, you shouldn't need syntax highlighting to program, your screen looks like a Christmas tree".
Then we got stuff like auto-complete, and it was amazing, the amount of keystrokes we saved. That too, was seen as heresy by the purists (followed later by LSP - which many today call heresy).
That reminds me also, back in the day, people would have entire encyclopaedia collections on DVDs. Did they use them? No. But they criticised Wikipedia for being inferior. Look at today, though.
Same thing with LLMs. Whether you use them as a powerful context based auto-complete, as a research tool faster than wikipedia and google, as rubber-duck debugger, or as a text generator -- who cares: this is today, stop talking like a fossil.
It's 2025 and junior developers can't work without LSP and LLMs? It's fine. They're not in front of a 386 DX33 with one book of K&R C and a blue EDIT screen. They have massive challenges ahead of them, the IT world is in complete shambles, and it's impossible to decipher how anything is made, even open source.
Today is today. Use all the tools at hand. Don't shame kids for using the best tools.
We should be talking about sustainability of such tools rather than what it means to use them (cf. enshittification, open source models etc.)
The Internet itself is full of distractions. My younger self spent a crazy amount of time on IRC. So it's not different than spending time on say, Discord today.
LLMs have pretty much a direct relationship with Google. The quality of the response has much to do with the quality of the prompt. If anything, it's the overwhelming nature of LLMs that might be the problem. Back in the day, if you had, say a library access, the problem was knowing what to look for. Discoverability with LLMs is exponential.
As for LLM as auto-complete, there is an argument to be made that typing a lot reinforces knowledge in the human brain like writing. This is getting lost, but with productivity gains.
Tools like Claude code with ask/plan mode seem to be better in my experience, though I absolutely do wonder about the lack of typing causing a lack of memory formation
A rule I set myself a long time ago was to never copy paste code from stack overflow or similar websites. I always typed it out again. Slower, but I swear it built the comprehension I have today.
For interns/junior engineers, the choice is: comprehension VS career.
And I won't be surprised if most of them will go with career now, and comprehension.. well thanks maybe tomorrow (or never).
It shouldn't be, but it is.
That's not an LLM problem, they'd do the same thing 10 years ago with stack overflow: argue about which answer is best, or trust the answer blindly.
Normal auto complete plus a code tool like Claude Code or similar seem far more useful to me.
I have the same policy. I do the same thing for example code in the official documentation. I also put in a comment linking to the source if I end up using it. For me, it’s like the RFD says, it’s about taking responsibility for your output. Whether you originated it or not, you’re the reason it’s in the codebase now.
Nowadays I'm back to a text editor rather than an IDE, though fortunately one with much more creature comforts than n++ at least.
I'm glad I went down that path, though I can't say I'd really recommend as things felt a bit simpler back then.
That comparison undermines the integrity of the argument you are trying to make.
But I mean, you can get by without memorizing stuff sure, but memorizing stuff does work out your brain and does help out in the long run? Isn't it possible we've reached the cliff of "helpful" tools to the point we are atrophying enough to be worse at our jobs?
Like, reading is surely better for the brain than watching TV. But constant cable TV wasn't enough to ruin our brains. What if we've got to the point it finally is enough?
It isn't hilarious, it's true. My father (now in his 60s), who came from a blue-collar background with very little education, taught himself programming by manually copying and editing software out of magazines, like a lot of people his age.
I teach students now who have access to all the information in the world but a lot of them are quite literally so scatterbrained and heedless anything that isn't catered to them they can't process. Not having working focus and memory is like having muscle atrophy of the mind, you just turn into a vegetable. Professors across disciplines have seen decline in student abilities, and for several decades now, not just due to LLMs.
Reading books was never about knowledge. It was about knowhow. You didn't need to read all the books. Just some. I don't know how many developers I met who would keep asking questions that would be obvious to anyone who had read the book. They never got the big picture and just wasted everyone's time, including their own.
"To know everything, you must first know one thing."
He surely has his fly closed when cutting through the hype with reflection and pragmatism (without the extreme positions on both sides often seen).
I wonder which of these camps is right.
For novices, LLMs are infinitely patient rubber ducks. They unstick the stuck; helping people past the coding and system management hurdles that once required deep dives through Stack Overflow and esoteric blog posts. When an explanation doesn’t land, they’ll reframe until one does. And because they’re confidently wrong often enough, learning to spot their errors becomes part of the curriculum.
For experienced engineers, they’re tireless boilerplate generators, dynamic linters, and a fresh set of eyes at 2am when no one else is around to ask. They handle the mechanical work so you can focus on the interesting problems.
The caveat for both: intentionality matters. They reward users who know what they’re looking for and punish those who outsource judgment entirely.
Where it struggles: problems requiring taste or judgment without clear right answers. The LLM wants to satisfy you, which works great for 'make this exploit work' but less great for 'is this the right architectural approach?'
The craftsman answer might be: use LLMs for the systematic/tedious parts (code generation, pattern matching, boilerplate) while keeping human judgment for the parts that matter. Let the tool handle what it's good at, you handle what requires actual thinking.
It's a brilliant skewering of the 'em dash means LLM' heuristic as a broken trick.
1. https://www.scottsmitelli.com/articles/em-dash-tool/
This is a key difference. I've been writing software professionally for over two decades. It took me quite a long time to overcome certain invisible (to me) hesitations and objections to using LLMs in sdev workflows. At some point the realization came to me that this is simply the new way of doing things, and from this point onward, these tools will be deeply embedded in and synonymous with programming work. Recognizing this phenomenon for what it is somehow made me feel young again -- perhaps that's just the crust breaking around a calcified grump, but I do appreciate being able to tap into that all the same.
I was hoping he'd make the leaderboard, but perhaps the addiction took proper hold in more recent years:
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
https://news.ycombinator.com/user?id=bcantrill
No doubt his em dashes are legit, of course.
"lack of conviction" would be a useful LLM metric.
> This can be extraordinarily powerful for summarizing documents — or of answering more specific questions of a large document like a datasheet or specification.
That dash shouldn't be there. That's not a parenthetical clause, that's an element in a list separated by "or." You can just remove the dash and the sentence becomes more correct.
British users regularly use that sort of construct with "-" hyphens, simply because they're pretty much the same and a whole lot easier to type on a keyboard.
However, I was surprised to see that when someone (not me) accused him of using an LLM to write his comment, he flatly denied it: https://news.ycombinator.com/item?id=46011964
Which I guess means (assuming he isn't lying) if you spend too much time interacting with LLMs, you eventually resemble one.
Pretty much. I think people who care about reducing their children's exposure to screen time should probably take care to do the same for themselves wrt LLMs.
> First, to those who can recognize an LLM’s reveals (an expanding demographic!), it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can’t be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).
> Specifically, we must be careful to not use LLMs in such a way as to undermine the trust that we have in one another
> our writing is an important vessel for building trust — and that trust can be quickly eroded if we are not speaking with our own voice
This is a technical document that is useful in illustrating how the guy who gave a talk once that I didn’t understand but was captivated by and is well-respected in his field intends to guide his company’s use of the technology so that other companies and individual programmers may learn from it too.
I don’t think the objective was to take any outright ethical stance, but to provide guidance about something ostensibly used at an employee’s discretion.
This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
https://www.laws-of-software.com/laws/kernighan/
Reading good code can be a better way to learn about something than reading prose. Writing code like that takes some real skill and insight, just like writing clear explanations.
My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:
1) First, feed in the existing relevant code into an LLM. This is usually just a few source files in a larger project
2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it to not write code at this point.
3) Let it speak about the plan, and make sure that I like it. I will converse to address any deficiencies that I see, and I almost always do.
4) I then tell it to generate the code
5) I skim & test the code to see if it's generally correct, and have it make corrections as needed
6) Closely read the entire generated artifact at this point, and make manual corrections (occasionally automatic corrections like "replace all C style casts with the appropriate C++ style casts" then a review of the diff)
The hardest part for me is #6, where I feel a strong emotional bias towards not doing it, since I am not yet aware of any errors compelling such action.
This allows me to operate at a higher level of abstraction (architecture) and removes the drudgery of turning an architectural idea into written, precise code. But, when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language: with those tools, you can understand how they work, quickly develop a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.
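For concreteness, here is a minimal hypothetical sketch of the kind of mechanical correction mentioned in step 6, replacing C-style casts with C++-style casts (illustrative only; the names and functions are invented, not taken from the thread):

```cpp
struct Widget { int id; };

double hit_rate_before(int hits, int total, void* opaque) {
    // Before: C-style casts, as an LLM will often emit them.
    Widget* w = (Widget*)opaque;
    (void)w;
    return (double)hits / (double)total;
}

double hit_rate_after(int hits, int total, void* opaque) {
    // After: equivalent C++-style casts, applied mechanically and then
    // checked by reading the diff, as step 6 describes.
    Widget* w = static_cast<Widget*>(opaque);
    (void)w;
    return static_cast<double>(hits) / static_cast<double>(total);
}
```

Reviewing the diff after such a sweep is what catches the cases where a static_cast is not actually the right replacement (for example, where reinterpret_cast or const_cast semantics were intended).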
I always wonder about the people who say LLMs save them so much time: Do you just accept the edits they make without reviewing each and every line?
Anything that involves math or complicated conditions I take extra time on.
I feel I’m getting code written 2 to 3 times faster this way while maintaining high quality and confidence
1. Keeping the context very small
2. Keeping the scope of the output very small
With the added benefit of keeping you in the flow state (and in my experience making it more enjoyable).
To anyone who hates LLMs: even so, give autocomplete a shot (with a keybinding to toggle it if it annoys you; sometimes it's awful). It's really no different than typing it manually wrt quality etc, so the speed-up isn't huge, but it feels a lot nicer.
It's not magic though, this still takes some time to do.
I've seen LLMs write some really bad code a few times lately; it seems almost worse than what they were doing 6 or 8 months ago. Could be my imagination, but it seems that way.
If you keep all edits to be driven by the LLM, you can use that knowledge later in the session or ask your model to commit the guidelines to long term memory.
Personally, I absolutely hate instructing agents to make corrections. It's like pushing a wet noodle. If there is lots to correct, fix one or two cases manually and tell the LLM to follow that pattern.
https://www.humanlayer.dev/blog/writing-a-good-claude-md
Insert before that: have it create tasks with beads and force it to let you review before marking a task complete.
You obviously cannot emotionally identify with the code you produce this way; the ownership you might feel towards such code is nowhere near what meticulously hand-written code elicits.
By this article's own standards, now there are 2 authors who don't understand what they've produced.
I think this gets at a key point... but I'm not sure of the right way to articulate it.
A human-written comment may be worth something, but an LLM-generated one is cheap/worthless.
The nicest phrase capturing the thought I saw was: "I'd rather read the prompt".
It's probably just as good to let an LLM generate it again, as it is to publish something written by an LLM.
Text, images, art, and music are all methods of expressing our internal ideas to other human beings. Our thoughts are the source, and these methods are how they are expressed. Our true goal in any form of communication is to understand the internal ideas of others.
An LLM expresses itself in all the same ways, but the source doesn't come from an individual - it comes from a giant dataset. This could be considered an expression of the aggregate thoughts of humanity, which is fine in some contexts (like retrieval of ideas and information highly represented in the data/world), but not when presented in a context of expressing the thoughts of an individual.
LLMs express the statistical summation of everyone's thoughts. They present the mean, when what we're really interested in are the data points a couple of standard deviations away from the mean. That's where all the interesting, unique, and thought-provoking ideas are. Diversity is a core part of the human experience.
---
An interesting paradox is the use of LLMs for translation into a non-native language. LLMs are actively being used to better express an individual's ideas using words better than they can with their limited language proficiency, but for those of us on the receiving end, we interpret the expression to mirror the source and have immediate suspicions on the legitimacy of the individual's thoughts. Which is a little unfortunate for those who just want to express themselves better.
That’s what I think when I see a news headline. What are you writing? Who cares. WHY are you writing it — that is what I want to know.
They seem to be good at either spitting out something very average, or something completely insane. But something genuinely indicative of the spark of intelligence isn’t common at all. I’m happy to know that while my thoughts are likely not original, they are at least not statistically likely.
A comment is an attempt to more fully document the theory the programmer has. Not all theory can be expressed in code. Both code and comment are lossy artefacts that are "projections" of the theory into text.
LLMs currently, I believe, cannot have a theory of the program. But they can definitely perform a useful simulacrum of such. I have not yet seen an LLM generated comment that is truly valuable. Of course, lots of human generated comments are not valuable either. But the ceiling for human comments is much, much higher.
For example, I recently was perusing the /r/SaaS subreddit and could tell that most of the submissions were obviously LLM-generated, but often by telling a story that was meant to spark outrage, resonate with the “audience” (eg being doubted and later proven right), and ultimately conclude by validating them by making the kind of decision they typically would.
I also would never pass this off as anything else, but I’ve been finding it effective to have LLMs write certain kinds of documentation or benchmarks in my repos, just so that they/I/someone else have access to metrics and code snippets that I would otherwise not have time to write myself. I’ve seen non-native English speakers write pretty technically useful/interesting docs and tech articles by translating through LLMs too, though a lot more bad attempts than good (and you might not be able to tell if you can’t speak the language)…
Honestly, the lines are starting to blur ever so slightly for me. I'd still not want someone using an LLM to chat with me directly, but if someone had an LLM build a simple WASM/interesting game and then write an interesting/informative/useful article about it, or steered it into doing so… I might actually enjoy it. And not because the prompt was good: instructions telling an LLM to go make a game and do a write-up don't help me as much or in the same way as being able to quickly see how well it went and any useful takeaways/tricks/gotchas it uncovered. It would genuinely be giving me valuable information and probably wouldn't be something I'd speculatively try or run myself.
> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.
Don't the same arguments against using LLMs to write one's prose also apply to code? Was this structure of the code and ideas within the engineers'? Or was it from the LLM? And so on.
Before I'm misunderstood as a LLM minimalist, I want to say that I think they're incredibly good at solving for the blank page syndrome -- just getting a starting point on the page is useful. But I think that the code you actually want to ship is so far from what LLMs write, that I think of it more as a crutch for blank page syndrome than "they're good at writing code de novo".
I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.
“Mock the world then test your mocks”, I’m simply not convinced these have any value at all after my nearly two decades of doing this professionally
It can be addressed with prompting, but you have to fight this constantly.
This is one of the problems I feel with LLM-generated code as well. It's almost always between 5x and 20x (!) as long as it needs to be. Though in the case of code verbosity, it's usually not because of thoroughness so much as extremely bad style.
The more examples of different types of problems being solved in similar ways present in an LLM's dataset, the better it gets at solving problems. Generally speaking, if it's a solution that works well, it gets used a lot, so "good solutions" become well represented in the dataset.
Human expression, however, is diverse by definition. The expression of the human experience is the expression of a data point on a statistical field with standard deviations the size of chasms. An expression of the mean (which is what an LLM does) goes against why we care about human expression in the first place. "Interesting" is a value closely paired with "different".
We value diversity of thought in expression, but we value efficiency of problem solving for code.
There is definitely an argument to be made that LLM usage fundamentally restrains an individual from solving unsolved problems. It also doesn't consider the question of "where do we get more data from".
>the code you actually want to ship is so far from what LLMs write
I think this is a fairly common consensus, and my understanding is the reason for this issue is limited context window.
Basically if you are a software engineer you can very easily judge quality of code. But if you aren’t a writer then maybe it is hard for you to judge the quality of a piece of prose.
It depends on the LLM, I think. A lot of people have a bad impression of them as a result of using cheap or outdated LLMs.
A common prompt I use is approximately ”Write tests for file X, look at Y on how to setup mocks.”
This is probably not ”de novo” and in terms of writing is maybe closer to something like updating a case study powerpoint with the current customer’s data.
- I think the "if you use another model" rebuttal is becoming like the No True Scotsman of the LLM world. We can get concrete and discuss a specific model if need be.
- If the use case is "generate this function body for me", I agree that that's a pretty good use case. I've specifically seen problematic behavior for the other ways I'm seeing it OFTEN used, which is "write this feature for me", or trying to one shot too much functionality, where the LLM gets to touch data structures, abstractions, interface boundaries, etc.
- To analogize it to writing: They shouldn't/cannot write the whole book, they shouldn't/cannot write the table of contents, they cannot write a chapter, IMO even a paragraph is too much -- but if you write the first sentence and the last sentence of a paragraph, I think the interpolation can be a pretty reasonable starting point. Bringing it back to code for me means: function bodies are OK. Everything else gets questionable fast IME.
He is a long way from Sun.
111 more comments available on Hacker News