The Port I Couldn't Ship

Posted15 days agoActive5d ago

cjlm

131 points

84 comments

ammil.industriesTech DiscussionstoryHigh profile

informativeneutral

Debate

20/100

ProgrammingDeveloper ManagementTechnical Challenges

Key topics

Programming

Developer Management

Technical Challenges

The debate rages on about the future of AI-assisted programming, sparked by a tale of a "port I couldn't ship" that left many pondering the limitations of current AI capabilities. While some commenters, like tracker1, remain optimistic that code-assist LLMs will optimize towards supportable and legible code, others, such as maccard, argue that real progress has come from agents, tooling, and integrations rather than improved models. The discussion reveals a divide between those who believe model capability is the driving force behind advancements, like dwohnitmok, and those who think we're hitting a plateau, as PaulRobinson suggests. As eru points out, even incremental gains can be significant, leaving the community to wonder what's next for AI-assisted coding.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

Peak period

Day 7

Avg / period

19.6

Comment distribution98 data points

Loading chart...

Based on 98 loaded comments

Key moments

01Story posted
Dec 18, 2025 at 3:17 PM EST
15 days ago
Step 01
02First comment
Dec 19, 2025 at 12:41 AM EST
9h after posting
Step 02
03Peak activity
52 comments in Day 7
Hottest window of the conversation
Step 03
04Latest activity
Dec 28, 2025 at 5:02 AM EST
5d ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (84 comments)

Showing 98 comments

gortok

9 days ago

2 replies

While there's not a lot of meat on the bone for this post, one section of it reflects the overall problem with the idea of Claude-as-everything:

> I spent weeks casually trying to replicate what took years to build. My inability to assess the complexity of the source material was matched by the inability of the models to understand what it was generating.

When the trough of disillusionment hits, I anticipate this will become collective wisdom, and we'll tailor LLMs to the subset of uses where they can be more helpful than hurtful. Until then, we'll try to use AI to replace in weeks what took us years to build.

samdjstephens

9 days ago

3 replies

If LLMs stopped improving today I’m sure you would be correct- as it is I think it’s very hard to predict what the future holds and where the advancements take us.

I don’t see a particularly good reason why LLMs wouldn’t be able to do most programming tasks, with the limitation being our ability to specify the problem sufficiently well.

maccard

9 days ago

3 replies

I feel like we’ve been hearing this for 4 years now. The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.

elAhmo

9 days ago

2 replies

Both is true, models have also been significantly improved in the last year alone, let's not even talk about 4 years ago. Agents, tooling and other sugar on top is just that - enabling more efficient and creative usage, but let's not undermine how much better models today are compared to what was available in the past.

fragmede

9 days ago

1 reply

The code that's generated when given a long leash is still crap. But damned if I didn't use a JIRA mcp and a gitlab mcp, and just have the corporate AI just "do" a couple of well defined and well scoped tickets, including interacting with JIRA to get the ticket contents, update its progress, push to gitlab, and open an MR. Then, the corporate CodeRabbit does a first pass code review against the code so any glaring errors are stomped out before a human can review it. What's more scary though is that the JIRA tickets were created by a design doc that was half AI generated in the first place. The human proposed something, the AI asked clarifying questions, then broke the project down into milestones and then tickets, and then created the epic and issues on JIRA. One of my tradie friends taking an HVAC class tells me that there are a couple of programmers in his class looking to switch careers. I don't know what the future brings, but those programmers (sorry, "software developers") may have the right idea.

llmslave2

9 days ago

Yes we get it, there is a ton of "work" being done in corporate environments, in which the slop that generative AI churns out is similar to the slop that humans churn out. Congrats.

majormajor

9 days ago

1 reply

How do you judge model improvements vs tooling improvements?

If not working at one of the big players or running your own, it appears that even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.

dwohnitmok

9 days ago

> even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.

No, the APIs for these models haven't really changed all that much since 2023. The de facto standard for the field is still the chat completions API that was released in early 2023. It is almost entirely model improvements, not tooling improvements that are driving things forward. Tooling improvements are basically entirely dependent on model improvements (if you were to stick GPT-4, Sonnet 3.5, or any other pre-2025 model in today's tooling, things would suck horribly).

dwohnitmok

9 days ago

1 reply

> The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.

I disagree. This almost entirely model capability increases. I've stated this elsewhere: https://news.ycombinator.com/item?id=46362342

Improved tooling/agent scaffolds, whatever, are symptoms of improved model capabilities, not the cause of better capabilities. You put a 2023-era model such as GPT-4 or even e.g. a 2024-era model such as Sonnet 3.5 in today's tooling and they would crash and burn.

The scaffolding and tooling for these models have been tried ever since GPT-3 came out in 2020 in different forms and prototypes. The only reason they're taking off in 2025 is that models are finally capable enough to use them.

kasey_junk

8 days ago

1 reply

Yet when you compare the same model in 2 different agents you can easily see capability differences. But cross (same tier) model in the same agent is much less stark.

My personal opinion is that there was a threshold earlier this year where the models got basically competent enough to be used for serious programming work. But all the major on the ground improvements since then has gone from the agents, and not all agents are equal, while all sota models are effectively.

dwohnitmok

8 days ago

1 reply

> Yet when you compare the same model in 2 different agents you can easily see capability differences.

Yes definitely. But this is to be expected. Heck take the same person and put them in two different environments and they'll have very different performance!

> But cross (same tier) model in the same agent is much less stark.

Unclear what you mean by this. I do agree that the big three companies (OpenAI, Anthropic, Google DeepMind) are all more or less neck and neck in SOTA models, but every new generation has been a leap. They just keep leaping over each other.

If you compare e.g. Opus 4.1 and Opus 4.5 in the same agent harness, Opus 4.5 is way better. If you compare Gemini 3 Pro and Gemini 2.5 Pro in the same agent harness, Gemini 3 is way better. I don't do much coding or benchmarking with OpenAI's family of models, but anecdotally have heard the same thing going from GPT-5 to GPT-5.2.

The on the ground improvements have been coming primarily from model improvements, not harness improvements (the latter is unlocked by the former). There are certain frameworks and workflows that simply did not make sense with Q2-Q3 2025 models that now make sense with Q4 2025 models.

kasey_junk

8 days ago

I actually have spent a lot of time doing comparisons between the 4.1 and 4.5 Claude models (and lately the 5.1->5.2 chatgpt models) and for many many tasks there is not significant improvement.

All things being equal I agree that the models are improving, but for many of the tasks I’m testing what has the most improvement is the agent. The agents choosing the appropriate model for the task for instance has been huge.

I do believe there is beneficial symbiosis but for my results the agent's provide much bigger variance than the model.

bigiain

9 days ago

> I feel like we’ve been hearing this for 4 years now.

I feel we were hearing very similar claims 40 years ago, about how the next version of "Fourth Generation Languages" were going to enable business people and managers to write their own software without needing pesky programmers to do it for them. They'll "just" need to learn how to specify the problem sufficiently well.

(Where "just" is used in it's "I don't understand the problem well enough to know how complicated or difficult what I'm about to say next is" sense. "Just stop buying cigarettes, smoker!", "Just eat less and exercise more, fat person!", "Just get a better paying job, poor person!", "Just cheer up, depressed person!")

PaulRobinson

9 days ago

1 reply

LLM capability improvement is hitting a plateau with recent advancements mostly relying on accessing context locally (RAG), or remotely (MCP), with a lot of extra tokens (read: drinking water and energy), being spent prompting models for "reasoning". Foundation-wise, observed improvements are incremental, not exponential.

> able to do most programming tasks, with the limitation being our ability to specify the problem sufficiently well

We've spent 80 years trying to figure that out. I'm not sure why anyone would think we're going to crack this one anytime in the next few years.

eru

9 days ago

1 reply

> Foundation-wise, observed improvements are incremental, not exponential.

Incremental gains are fine. I suspect capability of models scales roughly as the logarithm of their training effort.

> (read: drinking water and energy)

Water is not much of a concern in most of the world. And you can cool without using water, if you need to. (And it doesn't have to be drinking water anyway.)

Yes, energy is a limiting factor. But the big sink is in training. And we are still getting more energy efficient. At least to reach any given capability level; of course in total we will be spending more and more energy to reach ever higher levels.

PaulRobinson

5d ago

Incremental gains in output seem to - so far - require exponential gains in input. This is not fine.

Water is a concern in huge parts of the World, as is energy consumption.

And if the big sink is “just” in training, why is there so much money being thrown at inference capacity?

I thought it was mad when I read that Bitcoin uses more energy than the country of Austria, but knowing AI inference using more energy than all the homes in the USA is so, so, so much worse given the quality of the outputs are so mediocre.

majormajor

9 days ago

> the limitation being our ability to specify the problem sufficiently well

Such has always been the largest issue with software development projects, IMO.

tracker1

9 days ago

I would think/hope that the code assist LLMs would be optimizing towards supportable/legible code solutions overall. Mostly in that they can at least provide a jumping off point, largely accepting that they more often than not won't be able to produce complete, finished solutions entirely.

xnorswap

9 days ago

1 reply

It's amusing to think that claude might be better at generating ascii diagrams than generating code to generate diagrams, despite it being nominally better at generating code.

I'm generating a lot of PDFs* in claude, so it does ascii diagrams for those, and it's generally very good at it, but it likely has a lot of such diagrams in its training set. What it then doesn't do very well is aligning them under modification. It can one-shot the diagram, it can't update it very well.

The euphoric breakthrough into frustration of so-called vibe-coding is well recognised at this point. Sometimes you just have to step back and break the task down smaller. Sometimes you just have to wait a few months for an even better model which can now do what the previous one struggled at.

* Well, generating Typst mark-up, anyway.

rashkov

9 days ago

I just ask it to generate mermaid diagrams, which are just code that you can render using the mermaid diagram website

esafak

9 days ago

1 reply

You don't know what the model is capable of until you try. Maybe today's models are not good enough. Try again next year.

jeffrallen

9 days ago

2 replies

[delayed]

johnfn

9 days ago

1 reply

Really, you haven't found a single task they can't do? I like agents, but this seems a little unrealistic? Recently, I asked Codex and Claude both to "give me a single command to capture a performance profile while running a playwright test". Codex worked on this one for at least 2 hours and never succeeded, even though it really isn't that hard.

magicalhippo

9 days ago

[delayed]

eru

9 days ago

> I simply cannot come up with tasks the LLMs can't do, when running in agent mode, with a feedback loop available to them. Giving a clear goal, and giving the agent a way to measure it's progress towards that goal is incredibly powerful.

It's really easy to come up with plenty of algorithmic tasks that they can't do.

Like: implement an algorithm / data structure that takes a sequence of priority queue instructions (insert element, delete smallest element) in the comparison model, and return the elements that would be left in the priority queue at the end.

This is trivial to do in O(n log n). The challenge is doing this in linear time, or proving that it's not possible.

(Spoiler: it's possible, but it's far from trivial.)

simonw

9 days ago

7 replies

Funny to see this show up today since coincidentally I've had Claude code running for the past ~15 hours attempting to port MicroQuickJS to pure dependency-free Python, mainly as an experiment in how far a porting project can go but also because a sandboxed (memory constrained, to us time limits) JavaScript interpreter that runs in Python is something I really want to exist.

I'm currently torn on whether to actually release it - it's in a private GitHub repository at the moment. It's super-interesting and I think complies just fine with the MIT licenses on MicroQuickJS so I'm leaning towards yes.

yeasku

9 days ago

1 reply

I am waiting for a llm entusiast to create something like MicroQuickJS from scratch.

GaryBluto

9 days ago

2 replies

Fabrice Bellard, who developed MicroQuickJS, is a user of LLMs.

yeasku

9 days ago

1 reply

Can you point out where Fabrice states that he uses LLM to code?

I think you halucinated this up.

GaryBluto

9 days ago

1 reply

> I think you halucinated this up.

No point in responding to a troll, but for the other people who may be reading this comment chain, he's used LLMs for various tasks. Not to mention that he founded TextSynth, an entire service that revolves around them.

https://textsynth.com/

https://bellard.org/ts_sms/

yeasku

9 days ago

2 replies

That is not using a llm to code you are making shit up.

Is a compression algo, nothing related to codding.

That is the level...

simonw

9 days ago

1 reply

You're confused. The compression algorithm was something different. TextSynth is an LLM inference server, similar to (but older than) llama.cpp.

yeasku

9 days ago

Creating a llama.cpp like software is not using LLMs to develop software neither.

dwaltrip

9 days ago

> TextSynth provides access to large language, text-to-image, text-to-speech or speech-to-text models such as Mistral, Llama, Stable Diffusion, Whisper thru a REST API and a playground. They can be used for example for text completion, question answering, classification, chat, translation, image generation, speech generation, speech to text transcription, ...

???

subscribed

8 days ago

1 reply

I scratch my head trying to understand how your comment relates to the parent...

zahlman

8 days ago

The implication is that the task originally wondered about was done to begin with.

krackers

9 days ago

1 reply

You should release it, it'd be quite useful.

simonw

9 days ago

1 reply

https://pypi.org/project/micro-javascript/ - https://github.com/simonw/micro-javascript

Here's the transcript showing how I built it: https://static.simonwillison.net/static/2025/claude-code-mic...

dotancohen

9 days ago

1 reply

I see that you're no longer copying and pasting from the terminal, I remember those two gnarly code sessions to get a previous transcript. How are you generating that transcript now? I'd certainly like to use that for my own record keeping.

simonw

8 days ago

1 reply

It's a new tool I built yesterday (because for this particular JavaScript interpreter project publishing the full transcript was essential): https://github.com/simonw/claude-code-publish

It only works with Claude Code for the web sessions at the moment but I expect I'll get it working for local sessions too.

dotancohen

8 days ago

Great, thanks!

llmslave2

9 days ago

1 reply

How many tests do other JS runtimes like V8 have? ~400 tests sounds reasonable for a single data structure, but orders of magnitude off for a language runtime.

simonw

9 days ago

1 reply

MicroQuickJS has 7, kind of: https://github.com/bellard/mquickjs/tree/main/tests

Though if you look in those files some of them run a ton of test functions and assertions.

My new Python library executes copies of the tests from that mquickjs repo - but those only count as 7 of the 400+ other tests.

ruszki

9 days ago

2 replies

Were the tests generated by an AI then. How do you know whether they are really comprehensive?

dotancohen

9 days ago

1 reply

I'm now having Claude Code build the tests for my voicenotes organization application. For the most basic implementation - just a single text field - I wrote in English which tests I know I need, there were about two dozen. Approaching size limits, unicode, normalization, nonprinting characters, Hebrew vowel points, empty strings vs NULL strings, Exceeded byte length without exceeded character length, etc etc. I then threw Claude Code at it.

Claude Code found more edge cases to write tests for than I ever would have thought of. And I've been doing this for 20 years.

ruszki

8 days ago

First of all, basically nobody can know what you compare LLMs to. It’s possible that you claim AGI level consciousness, or that they are still utterly terrible. Nobody knows the baseline, ie your test writing capabilities.

Also, there is a difference between writing tests, and claiming that your product is good because the tests are green. The other day Claude simply mocked out everything from the real code of mine just to make tests good. On surface level everything was green. And no, don’t lie that people review their generated code, almost nobody did that even before LLMs, this hasn’t changed. I’ve seen enough LLM generated code, even some people who showed me how they work, to know that they don’t have a clue what’s generated. When I dug into some praised LLM code here, their quality was bad, sometimes mediocre, but mostly just bad.

simonw

8 days ago

Yes, effectively my entire project was generated by AI.

It's a very weird and uncomfortable experience at of working - I've said in the past that I don't like a single line of unreviewed AI-generated code in anything beyond a prototype, and now here I am with 13,000+ lines of mostly unreviewed Python written by Claude Opus 4.5.

I'm leaving the alpha label on it until I'm a whole lot more comfortable with the codebase!

I do however know that the tests are pretty comprehensive because I had the model use TDD from the very start - write a test, watch it fail, then implement code to make it pass.

I was able to keep an eye on what it was doing on my phone while it worked and the TDD process seemed to be staying honest.

Here's one example from the full transcript, showing how it implemented closures: https://static.simonwillison.net/static/2025/claude-code-mic...

cryptonector

9 days ago

1 reply

But why Python? Why not a JVM like Graal? I would think that would yield faster code.

Or why not run MicroQuickJS under Fil-C? It's ideal since it has not dependencies.

simonw

9 days ago

1 reply

I build and distribute software in Python. My ideal solution is something that installs cleanly via pip so I can include it as a regular dependency of my other projects.

benatkin

9 days ago

It's analogous to asm.js, the precursor to WebAssembly, which was written in js, in that it ran virtual machines in pure js, which is a huge win in portability. The mquickjs readme explains it in a much lower level way than the quickjs readme. There's also more emphasis on the bytecode. In a way it's like a tiny WebAssembly plus garbage collection extension vm that can run compile to js languages, going beyond that and compiling them to bytecode. The overhead of porting it to a dynamic language wouldn't always be that bad depending on the use case. Its memory savings could be maintained.

rasz

9 days ago

TI had similar idea with TI-99/4 - running interpreted BASIC programs using BASIC written in special interpreted language (GPL) running in its own virtual machine, with actual CPU machine code executing from ram accessible thru single byte window of Video processor. Really brilliant system, turtles all the way down.

1317

9 days ago

yt-dlp/youtube-dl used a python javascript interpreter to run youtube's JS until recently

idk how complete it is but it solved youtube's challenges etc for a long time

https://github.com/yt-dlp/yt-dlp/blob/6d92f87ddc40a319590976...

csomar

8 days ago

I wouldn't trust it without a deeper inspection. I've had Claude do a workaround (ie: use a javascript interpreter and wrap it in Python) and then claim that it completed the task! The CoT was an interesting read on how his mind think about my mind (the user want ... but this should also achieve this ... the user however asked it to be this ... but this can get what the user want ...; that kind of salad)

mrguyorama

9 days ago

2 replies

This is unfortunate. I thought porting code from one language to another was somewhere LLMs were great, but if you need expertise of the source code to know what you are doing that's only an improvement in very specific contexts: Basically just teams doing rewrites of code they already know.

Our team used claude to help port a bunch of python code to java for a critical service rewrite.

As a "skeptic", I found this to demonstrate both strengths and weaknesses of these tools.

It was pretty good at taking raw python functions and turning them into equivalent looking java methods. It was even able to "intuit" that a python list of strings called "active_set" was a list of functions that it should care about and discard other top level, unused functions. The functions had reasonable names and picked usable data types for every parameter, as the python code was untyped.

That is, uh, the extent of the good.

The bad: It didn't "one-shot" this task. The very first attempt, it generated everything, and then replaced the generated code with a "I'm sorry, I can't do that"! After trying a slightly different prompt it of course worked, but it silently dropped the code that caused the previous problem! There was a function that looked up some strings in the data, and the lookup map included swear words, and apparently real companies aren't allowed to write code that includes "shit" or "f you" or "drug", so claude will be no help writing swear filters!

It picked usable types but I don't think I know Java well enough to understand the ramifications of choosing Integer instead of integer as a parameter type. I'll have to look into it.

It always writes a bunch of utility functions. It refactored simple and direct conditionals into calls to utility functions, which might not make the code very easy to read. These utility functions are often unused or outright redundant. We have one file with like 5 different date parsing functions, and they were all wrong except for the one we quickly and hackily changed to try different date formats (because I suck so the calling service sometimes slightly changes the timestamp format). So now we have 4 broken date parsing functions and 1 working one and that will be a pain that we have to fix in the new year.

The functions look right at first glance but often had subtle errors. Other times the ported functions had parts where it just gave up and ignored things? These caused outright bugs for our rewrite. Enough to be annoying.

At first it didn't want to give me the file it generated? Also the code output window in the Copilot online interface doesn't always have all the code it generated!

It didn't help at all with the hard part: Actual engineering. I had about 8 hours and needed find a way to dispatch parameters to all 50ish of these functions and I needed to do it in a way that didn't involve rebuilding the entire dispatch infrastructure from the python code or the dispatch systems we had in the rest of the service already, and I did not succeed. I hand wrote manual calls to all the functions, filling in the parameters, which the autocomplete LLM in intellij kept trying to ruin. It would constantly put the wrong parameters places and get in my way, which was stupid.

Our use case was extremely laser focused. We were working from python functions that were designed to be self contained and fairly trivial, doing just a few simple conditionals and returning some value. Simple translation. To that end it worked well. However, we were only able to focus the tool into this use case because we already had the 8 years experience of the development and engineering of this service, and had already built out the engineering of the new service, building lots of "infrastructure" that these simple functions could be dropped into, and giving us easy tooling to debug the outcomes and logic bugs in the functions using tens of thousands of production requests, and that still wasn't enough to kill all errors.

All the times I turned to claude for help on a topic, it let me down. When I thought java reflection was wildly more complicated than it actually is, it provided the exact code I had already started writing, which was trivial. When I turned to it for profiling our spring boot app, it told me to write log statements everywhere. To be fair, that is how I ended up tracking down the slowdown I was experiencing, but that's because I'm an idiot and didn't intuit that hitting a database on the other side of the country takes a long time and I should probably not do that in local testing.

I would pay as much for this tool per year as I pay for Intellij. Unfortunately, last I looked, Jetbrains wasn't a trillion dollar business.

eru

9 days ago

> So now we have 4 broken date parsing functions and 1 working one and that will be a pain that we have to fix in the new year.

Property based testing can be really useful here.

zahlman

8 days ago

> I don't think I know Java well enough to understand the ramifications of choosing Integer instead of integer as a parameter type.

[0]

Java's `int` is a 32-bit "machine" integer (in a virtual architecture, but still stored by value with no additional space overhead). Java's `Integer` is an object with reference semantics, like[1] every value in a Python program — but unlike Python's `int`, it still has the 32-bit range restriction. If you need arbitrary-size integer values in Java, those come from `java.math.BigInteger`.

> It always writes a bunch of utility functions. It refactored simple and direct conditionals into calls to utility functions, which might not make the code very easy to read.

Are the names good, at least? I do this sort of thing and often find it helpful. Of course, that does depend on choosing one utility function for the same task and reusing it, and being sure it actually works.

> I hand wrote manual calls to all the functions, filling in the parameters, which the autocomplete LLM in intellij kept trying to ruin. It would constantly put the wrong parameters places and get in my way, which was stupid.

Yeah, Java lacks a lot of Python's nice tricks for this. (I've had those frustrations with IDEs since long before LLMs.)

> it told me to write log statements everywhere. To be fair, that is how I ended up tracking down the slowdown I was experiencing, but that's because I'm an idiot and didn't intuit that hitting a database on the other side of the country takes a long time and I should probably not do that in local testing.

It sounds like you wanted this for immediate debugging. The word "logging" does not autocomplete "to a remote server db" in my head. Sometimes it's useful to have mental defaults oriented towards what is temporary and quick rather than what is permanent and robust.

[0] Did you consider asking the LLM? It can probably deal with this question pretty well if you ask directly, although I don't know how much it would take to get from there to actually having it fix any problems. But I might as well write a human perspective since I'm here.

[1] Unlike Python, all those "objects with reference semantics" can be NULL in Java (and you need a possibly-third-party annotation to restrict that type to be non-null). There is no "null object" analogous to Python's `None`.

riffraff

9 days ago

1 reply

fun to read this in the context of the recent news that microsoft wants to port all their C/C++ to Rust in 5 years with "AI and algorithms"[0].

I'm sure the MS plan is not just asking Claude "port this code to rust: <paste>", but it's just fun to think it is :)

0: https://www.theregister.com/2025/12/24/microsoft_rust_codeba...

lolsowrong

9 days ago

Yea, but Galen’s a beast so it might actually happen.

tonnydourado

9 days ago

2 replies

I won't deny OP learned something in this process, but I can't help but wonder: if they spent the same time and effort just porting the code themselves, how much more would they have learned?

Specially considering that the output would be essentially the same: a bunch of code that doesn't work.

embedding-shape

9 days ago

1 reply

I guess it depends on well people want to know things like "Perl (and C) library to web" skills. Personally, there are languages I don't want to learn, but for one reason or another, I have to change some details in a project that happen to use that language. Sure, I could sit down and learn enough of the language so I can do the thing, but if I don't like or want to use that language, the knowledge will eventually atrophy anyways, so why bother?

20after4

9 days ago

I think the specific language in question - perl - is really the source of OP's frustration. Perl is kind of like Regular Expressions - much easier to write than it is to read. I would expect LLMs to struggle with understanding perl. It's one of the best languages for producing obfuscated code by hand. There are many subtleties and context-dependence in perl, and they aren't immediately apparent from the raw syntax.

rfw300

9 days ago

That may be true, but it does seem like OP's intent was to learn something about how LLM agents perform on complex engineering tasks, rather than learning about ASCII creation logic. A different but perhaps still worthy experiment.

dwaltrip

9 days ago

4 replies

[delayed]

aretu7888

9 days ago

1 reply

You just need to know what you are doing. In this case, the problem is not "rewriting the logic" but "mapping Perl syntax to Typescript syntax" and "mapping Perl libs to Typescript libs". In other words, you'd be better off with an old-fashioned script that merely works on syntax mangling along with careful selection of dependencies (and maybe some manual labor around fixing the APIs of the consumers).

This is easy work, made hard by the "allure" of LLMs, which go from emphatic to emetic in the blink of an eye.

dwaltrip

9 days ago

Are you sure you know what you are doing? ;)

The actual goal is to faithfully replicate the functionality and solve the same use cases with a different language.

You describing similar but different instrumental goals.

That said I agree your framing is helpful!

cjlmAuthor

9 days ago

Looking forward to seeing how you get on ;-)

itsangaris

9 days ago

> A reader (or dare I say a wiser version of me), armed with a future model and dedicated to the task, will succeed with this port where I failed and that makes me uneasy.

lomase

9 days ago

Expert vibe engineer sounds as silly as exper stackoverflow copypaster.

embedding-shape

9 days ago

3 replies

As always, the answer is "divide & conquer". Works for humans, works for LLMs. Divide the task into as small, easy to verify steps as possible, ideally steps you can automatically verify by running one command. Once done, either do it yourself or offload to LLM, if the design and task splitting is done properly, it shouldn't really matter. Task too difficult? Divide into smaller steps.

eru

9 days ago

1 reply

Well, ideally we teach the AIs how to divide-and-conquer. I don't care, whether my AI coding assistant is multiple LLMs (or other models) working together.

crazygringo

9 days ago

1 reply

They already know how to. But you have to tell them that's the way you want them to operate, tell them how to keep track of it, tell them how to determine when each step is done. You need to specify what you want both in terms of final result but also in terms of process.

The AI's are super capable now, but still need a lot of guiding towards the right workflow for the project. They're like a sports team, but you still need to be a good coach.

eru

9 days ago

> They already know how to. But you have to tell them that's the way you want them to operate, tell them how to keep track of it, tell them how to determine when each step is done. You need to specify what you want both in terms of final result but also in terms of process.

I found Google Antigravity (with the current Gemini models) to be fairly capable. If I had to guess, it seems like they set up their system to get that divide-and-conquer going. As you suggest, it's not that hard: they just have to put the instructions in their equivalent of the system prompt.

Well, when I say 'not that hard', I mean it's an engineering problem to get the system and tooling working together nicely, not really an AI problem.

lomase

9 days ago

I ask the LLM to split the task for me. It shines.

fulafel

9 days ago

Judging from this an approach might have been to port the 28 modules individually and check that everything returns the same data in Perl and TS versions:

"I took a long-overdue peek at the source codebase. Over 30,000 lines of battle-tested Perl across 28 modules. A* pathfinding for edge routing, hierarchical group rendering, port configurations for node connections, bidirectional edges, collapsing multi-edges. I hadn’t expected the sheer interwoven complexity."

delduca

9 days ago

2 replies

Claude was able to write a NES emulator for my engine from scratch, on 3rd try

https://github.com/willtobyte/NES

lomase

9 days ago

1 reply

Pssst, I can write one in 1 minute.

I use this LLM called git clone.

nacozarina

8 days ago

vibe coders hate this one weird trick

doawoo

9 days ago

Only because humans before Claude wrote many NES emulators…

zkmon

9 days ago

1 reply

You don't ship a port. You ship to a port.

zahlman

8 days ago

1 reply

In ordinary English, yes.

In software engineering, "ship" commonly means "distribute" (to a deliberately unspecified audience), while "port" commonly means "software manually translated to another programming language or adapted to another platform".

zkmon

8 days ago

How do I add a sarcasm flag at HN?

debois

9 days ago

1 reply

I recognize this part:

> I don’t recall what happened next. I think I slipped into a malaise of models. 4-way split-paned worktrees, experiments with cloud agents, competing model runs and combative prompting.

You’re trying to have the LLM solve some problem that you don’t really know how to solve yourself, and then you devolve into semi-random prompting in the hope that it’ll succeed. This approach has two problems:

1. It’s not systematic. There’s no way to tell if you’re getting any closer to success. You’re just trying to get the magic to work.

2. When you eventually give up after however many hours, you haven’t succeeded, you haven’t got anything to build on, and you haven’t learned anything. Those hours were completely wasted.

Contrast this with you beginning to do the work yourself. You might give up, but you’d understand the source code base better, perhaps the relationship between Perl and Typescript, and perhaps you’d have some basics ported over that you could build on later.

gyomu

9 days ago

3 replies

When I teach programming, some students, when stuck, will start flailing around - deleting random lines of code, changing call order, adding more functions, etc - and just hoping one of those things will “fix it” eventually.

This feels like the LLM-enabled version of this behavior (except that in the former case, students will quickly realize that what they’re doing is pointless and ask a peer or teacher for help; whereas maybe the LLM is a little too good at hijacking that and making its user feel like things are still on track).

The most important thing to teach is how to build an internal model of what is happening, identify which assumptions in your model are most likely to be faulty/improperly captured by the model, what experiments to carry out to test those assumptions…

In essence, what we call an “engineering mindset” and what good education should strive to teach.

im_down_w_otp

8 days ago

> When I teach programming, some students, when stuck, will start flailing around - deleting random lines of code, changing call order, adding more functions, etc - and just hoping one of those things will “fix it” eventually.

That sounds like a lot of people I’ve known, except they weren’t students. More like “senior engineers”.

vbezhenar

9 days ago

Can LLM ask to help? I didn't see that. They can ask to clarify something, but they'll never admit they need help.

dodos

8 days ago

I definitely fall into this trap sometimes. Oftentimes that simple order of ops swap will fix my issue, but when it doesn't, it's easy to get stuck in the "just one more change" mindset instead of taking a step back to re-assess.

akrauss

9 days ago

1 reply

It is really important that such posts exist. There is the risk that we only hear about the wild successes and never the failures. But from the failures we learn much more.

One difference between this story and the various success stories is that the latter all had comprehensive test suites as part of the source material that agents could use to gain feedback without human intervention. This doesn’t seem to exist in this case, which may simply be the deal breaker.

enraged_camel

9 days ago

1 reply

>> This doesn’t seem to exist in this case, which may simply be the deal breaker.

Perhaps, but perhaps not. The reason tests are valuable in these scenarios is they are actually a kind of system spec. LLMs can look at them to figure out how a system should (and should not) behave.

I don’t see why regular specs (e.g. markdown files) could not serve the same purpose. Of course, most GitHub projects don’t include such files, but maybe that will change as time goes on.

morcus

8 days ago

> I don’t see why regular specs (e.g. markdown files) could not serve the same purpose.

I think because they're doomed to become outdated without something actually enforcing the spec.

dotancohen

9 days ago

2 replies

> The port I couldn't ship

It turns out that having a "trainer" to "coach" you is not a coincidence: these two words evolved together from the rail industry to the gym. Do "port" and "ship" have a similar history, evolving together from the maritime industry to software?

zahlman

8 days ago

1 reply

As far as I can tell, no. The relationship isn't the same; in software, the "port" is the translated software itself, not the destination platform.

The etymological roots are quite interesting, though. We aren't quite sure where the word "ship" comes from — Etymonline hazards

> Watkins calls this a "Germanic noun of obscure origin." OED says "the ultimate etymology is uncertain." Traditionally since Pokorny it is derived from PIE root skei- "to cut, split," perhaps on the notion of a tree cut out or hollowed out, but the semantic connection is unclear. Boutkan gives it "No certain IE etymology."

The word "port" goes back to the PIE root "

per-" meaning "forward", and thus as a verb "to lead". It seems to have emerged in Latin in multiple forms: the word "portus" ("harbor"), verb "portare" (to carry or bring). I was surprised to learn that the English "ferry" does not come from the other Latin verb with the sense of carrying (the irregular "ferre"), but from Germanic and Norse words... that are still linked back to "per-".
Basically, transportation (same "port"!) has been important to civilization for a long time, and quite a bit of it was done by, well, shipping. And porting software is translating the code; the "lat" there comes from the past participle of the irregular Latin verb mentioned above, about which
> Presumably lātus was taken (by a process linguists call suppletion) from a different, pre-Latin verb. By the same process, in English, went became the past tense of go. Latin lātus is said by Watkins to be from
tlatos, from PIE root *tele- "to bear, carry" (see extol), but de Vaan says "No good etymology available."

dotancohen

8 days ago

Thank you. That was fun!

thaumasiotes

8 days ago

> It turns out that having a "trainer" to "coach" you is not a coincidence: these two words evolved together from the rail industry to the gym.

This does not appear to be true.

Train (etymonline):

> "to discipline, teach, bring to a desired state or condition by means of instruction," 1540s, which probably is extended from the earlier sense of "draw out and manipulate in order to bring to a desired form" (Middle English trainen, attested c. 1400 as "delay, tarry" on a journey, etc.); from train (n.) For the notion of "educate" from that of "draw," compare educate.

[That train (n.) doesn't refer to the rail industry, which didn't really exist in the 1540s. It refers to a succession (as one railcar will follow another in later centuries), or to the part of your clothing that might drag on the ground behind you, or to the act of dragging anything generally. Interestingly, etymonline derives this noun from a verb train meaning to drag; given the existence of this verb, I see no reason to derive the verb train in the sense "teach" from the noun derived from the same verb in the sense "drag". The entry on the verb already noted that it isn't unexpected for "drawing" [as water from a well] to evolve into "teaching".]

Coach (wiktionary):

> The meaning "instructor/trainer" is from Oxford University slang (c. 1830) for a "tutor" who "carries" one through an exam

Coach might be a metaphor from the rail industry (or the horse-and-buggy industry), but trainer isn't.

chanux

8 days ago

> A reader (or dare I say a wiser version of me), armed with a future model and dedicated to the task, will succeed with this port where I failed and that makes me uneasy.

Is that confidence of or positivity? I hope I will find out in the future, here on HN.

cryptonector

9 days ago

I wonder how well Claude would do at porting Heimdal's ASN.1 compiler to Rust, Swift, Java, etc. I wonder how well it would do at porting Heimdal's lib/hx509. I think the latter would be much easier than the former. But I'd expect that porting the krb5 code would be much harder still.

abstractspoon

15 days ago

Easy come easy go

meibo

9 days ago

Good luck to Microsoft trying to port a billion lines of C++ mazes to Rust with their bullshit machines, I'm sure they won't give up on that one after half a week

pshirshov

9 days ago

It should be doable. This WAS doable: https://github.com/7mind/jopa

View full discussion on Hacker News

ID: 46318080Type: storyLast synced: 12/25/2025, 3:30:17 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN