Claude Code on the Web

Posted3 months agoActive3 months ago

adocomplete

578 points

390 comments

anthropic.comTechstoryHigh profile

excitedmixed

Debate

60/100

AI Coding AssistantsClaude CodeSoftware Development

Key topics

AI Coding Assistants

Claude Code

Software Development

Anthropic released Claude Code on the web, a web-based interface for their AI coding assistant, sparking discussion among developers about its features, limitations, and potential impact on their workflow.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

23m

Peak period

119

0-6h

Avg / period

22.9

Comment distribution160 data points

Loading chart...

Based on 160 loaded comments

Key moments

01Story posted
Oct 20, 2025 at 2:12 PM EDT
3 months ago
Step 01
02First comment
Oct 20, 2025 at 2:35 PM EDT
23m after posting
Step 02
03Peak activity
119 comments in 0-6h
Hottest window of the conversation
Step 03
04Latest activity
Oct 22, 2025 at 11:53 PM EDT
3 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (390 comments)

Showing 160 comments of 390

Redster

3 months ago

1 reply

Here's the link talking about the sandbox environment and features they're using for this Claude Code. https://www.anthropic.com/engineering/claude-code-sandboxing

hmokiguess

3 months ago

Soon —> https://xkcd.com/2044/

jryio

3 months ago

2 replies

Pair programming is still one of the best ways to knowledge transfer between two programmers in a high throughput manner. Humans learn by doing, building synaptic connections.

I wonder if a shared Claude Code instance has the same effect?

dingnuts

3 months ago

4 replies

The person driving is the one that learns the most in pair programming. In the scenario you've described, that would be Claude. LLMs don't learn.

Doesn't CC sometimes take twenty, thirty minutes to return an attempt? I wouldn't know, because I'm not rich and my employer has decided CC is too expensive, but I wonder what you would do with your pair programming partner while you wait.

The bosses would like to think we'd start working on something else, maybe start up a different Claude instance, but can you really change contexts and back before the first one is done? You AND your partner?

Nah, just go play air hockey until your boss realizes Claude is what they need, not you.

myko

3 months ago

1 reply

> Nah, just go play air hockey until your boss realizes Claude is what they need, not you.

This is a depressing comment.

I am apprehensive about the future of software development in this milieu. I've pumped out a ~15,000 line application heavily utilizing Claude Code over a few days that seems to work, but I don't know how much to trust it.

Certainly part of the fun of building something was missing during that project, but it was still fun to see something new come to life.

Maybe I should say I am cautiously optimistic but also concerned: I don't feel confident in the best ways to use these tools to build good software, and I'm not sure exactly what skills are useful in order to get them there.

losteric

3 months ago

1 reply

> I've pumped out a ~15,000 line application heavily utilizing Claude Code over a few days that seems to work, but I don't know how much to trust it.

Can I ask what you built?

myko

3 months ago

https://github.com/Chuntttttt/TapeDeck

There was a post recently where someone linked to: https://simplyexplained.com/blog/how-i-built-an-nfc-movie-li...

and I thought the project was amazing, but I didn't like how the IDs were managed in yml, so I built this to make it more dynamic. I plan to add support for other smart home automations with it as well as more streaming services.

One of the features I really like about it is it makes it easy to print and cut out stickers to slap on the NFC cards for playing media.

My toddler loves it so far and one of his friend's has asked me to make one for him as well

astrange

3 months ago

You can get plenty of CC on a $20/month plan.

mr_mitm

3 months ago

Just for the record, CC is about the cost of a Netflix subscription, and it responds faster than any human can.

lazerwalker

3 months ago

I have criticisms of both tools like Claude Code and how applicable the 'pair programming' metaphor is here, but strong disagree that the person driving during pairing is the one who learns the most (or perhaps the implied "and the non-driver doesn't learn enough"). A good dynamic pairing session is equally valuable for both participants, even if there's a skill gap, and even if you're not alternating drivers as often as you should.

dgunay

3 months ago

I haven't seen anyone talking about using agents in this way. I wonder if it would be helpful for learning e.g. a new language or codebase to have the human write the code while the agent takes the role of the "backseat driver" in the pair programming dynamic.

charlesabarnes

3 months ago

6 replies

It's pretty frustrating that every release is IOS first without any timeline or expectation for Android

outime

3 months ago

2 replies

This may explain it: https://9to5mac.com/2023/09/06/iphone-users-spend-apps/

poly2it

3 months ago

It is also relevant to know if a user who'd otherwise use app X on iOS would use X less on Android.

rldjbpin

3 months ago

this makes sense for apps where you pay up front to use it. for subscription-based services, this idea falls flat.

in fact, apple made it harder for apps to take payments from its users in the past than others.

pjmlp

3 months ago

1 reply

It is basically a US centric view of mobile OS market share.

aaronbrethorst

3 months ago

2 replies

Anthropic is a US-based company.

pjmlp

3 months ago

Some companies would rather have a more international user base.

https://gs.statcounter.com/os-market-share/mobile/worldwide

So maybe they rather please the home market, I guess.

OJFord

3 months ago

With a global market and extant user base.

bahmboo

3 months ago

1 reply

Anthropic and Apple have a strategic partnership. It's a bit dicey but still seems to be in play. Which is interesting considering Google is a major investor and Apple is not. Anthropic wants Apple as a paying customer. Apple wants them to bend the knee.

lvl155

3 months ago

1 reply

Apple also has relationship with OAI. They’re not preferential.

bahmboo

3 months ago

Yes but the question was why Anthropic is showing more attention to iOS vs Android.

alwillis

3 months ago

1 reply

Not unusual; most high profile apps ship on iOS first, going back to Instagram [1], which was released October 10, 2010. Instagram shipped their Android version 1.5 years later.

[1]: https://www.techtarget.com/searchcio/definition/Instagram

spondyl

3 months ago

2 replies

Another, not incompatible explanation is that it's also just easier to develop for a handful of known iOS/iPadOS targets compared to Android's unbounded set of screen sizes and device specs.

wahnfrieden

3 months ago

2 replies

If your app runs on iPadOS, you already need to support every "screen size" (window size)

Android is simply a much worse platform to make money on. Users spend <25% as much as iOS users. Why would they prioritize that?

djmips

3 months ago

2 replies

In practice Android is much more difficult to handle the myriad of offerings - Have you ever tried both? To your other point, what app spend would Anthropic be worried about - they have a subscription model.

mh-

3 months ago

Anthropic supports in-app purchases for Claude subscriptions, at least in the US.

wahnfrieden

3 months ago

They sell through the app, too. And Android users are just as unlikely to spend outside of apps as they are inside them. Android deprioritization is a business decision, not a technical complexity decision.

pjmlp

3 months ago

2 replies

Because 70% of the mobile phone world runs on Android.

It is like trying to make a living selling games to macOS users.

wahnfrieden

3 months ago

Users spend <25% as much as iOS users, and less than half in total despite larger user counts (having double the users who spend <25% each does not add up!), a gap that widens year over year. Why would they prioritize that?

Why would they care about prioritizing users who spend much less? Android pays <25% per user. You need a LOT more than 70% to make that worth prioritizing. Those users are just going to eat up free tier resources without paying. It's borderline parasitic from a business perspective.

Android users are more likely to be useful for spreading word-of-mouth reputation to Apple platform users, than they are as direct spenders. Just another reason to ensure Apple platform features don't trail Android.

bapak

3 months ago

Please take a look at the percentage of paying Android users. It just does not compare. It's useless to count 2 billion users in third world countries who never have and never will pay anything in-app.

pjmlp

3 months ago

iOS/iPadOS aren't exactly the same, without bothering to count, there are about 10 screen sizes to account for, and Apple contrary to Android world, doesn't have somethine like JetPack, either the user updates their phone or there are no new features for the apps to rely on.

wahnfrieden

3 months ago

1 reply

Android is a tiny market

OJFord

3 months ago

2 replies

You probably mean 'in the US', where iOS is 58%. Android has a 71% global market share.

wahnfrieden

3 months ago

No. Why do user counts matter? High user count but with >4x thriftiness / aversion to spending is not an attractive market over iOS.

Globally in dollars spent, not human heads. iOS is over 2x larger than Android globally, and the gap is widening year over year.

iOS spending growth outpaces Android, which even shrunk during covid while iOS spending continued to grow

https://api.backlinko.com/app/uploads/2024/03/iphone-vs-andr...

Anthropic makes money off product sales, not ad revenue, so wallets count more than eyes for this. Free users who are less than 25% as likely to spend are a burden not to be prioritized for a product business with free tier access. They need to spend much more to get a paying user on Android.

If Android were the bigger market, they'd prioritize it

bdcravens

3 months ago

Yes, if all you consider are the number of devices in use. However once you segment by devices with performance to run a given app and financial demographics that match your target customer, the numbers change.

richardw

3 months ago

It’s much harder dealing with all the complexities of different devices, screen sizes, OS versions.

https://www.reddit.com/r/applesucks/comments/1k6m2fi/why_do_...

brynary

3 months ago

3 replies

The most interesting parts of this to me are somewhat buried:

- Claude Code has been added to iOS

- Claude Code on the Web allows for seamless switching to Claude Code CLI

- They have open sourced an OS-native sandboxing system which limits file system and network access _without_ needing containers

However, I find the emphasis on limiting the outbound network access somewhat puzzling because the allowlists invariably include domains like gist.github.com and dozens of others which act effectively as public CMS’es and would still permit exfiltration with just a bit of extra effort.

minimaxir

3 months ago

2 replies

Link to the GitHub for the native sandboxing: https://github.com/anthropic-experimental/sandbox-runtime

navanchauhan

3 months ago

1 reply

I used `sandbox-exec` previously before moving to a better solution (done right, sandboxing on macOS can be more powerful than Linux imo). The way `sandbox-exec` works is that all child processes inherit the same restrictions. For example, if you run `sandbox-exec $rules claude --dangerously-skip-permissions`, any commands executed by Claude through a shell will also be bound by those same rules. Since the sandbox settings are applied globally, you currently can’t grant or deny granular read/write permissions to specific tools.

Using a proxy through the `HTTP_PROXY` or `HTTPS_PROXY` environment variables has its own issues. It relies on the application respecting those variables—if it doesn’t, the connection will simply fail. Sure, in this case since all other network connection requests are dropped you are somewhat protected but then an application that doesn't respect them will just not work

You can also have some fun with `DYLD_INSERT_LIBRARIES`, but that often requires creating shims to make it work with codesigned binaries

joshdev

3 months ago

1 reply

What is the better solution you’ve moved on to?

navanchauhan

3 months ago

Endpoint Security Extension and Network Extension

kylehotchkiss

3 months ago

Could this be used for Xcode-server? I dont like how it has access to full host filesystem

fragmede

3 months ago

1 reply

Exfiltration is always going to be possible, the question is, is it difficult enough for an attacker to succeed against the defenses I've put in place. The problem is, I really want to share, and help protect others, but if I write it up somewhere anybody can read, it's gonna end up in the training data.

koolala

3 months ago

The attacker being an LLM where all humans have to be careful what they say publicly online is a fun vector.

merrvk

3 months ago

Nice its in the app, trying it out, seems damn buggy at the moment.

jannniii

3 months ago

1 reply

I’m wondering if it would be possible to use the new skills feature or agents with this. Without the agents or the skills, I don’t know how useful this would be.

simonw

3 months ago

It's running Claude Code CLI on a container for you, so skills should just work. I've not tried them myself yet though.

ea016

3 months ago

2 replies

No relations to them, but I've started using Happy[0]'s iOS app to start and continue Claude Code sessions on my iPhone. It allows me to run sessions on a custom environment, like a machine with a GPU to train models

[0] https://github.com/slopus/happy/

hmokiguess

3 months ago

1 reply

This seems to be the only solution still if using bedrock or direct API access instead of Pro / Max plan, the Claude Code for Web doesn't seem to let you use it that way.

didgeoridoo

3 months ago

1 reply

You can log in to your CC instance however you like, including via Pro/Max. Happy just wraps it and provides remote access with a much better UI than using a phone-based terminal app.

hmokiguess

3 months ago

Yes, that's precisely what I meant! I was talking with regards to the parent article about Claude Code on the Web via Anthropic.

TechDebtDevin

3 months ago

Are you people just lighting money on fire? What could you possibly get done via a phone that is meaningful.

ubj

3 months ago

2 replies

Very curious to see what usage limits are like for paid plans. Anthropic was already experiencing issues with high-volume model usage for Pro and Max users. I hope their infrastructure is able to adequately support running these additional coding environments on top of model inference.

Just to be clear, I'm excited for the capability to use Claude Code entirely within the browser. However, I've heard reports of Max users experiencing throttled usage limits in recent months, and am concerned as to whether this will exacerbate that issue or not.

minimaxir

3 months ago

1 reply

I suspect the release of Claude Haiku 4.5 was done to help reduce usage costs for Anthropic and any use of Claude Code will differ to it if capacity is limited.

EDIT: I had meant defer which is the first time I've made a /r/boneappletea in awhile

chrisweekly

3 months ago

1 reply

"differ"? did you mean "default"?

scubbo

3 months ago

I imagine "defer"

CharlesW

3 months ago

Anecdotally, as a Max user typically using Claude Code for >8 hours/day, I've never experienced that. That said, I'm not one of those people using Opus for everything, and in fact I've been happy using Sonnet 4.5 even for planning.

cube2222

3 months ago

4 replies

This is quite nice!

I'm using Claude Code locally a lot, occasionally with a couple parallel session.

I was very happy when they made the GitHub Action - I used it quite a bit, but in practice I got frustrated that I effectively only get a single back-and-forth out of it, I can't really "continue the conversation without losing context" - Sure, I can respond to it in the PR it makes, but that will be a fresh session with a fresh empty context.

So, as much as I don't like moving out of my standard development workflow with my tools, I think this could be quite useful. The ability to interrupt and/or continue a conversation should be very nice.

My main worry is - usually my unit tests and integration tests rely on a postgres database running on the machine, and it's not obvious to me if I can spin that up here?

GreekPete

3 months ago

1 reply

https://docs.github.com/en/actions/tutorials/use-containeriz...

cube2222

3 months ago

I'm not sure how this applies? We're talking about the "Claude Code on the Web" custom sandbox, not running Claude Code in GitHub Actions.

radial_symmetry

3 months ago

Check out Crystal if you want a good tool for managing parallel sessions locally https://github.com/stravu/crystal

anon3459

3 months ago

Use pglite

kofman

3 months ago

You can ask Claude to install Postgres and it should just work. We'll have it. In the default image shortly.

fny

3 months ago

1 reply

I've been using Happy Coder[0] for some time now on web and mobile. I run it `--yolo` mode on an isolated VM across multiple projects.

With Happy, I managed to turn one of these Claude Code instances into a replacement for Claude that has all the MCP goodness I could ever want and more.

[0]: https://happy.engineering/

ShipEveryWeek

3 months ago

This looks nice! I’ve been using terminus + tailscale to get similar results, but I’ll give this a go

lvl155

3 months ago

1 reply

I am not a big fan of these. They’re trying to bundle compute and jack up the prices down the road.

jimmydoe

3 months ago

not sure why you got downvoted but adding all these bells and whistles are the way to increase the price, which in turn justify the huge investment.

without 200% price increase in 3 years, there's no way any of these AI companies will survive.

simonw

3 months ago

4 replies

I had a preview of this over the weekend, notes here plus some example PRs: https://simonwillison.net/2025/Oct/20/claude-code-for-web/

It's really solid. It's effectively a web (and native mobile) UI over Claude Code CLI, more specifically "claude --dangerously-skip-permissions".

Anthropic have recognized that Claude Code where you don't have to approve every step is massively more productive and interesting than the default, so it's worth investing a lot of resources in sandboxing.

extr

3 months ago

4 replies

It’s interesting because I’ve slowly arrived at the opposite conclusion: for much of my practical day to day work, using CC with “allow edits” turned OFF results in a much better end product. I can correct it inline, I pseudo-review the code as it’s produced, etc etc. Codex is better for “fire and forget” features for sure. But Claude remains excellent at grokking intent for problems where you aren’t quite sure what you want to build yet or are highly opinionated. Mostly due to the fact it’s faster and the iteration loop is faster.

simonw

3 months ago

2 replies

That approach should work well for projects where you are directly working on the code in tandem with Claude, but a lot of my own uses are much more research oriented. I like sending Claude Code off on a mission figure out how to do something.

Here's an example from this morning, getting CUDA working on a NVIDIA Spark: https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-co...

I have a few more in https://github.com/simonw/research

fragmede

3 months ago

1 reply

so hey by the way, have you discovered Wispr Flow or something similar so you can talk to your computer like Scotty does?

simonw

3 months ago

2 replies

Yeah I've tried it a bit it's not a habit for me yet.

I write code on my phone a lot using ChatGPT voice mode though!

conesus

3 months ago

1 reply

Here's what I did to make voice (now WisprFlow, before Superwhisper) a habit:

  1. Install Karabiner-Elements, a free macOS keyboard remapper[0]
  2. Map F19 -> F5 (mic button) in Karabiner-Elements
  3. Choose F19 as the voice hotkey in your voice app

And now you can use the handy F5 mic button on your Apple keyboard. WisprFlow automatically has it set for:

  - press and hold to talk
  - double tap for indeterminate listening until you f5/esc

That workflow alone, of using the f5 key and switching between the two modes of speaking (holding or double-tap), has freed up a not insignificant part of my working memory. Turning abstract thoughts into text is higher cost than turning them into voice.

I predict individual offices[1] will be more popular as a choice for startups.

[0]: https://karabiner-elements.pqrs.org

[1]: https://queue.acm.org/detail.cfm?id=1281887

fragmede

3 months ago

fwiw, I use the fn/international key at the bottom left of the keyboard. it's easier to locate and I (a privilege I enjoy because I rarely use diacritics) barely use it for anything else.

jcjmcclean

3 months ago

I also use voice mode a lot, I find it's really useful for talking to while you're shaping an idea or an approach, then asking it to summarise the decisions you've made. Essentially rubber ducking.

extr

3 months ago

Very fair. Interesting how much feedback on models/tools is different right now depending on what you're doing.

dbbk

3 months ago

Personally I just prefer setting it to TDD. If the test cases are what I want, and the code passes the tests, all's good.

vidarh

3 months ago

It slows it down far too much for me. What I've found after swithcing to --dangerously-skip-permissions is that while the intermediate work product is often total junk, when I then start writing a message to tell Claude to switch approach, a large proportion of the time it has figured that out by itself before I'm finished writing the message.

So increasingly I let it run, and then review when it stops, and then I give it a proper review, and let it run until it stops again. It wastes far less of my time, and finishes new code much faster. At least for the things I've made it do.

ryoshu

3 months ago

Agreed. I use CC a lot for exploratory work. It's great with fast iteration for throwaway code.

username223

3 months ago

1 reply

Do you have a practical sense of the level of mischief possible in the sandbox? It seems like a game of regexp whack-a-mole to me, which seems like a predictable recipe for decades of security problems. Allow- and deny-lists for files and domains seem about as secure as backslash-escaping user input before passing it to the shell.

simonw

3 months ago

1 reply

If you configure it with the "no network access" environment there's nothing bad that can happen. Worst is you end up wasting a bunch of CPU cycles in a container somewhere in Anthropic's infrastructure.

Their "restricted network access" setting looks questionable to me - it allow-lists a LOT of stuff: https://docs.claude.com/en/docs/claude-code/claude-code-on-t...

If you configure your own allow-list you can restrict to just domains that you trust - which is enforced by a separate HTTP/HTTPS proxy, described here: https://docs.claude.com/en/docs/claude-code/claude-code-on-t...

adastra22

3 months ago

1 reply

How do you run a remote LLM with no network access?

simonw

3 months ago

OpenAI Codex, Claude Code for web and Gemini Jules have all managed that.

You use firewalls to prevent code running inside the container from opening network connections to anywhere else. The harness that surrounds it can still be made accessible via the network.

state_less

3 months ago

> it's worth investing a lot of resources in sandboxing.

I tend to agree. There’s an opportunity to make it easy to have Claude be able to test out workflows/software within Debian, RPM, Windows, etc… container and VM sandboxes. This could be helpful for users that want to release code on multiple platforms and help their own training and testing, which they seem to be heavily invested in given all the “How Am I doing?” prompts we’re getting.

cyrusradfar

3 months ago

great points @simonw - I, incredibly, haven't ever tried --dangerously-skip-permissions yet for any "real" projects. I generally find that it stops itself for good reason.

mmaunder

3 months ago

7 replies

We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI. I'm doing massive lifts with it on software that would never before have been feasible for me personally, or any team I've ever run. I'll use Claude Code maybe once every two weeks as a second set of eyes to inspect code and document a bug, with mixed success. But my experience has been that initially Claude Code was amazing and a "just take my frikkin money" product. Then Codex overtook CC and is much better at longer runs on hard problems. I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf. Whereas Codex's ability to profoundly increase the capabilities of a software org is a secret that's slowly getting out.

I don't have any relationship with any AI company, and honestly I was rooting for Anthropic, but Codex CLI is just way way better.

Also Codex CLI is cheaper than Claude Code.

I think Anthropic are going to have to somehow leapfrog OpenAI to regain the position they were in around June of this year. But right now they're being handed their hat.

maherbeg

3 months ago

2 replies

Yeah this has been my experience as well. The Claude Code UI is still so much better, and the permissioning policy system is much better. Though I'm working on closing that gap by writing a custom policy https://github.com/openai/codex/blob/main/codex-rs/execpolic...

Kinda sick of Codex asking for approval to run tests for each test instance

rtfeldman

3 months ago

1 reply

You don't have to use Codex in its terminal UI - e.g. you can use it in the Zed IDE out-the-box:

https://zed.dev/blog/codex-is-live-in-zed

PantaloonFlames

3 months ago

And also in emacs or neovim

https://xenodium.com/introducing-acpel

mmaunder

3 months ago

1 reply

Ah the tension between cybersecurity best practices and productivity is brutal right now.

maherbeg

3 months ago

1 reply

lol yeah, but mostly just want to allow more types of reads for getting context, and primarily for test running / linting etc. I shouldn't have to approve every invocation of `pytest` or `bazel test`.

fragmede

3 months ago

1 reply

--dangerously-bypass-approvals-and-sandbox isn't enough for you?

maherbeg

3 months ago

1 reply

I don't want unlimited writes. I basically want to unlock nearly everything but approve writes in some scenarios.

fragmede

3 months ago

Where do unix permissions and a different user and extended attributes fall short for that?

durron

3 months ago

4 replies

Do you find this to still be true with the Sonnet 4.5 model?

extr

3 months ago

1 reply

IMO Sonnet 4.5 is great but it just isn’t as comprehensive of a thinker. I love Anthropic and primarily use CC day to day but for any tricky problems or “high stakes, this must not have bugs” issues, I turn to Codex. I do find if you let Codex run on it its own too long it will produce comparably sloppy or lacking-in-vision type issues that people criticize Sonnet for, however.

PantaloonFlames

3 months ago

4 replies

That’s a curious approach. Why would you use both? Why not just use the more reliable dependable option for all purposes?

extr

3 months ago

1 reply

Sonnet 4.5/CC is faster, more direct, and is generally better at following my intent rather than the letter of my prompt. A large chunk of my tasks are not "solve this concurrency bug" or "write this entire feature" but rather "CLI ops", merging commits, running a linter, deploying a service, etc. I almost use it like it was my shell.

Also while not quite as smart, it's a better pair programmer. If I'm feeling out a new feature and am not sure how exactly it should work yet, I prefer to work with Sonnet 4.5 on it. It typically gives me more practical and realistic suggestions for my codebase. I've noticed that GPT-5 can jump right into very sophisticated solutions that, while correct, are probably not appropriate.

Sonnet 4.5: "Why don't we just poll at an interval with exponential backoff?"

GPT-5: "The correct solution is to include the data in the event stream...let us begin by refactoring the event system to support this..."

That said, if I do want to refactor the event system, I definitely want to use Codex for that.

deaux

3 months ago

1 reply

Strangely enough this is one of the first times here I see someone with the exact same experience. GPT-5 is very prone to a style that would for most codebases be overengineering. I think as a large part of HN works on huge enterprise FAANG-like code, this is where it shines, so here it gets rave reviews of just being the best overall. But globally, for most developers, it's overengineering and adds a lot of unnecessary code to maintain. Sonnet in that sense remains "every man's coder". I've gone back from 4.5 to 4 now, having spent a good chunk of time with 4.5 it just seems like a slight overall regression with no real upsides besides being a little faster than 4.

extr

3 months ago

Glad I'm not crazy, the tide right now of codex > sonnet is overwhelming. Frankly I think what most people go by is "does the code work" - codex is admittedly relentless. It's very good at producing code that works. But "does it work" is not the end-all-be-all in most cases...

wrs

3 months ago

In my experience, there isn’t a model that is more dependable for all purposes. They each have some unique strengths.

NiloCK

3 months ago

Even inside the claude-code ecosystem, more than ever there are tradeoffs on raw speed vs intelligence vs cost.

Moving a bunch of verbose templated HTML around while watching results on a devserver? Haiku all day. It's a bonus that it's cheaper, but the real treat is its speed.

Adding a feature whose planning will involve intake of several files? Sonnet.

Working specifically on 'copy' or taste issues? Still I tend to prefer Opus here.

Individual experiences may vary!

macNchz

3 months ago

I frequently have multiple coding assistants going at once—Gemini 2.5 Pro via Aider as the workhorse for most standard changes, Sonnet 4.5 via Claude Code for question answering, documentation, test case development, or broad based changes to many files in a project, then GPT-5 for more complex diagnostic or architectural type things—I don’t generally like the code it writes, but it will often be able to fix situations where the other models get stuck in some kind of local maxima.

theshrike79

3 months ago

1 reply

I'm like 80% sure Sonnet 4.5 is just rebranded Opus.

Sonnet 4 was a coding companion, I could see what it was doing and it did what I asked.

Sonnet 4.5 is like Opus, it generates massive amounts of "helper scripts" and "bootstrap scripts" and all kinds of useless markdown documentation files even for the tinies PoC scripts.

deaux

3 months ago

1 reply

It's very much not, so I'm more than happy to take that bet - how much are we wagering? Have you ever used each for non-coding tasks?

The generation of helper, markdown and bootstrap scripts are very dependent on your harness.

theshrike79

3 months ago

I paid for "Claude Code", I'm not asking it for stuff about the Mesopotamian empire :)

mmaunder

3 months ago

1 reply

Yes. Sadly. And it really does make me sad. I was rooting for Anthropic. Still kinda am.

bgirard

3 months ago

1 reply

I have a very similar experience. I was heavily invested in Anthropic/Claude Code, and even after Sonnet 4.5, I'm finding that Codex is performing much better for my game development project.

mmaunder

3 months ago

It seems particularly good at high performance programming in low level languages.

esafak

3 months ago

I don't. Sonnet is faster too.

mi_lk

3 months ago

1 reply

What model are you using respectively? Not sure I share your observations

mmaunder

3 months ago

Have tried all and continue to eval regularly. I spend up to 14 hours a day. Currently recovering from a herniated disk because I spent 6 weeks sitting at a dining room table, 14 hours a day, leaning foward. Don't do that. lol. So my coverage is pretty good. I'm using GPT5-codex-high for 99% of my work. Also I have a team of 40 folks, about a third of which are software engineers and the other third are cybersecurity analysts, so I get feedback from them too and we go deep on our engineering calls re the latest learnings and capabilities.

bcrosby95

3 months ago

1 reply

Yeah, after correcting it several times I've gotten Claude Code to tell me it didn't have the expertise to work in one of my problem domains. It was kinda surprising but also kinda refreshing that it knew when to give up. For better or worse I haven't noticed similar things with Codex.

mmaunder

3 months ago

3 replies

I've chosen problems with non-negotiable outcomes. In other words, problem domains where you either are able to clearly accomplish the very hard thing, or not, and there's no grey area. I've purposely chosen these kinds of problems to prove what AI agents are capable of, so that there is no debate in my mind. And with Codex I've accomplished the previously impossible. Unambiguously. Codex did this. Claude gave up.

It's as if there are two vendors saying they can give up incredibly superpowers for an affordable price, and only one of them actually delivers the full package. The other vendor's powers only work on Tuesdays, and when you're lucky. With that situation, in an environment as competitive as things currently stand, and given the trajectory we're on, Claude is an absolute non-starter for me. Without question.

skybrian

3 months ago

We need product reviewers who can demonstrate things like this in public. Without details, "it works for me on my projects" only goes so far.

corndoge

3 months ago

Can you expound a bit on the problem domains? I am curious

Aeolun

3 months ago

I don’t think Claude is actually incapable, you just spend a lot of time telling it to yes, please actually do the difficult thing. Do not give up halfway through.

Codex says “This is a lot of work, let me plan really well.”

Claude says “This is a lot of work, let me step back and do something completely different that you didn’t ask for.”

cesarvarela

3 months ago

6 replies

Can you share an example of the tasks you found Codex being much better? From my experience Claude Code is much better.

mmaunder

3 months ago

3 replies

I can not. We're all racing very hard to take full advantage of these new capabilities before they go mainstream. And to be honest, sharing problem domains that are particularly attractive would be sharing too much. Go forth and experiment. Have fun with it. You'll figure it out pretty fast. You can read my other post here about the kinds of problem spaces I'm looking at.

mmaunder

3 months ago

6 replies

I'm seeing the downvotes. I'm sorry folks feel that way. I'm regretting my honesty.

Edit: I'd like to reply to this comment in particular but can't in a threaded reply, so will do that here: "Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice."

This exhibits a fundamental misunderstanding of why coding agents powered by LLMs are such a game changer.

The assumption this poster is making is that LLMs are regurgitating whole cloth after being trained on whole cloth.

This is a common mistake among lay people and non-practitioners. The reality is that LLMs have gained the ability to program, by learning from the code of others. Much like a human would learn from the code of others, and then be able to create a completely novel application.

The difference between a human programmer an an agentic coder is that the agent has much broader and deeper expertise across more programming languages, and understands more design patterns, more operating systems, more about programming history, etc etc and it uses all this knowledge to fulfill the task you've set it to. That's not possible for any single human.

It's important for the poster to take two realities on board: Firstly, agentic coding agents are not regurgitating whole cloth from whole cloth. Instead they are weaving new creations because they have learned how to program. Secondly, agentic coding agents have broader and deeper knowledge than any human that will ever exist, and they never tire, and their mood and energy level never changes. In fact that improves on a continuous basis as the months go by and progress continues. This means we can, as individual practitioners or fast moving teams, create things that were never before possible for us without raising huge amounts of money and hiring large very expensive teams, and then having the overhead of lining everyone up behind a goal AND dealing with the human issues that arise, including communication overhead.

This is a very exciting time. Especially if you're curious, energetic, and are willing to suspend disbelief to go and take a look.

zamadatix

3 months ago

2 replies

Never hold regret for having honesty, it tends to lose its value completely if you only care about it when you have good news to deliver. If for anything, hold regret for when you didn't have something better appreciated to be honest about.

The easier threading-focused approach to the conversation might be to add the additional comment as an edit at the end of the original and reply to the child https://news.ycombinator.com/item?id=45649068 directly. Of course, I've broken the ability to do that by responding to you now about it ;).

mmaunder

3 months ago

1 reply

Thanks. I wasn't able to reply in a thread earlier - I guess HN has a throttle on that. So I edited the comment above to add a few more thoughts. It's a very exciting time to be alive.

jamiek88

3 months ago

Just click on the time. Where yours says ‘two hours ago’ now, if you click on that you can reply directly to any sub comment in a thread.

mmaunder

3 months ago

lol, thanks.

kobe_bryant

3 months ago

1 reply

this is absurd. no one needs or wants your AI generated answer that's a whole lot of nothing

mmaunder

3 months ago

Comments like this reveal the magnitude of polarization around this issue in tech circles. Most people actually feel this kind of animosity towards AI, and so having comment threads like this even be visible on HN is unusual. Needless to say, all my comments here are hand written. But the poster knows that, of course.

Vegenoid

3 months ago

2 replies

We've all been hearing from people talking about how amazing AI coding agents are for a while now. Many skeptics have tried them out, looked into how to make good use of them, used modern agentic tools, done context engineering, etc. and found that they did not live up to the claims being made, at least for their problem domain.

Talk is cheap, and we're tired of hearing people tell us how it's enabling them to make incredible software without actually demonstrating it. Your words might be true, or they might be just another over-exaggeration to throw on the pile. Without details we have no way of knowing, and so many make the empirically supported choice.

com2kid

3 months ago

I just had Claude code convert all my personal projects over to be dockerized, and then setup the deployment infra and scripts for everything, and finally move my server off of the nightmare nginx config file I was using.

chaboud

3 months ago

I agree. It’s pretty easy to put-up or shut up.

I recently vibe coded a video analysis pipeline with some related arduino-driven machine control. It was work to prototype an experience on some 3D printed hardware I’ve been skunking out.

By describing the pipeline and filters clearly, I had the analysis system generating useful JSON in an hour or so, including machine control simulation, all while watching TV and answering emails/slacks. Notable misses were that the JSON fields were inconsistent, and the python venvs were inconsistent for the piped way that I wanted the system to operate with.

Small fixes.

Then I wired up the hardware, and the thing absolutely crapped itself, swapping libraries, trying major structural changes, and creating two whole new copies of the machine control host code (asking me each time along the way). This went on for more than three hours, with me debugging the mess for about 20 minutes before resorting to 1) ChatGPT, which didn’t help, followed by 2) a few minutes of good old fashioned googling on serial port behavior on Mac, which, with an old sitting on the shelf Uno R3, meant that I needed to use the cu.* ports instead of tty.*, something that Claude Code had buried deeply in a tangle of files.

Curious about the failure, I told Claude Code to stop being an idiot and use a web browser to go research the problem of specifically locking up on the open operation. 30 seconds later, and with some reflective swearing from Opus 4.1, which I appreciate, I had the code I should have had 3 hours prior (along with other garbage code to clean up).

For my areas of sensing, computer vision, machine learning, etc., these systems are amazingly helpful if the algorithms can be completely and clearly described (e.g., Kalman filter to IoU, box blur followed by subsampling followed by split exponential filtering, etc.).

Attempts to let the robots work complex pipelines out for themselves haven’t gone as well for me.

johnfn

3 months ago

You’re getting downvoted because the amount of weight I place on your original comment is contingent on whether or not you’re actually using AI to do meaningful work ot not. Without clarifying what you’re doing, it’s impossible to distinguish you from one of those guys that says he’s using AI to do tons of work and then you peek under the hood and he’s made like 15 markdown files and his code is a mess that doesn’t do anything.

Well, that, and it’s just a bit annoying to claim that you’ve found some amazing new secret but that you refuse to share what the secret is. It doesn’t contribute to an interesting discussion whatsoever.

preommr

3 months ago

> I'm seeing the downvotes. I'm sorry folks feel that way. I'm regretting my honesty.

What honesty? We're not at the point of "the Godfather was a good/bad movie", we're at "no, trust, there's a really good movie called the Godfather".

Your honesty means nothing for an issue that isn't about taste or mostly subjectivness. How useful AI is and in what way is a technical discussion where the meat of the subject matter is. You've shared nothing on that front. I am not saying you have to, but like obviously people are going to downvote you - not because they might agree/disagree but because it's contributed nothing different from every other ai-hype man selling a course or something.

nik_0_0

3 months ago

I don't have any particular horse in this race, but looking at this exchange, I hope its clear where the issue is coming from.

The original post states "I am seeing Codex do much better than Claude Code", and when asked for examples, you have replied with "I don't have time to give you examples, go do it yourself, its obvious."

That is clearly going to rub folks (anyone) the wrong way. This refrain ("Wheres the data?") pops up frequently on HN, if its so obvious, giving 1 prompt where Codex is much greater than Claude doesn't seem like a heavy lift.

In absence of such an example, or any data, folks have nothing to go on but skepticism. Replying with such a polarizing comment is bound to set folks off further.

aprilthird2021

3 months ago

Why would you even comment that Codex CLI is potentially worth switching an enormous amount of spend over ($70k) and give literally 0 evidence of why it's better? That's all you've got? "Trust me bro"?

deadbabe

3 months ago

Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice.

mordymoop

3 months ago

2 replies

I'm on the same page here. I have seen this sentiment about Codex suddenly being good a few times now, so I booted Codex CLI thinking-high back up after a break and asked it to look for bugs. It promptly found five bugs that didn't actually exist. It was the kind of truly impressively stupid mistake that I haven't seen Claude Code make essentially ever, and made me wonder if this isn't the sort of thing that's making people downplay the power of LLMs for agentic coding.

throwaway-0001

3 months ago

In my case codex fixed a bug in one shot. Took 10 min to debug and find it.

Claude struggled long time and still didn’t find.

stavros

3 months ago

I asked Sonnet 4.5 to find bugs in the code, it found five high-impact bugs that, when I prompted it a second time, it admitted weren't actually bugs. It's definitely not just Codex.

simplify

3 months ago

1 reply

Same here. I tried codex a few days ago for a very simple task (remove any references of X within this long text string) and it fumbled it pretty hard. Very strange.

fragmede

3 months ago

2 replies

yeah I'm in the same boat. Codex can't do this one task, and constantly forgets what I've told it, and I'm reading these comments saying how is so great to the point that I'm wondering if I'm the one taking the crazy pills. Maybe we're being A/B tested and don't know about it?

hattmall

3 months ago

1 reply

No, no one that's super boosting the LLMs ever tells you what they are working on or give any reasonable specifics about how and why it's beneficial. When someone does, it's a fairly narrow scope and typically inline with my experience.

They can save you some time by doing some fairly complex basic tasks that you can write in plain language instead of coding. To get good results you really need a lot of underlying knowledge yourself and essentially, I think of it as a translator. I can write a program in very good detail using normal language and then the LLM can convert it to code with reasonable accuracy.

I haven't been able to depend on it to do anything remotely advanced. They all make up API endpoints or methods or fill in data with things that simply don't exist, but that's the nature of the model.

fragmede

3 months ago

You misread me. I'm one of the people you're complaining about. Claude code has been great in my experience and no I don't have a GitHub repo of code that's been generated for you to tell me that's trivial and unadvanced and that a child could do it.

What I'm saying was to compare my experience with Claude code vs Codex with GPT-5. CC's better than codex in my experience, contrary to GP's comment.

FuckButtons

3 months ago

Maybe, just maybe, people are lying on the internet. And maybe those people have a financial interest in doing so.

intellectronica

3 months ago

2 replies

Codex works much better for long-running tasks that require a lot of planning and deep understanding.

Claude, especially 4.5 Sonnet, is a lot nicer to interact with, so it may be a better choice in cases where you are co-working with the agent. Its output is nicer, it "improvises" really well even if you give it only vague prompts. That's valueable for interactive use.

But for delegating complete tasks, Codex is far better. The benchmarks indicate that, as do most practicioners I talk to (and it is indeed my own experience).

In my own work, I use Codex for complete end-to-end tasks, and Claude Sonnet for interactive sessions. They're actually quite different.

shmoogy

3 months ago

1 reply

Can / Does Codex actually check docker logs and other things for feedback while iterating on something that isnt working ? That is where the true magic of Claude comes for me. Often things cant be one shot, but being able to iteratively check logs, make an adjustment, rebuild the docker containers, send a curl, and confirm fixed is huge improvement.

intellectronica

3 months ago

Yes, in this regard it's very similar. It works as an agent and does whatever you need it to do to complete the task. In comparison to Claude it tends to plan more and improvise less.

incoming1211

3 months ago

1 reply

I disagree, Codex always gets stuck and wants to double check and clarify things, its like "dammit just execute the plan and don't tell me until its completely finished"

The output of codex is also not as great. Codex is great at the planning and investigation portion but sucks at execution and code quality.

ewoodrich

3 months ago

1 reply

I've been dealing with this on Codex a lot lately. It confidently wraps up a task, I go to check it's work... and it's not even close.

Then I do a double take and re-read the summary message and realize that it pulled a "and then draw the rest of the owl", seemingly arbitrarily picking and choosing what it felt like doing in that session and what it punted over to "next steps to actually get it running".

Claude is more prone to occasional "cheating" with mocked data or "tbd: make this an actual conditional instead of hardcoded If True" stuff when it gets overwhelmed which is annoying and bad. But it at least has strong task adherence for the user's prompt and doesn't make me write a lawyer-esque contract to avoid any loopholes Codex will use to avoid doing work.

aaronblohowiak

3 months ago

Are you using something like spec-kit?

the_duke

3 months ago

1 reply

IMO gpt5-codex medium is much better as soon as the task becomes slightly complex, or the context grows a bit.

Sora 4.5 tends to randomly hallucinate odd/inappropriate decisions and goes to make stupid changes that have to be patched up manually.

jacurtis

3 months ago

Yes Sora hallucinates significantly more than Claude.

I find that Codex generally requires me to remove code to get to what I want, whereas Claude I tend to use what it gives me and I add to it. Whether this is from additional prompting or from manual typing, i just find that codex requires removal to get to desired state, and Claude requires adding to get to desired state. I prefer adding incrementally than removing.

Palmik

3 months ago

Curiously, you yourself did not provide an example where, from your experience, Claude Code was much ebtter.

WXLCKNO

3 months ago

1 reply

I agree with this and actually Claude Code agrees with it too. I've had Codex cli (gpt-5-codex high) and claude code 4.5 sonnet (and sometimes opus 4.1) do the same lengthier task with the same prompt in cloned folders about 10x now and then I ask them to review the work in the other folder and determine who did the best job.

100% of the time Codex has done a far better job according to both Codex and Claude Code when reviewing. Meeting all the requirements where Claude would leave things out, do them lazily or badly and lose track overall.

Codex high just feels much smarter and more capable than Claude currently and even though it's quite a bit slower, it's work that I don't have to go over again and again to get it to the standards I want.

pkreg01

3 months ago

I share your observations. It's strange to see Anthropic loosing so much ground so fast - they seemed to be the first to crack long-horizon agentic tasks via what I can only assume is an extremely exotic RL process.

Now, I will concede that for non-coding long-horizon tasks, GPT-5 is marginally worse than Sonnet 4.5 in my own scaffolds. But GPT-5 is cheaper, and Sonnet 4.5 is about 2 months newer. However, for coding in a CLI context, GPT-5-Codex is night-and-day better. I don't know how they did it.

lherron

3 months ago

Still a toss-up for me which one I use. For deep work Codex (codex-high) is the clear winner, but when you need to knock out something small Claude Code (sonnet) is a workhorse.

Also CC tool usage is so much better! Many, many times I’ve seen Codex writing a python script to edit a file which seems to bypass the diff view so you don’t really know what’s going on.

minimaxir

3 months ago

I like how in the demo video there's a squiggle emphasis on Claude's "Good Idea!" in response to a user clarification, when it's more common among vibe coders that that less glazing is better and they just want the LLM to write code.

_pvzn

3 months ago

This is kind of nice, as much as I love a good TUI, sometimes text editing in claude code can trip me up compared to a web GUI

Stevvo

3 months ago

Guess they couldn't name it "Claude Codex"

jngiam1

3 months ago

I got so used to having Claude Code read some of my MCP tools, and was bummed to see that it couldn't connect to them yet on the web.

Pretty cool though! Will need to use it for some more isolated work/code edits. Claude Code is now my workhorse for a ton of stuff including non-coding work (esp. with the right MCPs)

mkummer

3 months ago

Is the web interface open sourced anywhere? Looks great, excited to try it out

230 more comments available on Hacker News

View full discussion on Hacker News

ID: 45647166Type: storyLast synced: 11/22/2025, 11:00:32 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN