Addendum to GPT-5 System Card: GPT-5-Codex

4 months ago

I cap my context at 50k tokens.

4 months ago

Agreed, and judicious use of subagents to prevent pollution of the main thread is another good mitigant.

tanvach

4 months ago

I also noticed the laziness compared to Sonnet models but now I feel it’s a good feature. Sonnet models, now I realize, are way too eager to hammer out code with way more likelihood of bugs.

bayesianbot

4 months ago

I definitely agree with all of those points. I just really prefer it completing steps and asking me if we should continue to the next step rather than doing half of the step and telling me it's done. And the context degradation seems quite random - sometimes it hits way earlier, sometimes we go through a crazy amount of tokens and it all works out.

apigalore

4 months ago

Yes, this is the one thing stopping me from going to Codex completely. Currently, it's kind of annoying that Codex stops often and asks me what to do, and I just reply "continue". Even though I already gave it a checklist.

With GPT‑5-Codex they do write: "During testing, we've seen GPT‑5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation." https://openai.com/index/introducing-upgrades-to-codex/

mritchie712

4 months ago

1 reply

Have you used Claude Code? How does it compare?

https://help.openai.com/en/articles/11369540-using-codex-wit...

4 months ago

2 replies

It's objectively a big improvement over Claude Code. I'm rooting for anthropic, but they better make a big move or this will kill CC.

nightshift1

4 months ago

1 reply

What are the usage limits like compared to Claude Code? Is it more like 5× or 20×? For twice the price, it would have to be very good.

naiv

4 months ago

1 reply

have to say not sure what this even means and what the exact definition of a message is in this context.

with claude code max20 I was constantly hitting limits, with codex not once yet

4 months ago

Same. We're not hitting limits at all with Codex and it's ridiculously good at managing and preserving its context window while getting a metric fuckton of work done. It's kind of unbelievable actually. I don't know re billing. Not my dept.

mike_hearn

4 months ago

Are you talking about Codex CLI or their GitHub integration?

GPT-5 is a great model. I tried Codex CLI Rust, as they seem to be deprecating the JS version, and it is awful. I don't know what possessed them to try and write a TUI in Rust but it isn't working. The Claude Code UI is hugely superior.

4 months ago

5 replies

Agreed. We're hardcore Claude Code users and my CC usage trended down to zero pretty quickly after I started using Codex. The new model updates today are great. Very well done OpenAI team!! CC was an existential threat. You responded and absolutely killed it. Your move Anthropic.

Jcampuzano2

4 months ago

2 replies

To be fair, Anthropic kinda did this to themselves. I consider it as a pretty massive throw on their end in terms of the fairly tight grasp they had on developer sentiment.

Everyone else slowly caught up and/or surpassed them while they simultaneously had quality control issues and service degradation plaguing their system - ALL while having the most expensive models comparatively in terms of intelligence.

4 months ago

3 replies

Agreed. I really wish Google would get their act together because I think they have the potential of being faster, cheaper with bigger context windows. They're so great at hardcore science and engineering, but they absolutely suck at products.

bjackman

4 months ago

3 replies

I think this is being downvoted coz it doesn't seem to be really responding to the thread, and maybe it isn't, but for anyone who hasn't tried Gemini CLI:

My experience after a month or so of heavy use is exactly this. The AI is rock solid. I'm pretty consistently impressed with its ability to derive insights from the code, when it works. But the client is flaky, the backend is flaky, and the overall experience for me is always "I wish I could just use Claude".

Say 1 in 10 queries craps out (often the client OOMs even though I have 192GB of RAM). Sounds like a 10% reliability issue but actually it just pushes me into "fuck this I'll just do it myself" so it knocks out like 50% of the value of the product.

(Still, I wouldn't be surprised if this can be fixed over the next few months, it could easily be very competitive IMO).

macNchz

4 months ago

1 reply

I have been heavily using the Gemini API via Aider for a few months and it has been absolutely stable. Claude, in comparison, has been much flakier. OpenAI somewhere in between.

bjackman

4 months ago

It's definitely possible there's a "grass is always greener" effect going on here, to be fair.

None of these tools give the impression of being well-tested software. My guess is that neither OpenAI nor Anthropic actually has the necessary density in expertise to build quality software. Google obviously can build good software _when it really wants to_ but in this space its strategy looks like "build the products the other guys are building, cut whatever corners necessary to do this absolutely as fast as possible".

So even if my initial impressions are more accurate it's quite possible Google wins long term here.

faxmeyourcode

4 months ago

Semi-related but I have the same experience with the gemini mobile app on android. ChatGPT and Claude are both great user experiences and the best word to describe how the gemini app feels is flaky.

dumpsterdiver

4 months ago

Just adding my two cents after test driving Gemini Ultra after being a long time ChatGPT Pro subscriber:

Remember the whole “Taken 3 makes Taken 2 look like Taken 1” meme? Well Google’s latest video generating AI makes any video gen AI I’ve seen up until now look like Taken 3* (sigh, I said 1, ruined it) - and they are seriously impressive on their own.

Edit: By “they” I mean the other video-generating AI models, not the other Taken movies. I hope Liam Neeson doesn't read HN, because a delivery like that might not make him laugh.

echelon

4 months ago

1 reply

I really do not want Google to win anything. They're a giant monopoly across multiple industries. We need a greater balance of power.

Antitrust enforcement has been letting us down for over two decades. If we don't have an oxygenation event, we'll go an entire generation where we only reward tax-collecting, non-innovation capital. That's unhealthy and unfair.

Our career sector has been institutionalized and rewards the 0.001% even as they rest on their laurels and conspire to suppress wages and innovation. There's a reason why centicorns petered out and why the F500 is tech-heavy. It's because big tech is a dragnet that consumes everything it touches - film studios, grocery stores, and God only knows what else it'll assimilate in the unending search for unregulated, cancerous growth.

FAANG's $500k TC is at the expense of hundreds of unicorns making their ICs even wealthier. That money mostly winds up going to institutional investors, where the money sits parked instead of flowing into huge stakes risks and cutthroat competition. That's why a16z and YC want to see increased antitrust regulations.

But it's really bad for consumers too. It's why our smartphones are stagnant taxation banana republics with one of two landlords. Nothing new, yet as tightly controlled as an authoritarian state. New ideas can't be tried and can't attain healthy margins.

It's wild that you can own a trademark, but the only way for a consumer to access it is to use a Google browser that defaults to Google search (URLs are scary), where the search results will be gamed by competitors. You can't even own your own brand anymore.

Winning shouldn't be easy. It should be hard. A neverending struggle that rewards consumers.

We need a forest fire to renew the ecosystem.

andai

4 months ago

2 replies

Google supposedly claimed to have no moat, but they actually have

- all the users

- all the apps (Google, GMail, YouTube, Docs, Maps...)

- all the books (Google Books)

- all the video (YouTube)

- all the web pages

- custom hardware

It's honestly weird they aren't doing better. Agree that the models are great and the UX is bad all around.

LordDragonfang

4 months ago

1 reply

Google has been, for at least a decade, making pretty terrible choices that squander developer and power-user goodwill (see: any thread where they announce a new product and one of the top comments will link to killedbygoogle). When you've burnt bridges with your biggest evangelists, adoption by normies slows, and your products appear to stagnate.

Unfortunately, they've been insulated from the consequences of their bad decisions by the fact the money printer (ads) keeps their company afloat and mollifies shareholders. The moment that dries up, they're in trouble.

echelon

4 months ago

We say this (I admit I would say the same as you), and yet their revenue is $400 billion a year.

I don't think they care what we think. They're thriving despite our protests.

But yeah, they shouldn't be shielded from antitrust. They have literally everything.

brianjking

4 months ago

Hey now, let's not forget it. They also have:

- all the lobbyists

- all the money

bobbylarrybobby

4 months ago

Google can do anything but get their act together.

zamalek

4 months ago

You're absolutely right!

notfromhere

4 months ago

1 reply

GPT-5 writes clean, simple code and listens to instructions. I went from tons of Claude API usage to basically none overnight.

ttul

4 months ago

Agreed. GPT’s coding is so much cleaner. Claude tends to ramble and generate unnecessary scaffolding. GPT’s code is artful and minimalist.

epolanski

4 months ago

1 reply

But how do you use it?

It's super annoying that it doesn't provide a way to approve edits one by one; instead, it either vibe codes on its own or gives me diffs to copy-paste.

Claude code has a much saner "normal mode".

brianjking

4 months ago

1 reply

Wait, this wasn't what I was experiencing. Did something change in gpt-5-codex or was that your normal experience?

epolanski

4 months ago

I asked you how do you use it.

Is it via CLI? Is it via extension to an editor? What is your flow?

ttul

4 months ago

2 replies

This just goes to show how crucial it was for Anthropic and OpenAI to hire first class product leads. You can’t just pay the AI engineers $100M. Models alone don’t generate revenue.

arthurcolle

4 months ago

the model is the product

dwohnitmok

4 months ago

I got the exact opposite lesson. The parent and grandparent comments seem to be talking about dropping one product for another purely on the strength of the model.

codehead

4 months ago

I would sincerely like to understand what steps got you to move convincingly down to zero usage of CC. I have seen enough hits and misses with Codex to feel like it tries really hard to be good, and in some ways it is (the out-of-the-box context management seems like a pretty smooth batteries-included feature), but in some ways that are important to me, it just keeps falling on its face. For example, it gives up on what it deems too complex a task (in my case, porting a pretty robust JS deobfuscation tool, which works but is mad slow, over to Rust), and that has kept me from feeling full of confidence and speculative joy about it thus far. It caught and fixed some bugs after a few turns of renewing context, but I was doing that with CC (with better walkthroughs as it did its thing), so it felt underwhelming to me. As anecdotal as my situation/experience sounds, I still feel like with every "new"-ish thing that gets thrown at us regarding AI tooling and similar news, the hype does not live up to the reality, FOR ME.

FergusArgyll

4 months ago

3 replies

It doesn't seem to have any internal tools it can use. For example, web search; It just runs curl in the terminal. Compared to Gemini CLI that's rough but it does handle pasting much better... Maybe I'm just using both wrong...

gizmodo59

4 months ago

1 reply

Use --search option when you start codex

FergusArgyll

4 months ago

Thanks!

Tiberium

4 months ago

1 reply

It does have web search - it's just not enabled by default. You can enable it with --search or in the config, then it can absolutely search, for example finding manuals/algorithms.

FergusArgyll

4 months ago

Thanks!

ollybee

4 months ago

web search too is off by default

robotswantdata

4 months ago

1 reply

Agreed ditched my Claude code max for the $200 pro ChatGPT.

Gemini cli is too inconsistent, good for documentation tasks. Don’t let it write code for you

icelancer

4 months ago

2 replies

Gemini's tool calling being so bad is pretty amazing. Hopefully in the next iteration they fix it, because the model itself is very good.

nowittyusername

4 months ago

1 reply

This is a recurring theme with Google. Their models are phenomenal but the systems around them are so bad that it degrades the whole experience. Veo3 great model horrible website, and so on...

brianjking

4 months ago

Their massive increase in token processing since Veo3 and nano banana have been released would say otherwise...

Or we're all just used to eating things we don't like and smiling.

robbrulinski

4 months ago

That has been my experience as well with every Gemini model, ugh!

DanielVZ

4 months ago

1 reply

Can someone compare it to cursor? So far i see people compare it with Claude code but I’ve had much more success and cost effectiveness with cursor than Claude code

bionhoward

4 months ago

Doesn’t compare, because Cursor has a privacy mode. Why would anyone want to pay OpenAI or Anthropic to train their bots on your business codebase? You know where that leads? Unemployment!

vitorgrs

4 months ago

1 reply

Gemini seems to be pretty awful at agentic coding. It always finishes the task, and when I see the result, it has just broken my code.

Not sure the fault is "writing bad code"; I guess it's just not good at being agentic. Saw this on Gemini CLI and other tools.

GLM, Kimi, Qwen-Code all behave better for me.

Probably Gemini 3 will fix this, as Gemini 2.5 Pro is "old" by now.

4 months ago

Gemini CLI is bad, model itself is really good.

troupo

4 months ago

> then just randomly mock a function like Gemini used to

Claude Code does that on longer tasks.

Time to give Codex a try I guess.

Difwif

4 months ago

1 reply

Is this available to use now in Codex? Should I see a new /model?

andrewmunsell

4 months ago

Yes, but I had to update the Codex CLI manually via NPM to see it. The VS Code extension auto-updated for me

4 months ago

3 replies

Codex always appears to use spaces, even when the project uses tabs (e.g., a Go file). It's so annoying.

asadm

4 months ago

3 replies

this + any coding conventions should ALWAYS be a post process. DO NOT include them in your prompt, you are losing model accuracy over these tiny things.

https://en.wikipedia.org/wiki/Wu_wei

4 months ago

2 replies

It helps to actually be able to read the diffs of its proposals/changes in the terminal. The changing from tabs -> spaces on every line it touches generally results in unreadable messes.

I have a pretty complex project, so I need to keep an eye on it to ensure it doesn't go off the rails and delete all the code to get a build to pass (it wouldn't be the first time).

wahnfrieden

4 months ago

5 replies

You are poisoning your context making it focus on an unusual requirement contrary to most of its training data. It’s a formatter task, not an LLM task

In fact you should convert your code to spaces at least before LLM sees it. It’ll improve your results by looking more like its training data.

MaxLeiter

4 months ago

I wrote a bit about this yesterday: https://maxleiter.com/blog/rewrite-your-prompts

> Reason #3a: Work with the model biases, not against

Another note on model biases is that you should lean into them. The tricky part with this is the only way to figure out a model's defaults is to have actual usage and careful monitoring (or have evals that let you spot it).

Instead of forcing the model to behave in ways it ignores, adapt your prompts and post-processing to embrace its defaults. You'll save tokens and get better results.

If the model keeps hallucinating some JSON fields, maybe you should support (or even encourage) those fields instead of trying to prompt the model against them.
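To make the JSON point concrete, here is a minimal sketch of accepting the fields a model tends to emit instead of prompting against them. The field names and the alias table are purely hypothetical, made up for illustration; they are not from the linked post or any real schema.

```python
import json

# Hypothetical schema: the keys are variants a model tends to emit,
# the values are the canonical names we actually want (all illustrative).
FIELD_ALIASES = {
    "summary": "description",
    "desc": "description",
    "file": "path",
    "filename": "path",
}
KNOWN_FIELDS = {"description", "path", "severity"}


def normalize_fields(raw_json: str) -> dict:
    """Parse a JSON object from the model and fold alias keys into canonical ones."""
    data = json.loads(raw_json)
    normalized = {}
    extras = {}
    for key, value in data.items():
        canonical = FIELD_ALIASES.get(key, key)
        if canonical in KNOWN_FIELDS:
            # Keep the first value seen if the model emits both forms.
            normalized.setdefault(canonical, value)
        else:
            # Preserve unknown fields rather than failing the request.
            extras[key] = value
    if extras:
        normalized["extras"] = extras
    return normalized
```

Unknown keys end up under "extras" rather than raising, so you can watch which ones recur and decide whether to promote them into the schema.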

ameliaquining

4 months ago

Presumably the Go source files in the training corpus used tabs?

dboreham

4 months ago

Obviously I'm not a $100M AI genius but haven't they tried transforming both training data and context into some normal form (whitespace neutral)?

joquarky

4 months ago

Good advice! Reminds me of wu-wei.

Cut wood with the grain, not against it.

4 months ago

Go uses tabs. Full stop. There is no Go code with spaces. Not if they're using the built-in formatter, anyway. In any case, this is about the diff codex is outputting, not the code I commit. With Claude, I generally don't need to run `go fmt`, but with codex, it is absolutely necessary.

ameliaquining

4 months ago

1 reply

I think the idea is that your IDE or whatever should automatically run the project's autoformatter after every AI edit, so that any formatting mistakes the AI makes are fixed before you have to look at them.

4 months ago

1 reply

Do you not look at changes in your terminal as it is making them?

ameliaquining

4 months ago

The thing in the terminal could also run the project autoformatter on the changes before displaying them.

scrollaway

4 months ago

1 reply

Does codex have a good way of doing post process hooks? For Claude Code hooks I never found a way to run a formatter over only the file that was edited. It’s super annoying as I want to constantly have linting and formatting cleaned up right after the model finishes editing a file…

4 months ago

Check out lint-staged in npm. You can configure it so it will run even if the files aren’t staged, thus linting any changed files.
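If you'd rather not pull in npm for this, a rough sketch of the same idea in Python might look like the following. It assumes git is on the PATH and that the formatters in the table are installed; the table itself and the function names are just for illustration, and how you wire it into an agent's hook mechanism is left open.

```python
import subprocess
from pathlib import Path

# Extension -> formatter command prefix. `gofmt -w` rewrites files in place;
# the Python entry is illustrative and assumes black is installed.
FORMATTERS = {
    ".go": ["gofmt", "-w"],
    ".py": ["black", "--quiet"],
}


def changed_files(repo: str = ".") -> list[str]:
    """List tracked files that differ from HEAD (staged or not), skipping deletions."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def formatter_commands(paths: list[str]) -> list[list[str]]:
    """Build one formatter invocation per file that has a known formatter."""
    commands = []
    for path in paths:
        prefix = FORMATTERS.get(Path(path).suffix)
        if prefix is not None:
            commands.append(prefix + [path])
    return commands


def format_changed(repo: str = ".") -> None:
    """Format only the files touched since the last commit."""
    for command in formatter_commands(changed_files(repo)):
        subprocess.run(command, cwd=repo, check=True)
```

Note that `git diff` only reports tracked files, so brand-new files the agent created won't be picked up until they are added.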

Der_Einzige

4 months ago

Stop telling the normies the secrets please! You've just harmed job security quite a bit for a lot of people!

dgfitz

4 months ago

The future is truly here, we finally solved the tab vs spaces debate. The singularity must be right around the corner.

wahnfrieden

4 months ago

Just use a linter hook to standardize style

4 months ago

1 reply

I think it would be cool to see *nix “emulation” integrated into coding AIs. I don’t think it’s necessary to run these agents inside of a container as most people are right now. That’s a lot of overhead.

4 months ago

1 reply

You mean instead of them running the code that they are writing they pretend to run the code and the model shows what it thinks would happen?

I don't like that at all. Actually running the code is the single most effective protection we have against coding mistakes, from both humans and machines.

I think it's absolutely worth the complexity and performance overhead of hooking up a real container environment.

Not to mention you can run a useful code execution container in 100MB of RAM on a single CPU (or slice thereof). Simulating that with an LLM takes at least one GPU and 100GB or more of VRAM.

4 months ago

2 replies

I understand your point but I basically find myself running all my agents in barebones containers and they’re basically short-run make-or-kill types. And once we ramp up agent counts, possibly into the thousands, that could add up rapidly. Of course, you would run milestone tests on actual container/envs but I think there might be a need for lighter solutions for rapid agent dev runs.

4 months ago

1 reply

You do realize that there is virtually no overhead in running containers, right? That's the entire point of their existence. They're just processes, with specific permissions (to generalize it). Your computer can run thousands of processes without sweating.

4 months ago

1 reply

> You do realize that there is virtually no overhead in running containers, right? That's the entire point of their existence.

No, I didn’t know running containers used “virtually no overhead.” It appears I can run millions of containers without any resource constraint? Is that some sort of cheat code?

4 months ago

1 reply

The only resource constraints are physical. You can run millions of containers, but it is unlikely you have the physical resources to do meaningful work with them.

4 months ago

1 reply

So now you’re saying there are constraints? You just said there are no limits. You can run infinite containers. Why did you lie about this magic?

jonfk

4 months ago

What withinboredom meant is that running processes in containers don't add overhead vs simply running them outside of them. That is mostly true because of the way that containers work in linux through cgroups and namespaces, which means that you would only be limited by what your hardware would already be able to run before running the processes in containers.
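To illustrate the "just processes" point, here is a small Linux-only sketch that reads the cgroup membership and namespace links every process already has, containerized or not. The line format parsed here (hierarchy-ID:controller-list:cgroup-path) is the one described in cgroups(7); the function names are made up for this example.

```python
import os
from pathlib import Path


def parse_cgroup(text: str) -> list[tuple[str, list[str], str]]:
    """Parse /proc/<pid>/cgroup lines of the form hierarchy-ID:controller-list:path."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        # cgroup v2 lines have an empty controller list ("0::/some/path").
        controller_list = controllers.split(",") if controllers else []
        entries.append((hierarchy_id, controller_list, path))
    return entries


def describe_self() -> dict:
    """Report this process's cgroups and namespace links (requires Linux /proc)."""
    cgroups = parse_cgroup(Path("/proc/self/cgroup").read_text())
    namespaces = {
        entry.name: os.readlink(entry)
        for entry in Path("/proc/self/ns").iterdir()
    }
    return {"cgroups": cgroups, "namespaces": namespaces}
```

Running `describe_self()` on the host and inside a container returns the same kind of data with different values, which is the point: the kernel tracks both as ordinary processes.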

rgo

4 months ago

There are now many solutions, and full-blown startups, under the "swarm", "agent orchestration" and other similar keywords, for spinning up agents in the cloud. I'm not sure if that's what you mean, but I totally see most of vibe coding being replaced by powerhouse agents, placed locally or in the cloud, picking up tasks and working them out until it's really done.

4 months ago

2 replies

I signed up to OpenAI, verified my identity, and added my credit card, bought $10 of credits.

But when I installed Codex and tried to make a simple code bugfix, I got rate limited nearly immediately. As in, after 3 "steps" the agent took.

Are you meant to only use Codex with their $200 "unlimited" plans? Thanks!

wahnfrieden

4 months ago

1 reply

Use Plus first

4 months ago

2 replies

Thank you - so to confirm Codex _requires_ basically the Plus or $200 plans otherwise it just does not work?

Tiberium

4 months ago

1 reply

You can use Codex CLI with an API key instead of a subscription, but then you won't have access to this new GPT-5 Codex model, since it's not out on the API yet. But normal GPT-5 in Codex is perfectly fine.

4 months ago

1 reply

That's what was in my OP: I used the API approach, but the rate limits were (seemingly?) insanely low. The agent died after 3 steps in a single question.

embirico

4 months ago

1 reply

when was that @sergiotapia? last week we just upped the base rate limit for new API accounts

4 months ago

This was September 11th, 2025.

    gpt-5-2025-08-07
    38.887K input tokens

That was my usage, and I got rate limited. Thank you for your tips!

4 months ago

The new GPT-5-Codex model isn't yet available in the API, so if you want to try that model using the Codex CLI tool the only way to do that is with a ChatGPT account (I'm not sure if the free account has it, the $20/month definitely does). You need to then authenticate Codex CLI with ChatGPT.

OpenAI say API access to that model is coming soon, at which point you'll be able to use it in Codex CLI with an API key and pay for tokens as you go.

You can also use the Codex CLI tool without using the new GPT-5-Codex model.

4 months ago

Why are people willing to jump through all these hoops?

4 months ago

1 reply

Does Codex have token-hiding (cf Anthropic’s “subagents”)?

I was tempted to give Codex a try but a colleague was stung by their pricing. Apparently if you go over your Pro plan allocation, they just quietly and automatically start billing you per-token?

steveklabnik

4 months ago

1 reply

I tried Codex with the $20/month plan recently and it did exactly what Claude Code does, stop and tell you “sorry, you’re out of credit, come back in x days.”

4 months ago

1 reply

Thank you, glad to hear it. Sounds like my colleague might have had it misconfigured. I’ll give Codex a try then.

embirico

4 months ago

1 reply

Hey, I work on Codex—absolutely no way that a user on a Pro plan would somehow silently move to token-based billing. You just hit a limit and have to wait for the reset. (Which also sucks, and which we're also improving early warnings of.)

[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...

4 months ago

Thanks for that, appreciate the clarification. I’ll check with my colleague and report back on his experience. Certainly don’t want to misrepresent.

jumploops

4 months ago

2 replies

Interesting, the new model uses a different prompt in Codex CLI that's ~half the size (10KB vs. 23KB) of the previous prompt[0][1].

SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).

As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial or important details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.

Additionally, they claim the new model is more steerable (both with AGENTS.md and generally).

In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!

[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...

(comment reposted from other thread)

robotswantdata

4 months ago

1 reply

saw the same behaviour

What worked was getting it to first write a detailed implementation plan for a “junior contractor”, then attempt it in phases (clearing the task window each time), and telling it to use /tmp to copy files, transform them, and then update the original.

Looking forward to trying the new model out on the next refactor!
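For anyone wondering what "copy, transform, then update" looks like mechanically, here is a minimal sketch. The function name and the example transform are hypothetical; the only point is that the file is copied verbatim first, so nothing is lost to a from-memory rewrite.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def copy_then_transform(source: Path, destination: Path,
                        transform: Callable[[str], str]) -> None:
    """Copy a file verbatim to scratch space, edit the copy, then install it."""
    with tempfile.TemporaryDirectory() as scratch:
        working_copy = Path(scratch) / source.name
        # Start from the complete original instead of regenerating it.
        shutil.copy2(source, working_copy)
        text = working_copy.read_text()
        # Apply only the targeted, package-specific change.
        working_copy.write_text(transform(text))
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(working_copy), str(destination))
```

Passing the same path as both source and destination updates the original in place, which matches the /tmp workflow described above.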

jumploops

4 months ago

Yes, regardless of tool, I always create a separate plan doc for larger changes

Will try adding the instructions specific to refactors (i.e. copy/move files, don't rewrite when possible)

I've also found it helpful, especially for certain regressions, to basically create a new branch for any Codex/CC assisted task (even if part of a larger task). Makes it easier to identify regressions due to recent changes (i.e. look at git diff, it worked previously)

Telling the "agent" to manage git leads to more context pollution than I want, so I manage all commits/branches myself, but I'm sure that will change as the tools improve/they do more RL on full-cycle software dev

4 months ago

I do not trust SWE bench, here i am using gemini 2.5 pro and single shot most features: https://www.reddit.com/r/ChatGPTCoding/comments/1nh7bu1/3_ph...

4 months ago

2 replies

Can someone explain what this all means? Has codex just been updated to use chat-gpt 5 ? Or is this just extra info?

amrrs

4 months ago

1 reply

It is a new version of GPT-5 that's been primarily optimized for coding. Hence this confusing name - GPT-5-Codex.

This model is available inside all OpenAI Codex products. It is not yet available on the API.

The model is supposed to be better at code reviews and comments than the other GPT-5 variant. It can also think/work for up to 7 hours.

4 months ago

Amazing cheers

4 months ago

2 replies

I posted some notes here that might be useful: https://simonwillison.net/2025/Sep/15/gpt-5-codex/

Even shorter version:

- New coding-specialist model called GPT-5-Codex, coming soon to the API but for now available in their Codex CLI, VS Code and Codex Cloud products

- New code review product (part of Codex Cloud) that can review PRs for you

- New model promises better code review, fewer pointless comments and can vary its reasoning effort for simple vs complex tasks

naiv

4 months ago

1 reply

The pelican is not so convincing though :)

So a bit in line with what Theo mentioned in his video that he was not happy with the ui capabilities

4 months ago

Good point I forgot about the pelican!

https://cdn.openai.com/pdf/97cc5669-7a25-4e63-b15f-5fd5bdc4d...

4 months ago

Amazing thank you

WhitneyLand

4 months ago

1 reply

Apparently today is the first release with MCP support.

Updates (v0.36) https://github.com/openai/codex/releases

artdigital

4 months ago

1 reply

Codex had MCP support for a long long time

WhitneyLand

4 months ago

Really, I thought I had checked for it a couple months ago and didn’t see it?

Commented after I saw this added in today’s release notes: “initial MCP interface and docs”

hereme888

4 months ago

2 replies

Codex just ate up my remaining turns for the day for a clearly defined patch that should have taken just a few actions. Anyone else experienced that?

denuoweb

4 months ago

Yes. "Failed. You've hit your usage limit. Upgrade to Pro (https://openai.com/chatgpt/pricing) or try again in 3 days 10 minutes."

I can't use the IDE codex at all now it seems.

bn-l

4 months ago

Yes. I believe it’s a bug from their issues page

bezzi

4 months ago

1 reply

is this model just acting super slow with you guys too?

naiv

4 months ago

Feels slower than GPT-5, and I understood that medium should be a lot faster than high, but for me it's almost the same, so I don't see a reason to prefer medium.

tschellenbach

4 months ago

2 replies

is it already supported in cursor? don't see it just yet

toomanyflops

4 months ago

while not available as a specific model to use in cursor, it is available via openai’s codex extension on vscode/cursor

mindwok

4 months ago

It's not available via the API yet, so probably not.

bionhoward

4 months ago

1 reply

Meh, what’s the point if it’s got no privacy? Which companies want to let OpenAI read their codebase? Cursor keeps winning because of privacy mode IMHO; there is no level of capability which outweighs privacy mode.

Topfi

4 months ago

Maybe I misunderstand you, but looking at their own documentation on the topic, I hardly see any advantage in terms of privacy when using Cursor Privacy Mode over OpenAI's Data Controls:

> OpenAI

> We rely on many of OpenAI's models to give AI responses. Requests may be sent to OpenAI even if you have an Anthropic (or someone else's) model selected in chat (e.g. for summarization)*. We have a zero data retention agreement with OpenAI.

Source: https://cursor.com/security

I will say that the Security page by the Cursor team is a very nice overview, even going into Auth, etc. and applaud that, but see nothing here that differentiates their use of e.g. OpenAI models from the agreements OpenAI offers themselves. Essentially, I don't see why anyone would have such severely heightened trust in Cursor over competitors in this area. If they only provided self hosted models, I could understand it, but not the way they operate.

Personally, both because of the way LLMs have been trained and what they have been trained on, on top of my expectations in terms of privacy, regardless of model provider assurances, I'd treat any LLM derived/assisted/reviewed code as public the second you send it to some provider's server-hosted model, and as some form of FOSS to boot. Basically, if you used Cursor, Codex, Augment or anything of that sort, I'd reduce any future privacy expectations straight away; you might as well put it on public GitHub for everyone to see.

Only self-hosting on prem is an option for keeping control of your codebase, though personally, I'd still consider licensing such code as FOSS, considering there is no model that wasn't trained on EUPL, GPL, etc. code. This is a personal opinion (very much philosophical and not at all legal, as that goes into arguments about what training is, weights, etc. that can go on eternally), but I'd argue that whether you are MSFT or a small startup, if you derive a significant amount of new code from LLMs, arguing that copyleft shouldn't at the very least be on the mind of your legal department isn't reasonable. Of course, this will have to be decided by courts, and likely in favour of those with the best legal teams. Even if any of the "80% of our code is written by LLMs" claims were true, I doubt that would convince a court to enforce copyleft upon the product in question, but personally, that'd be my viewpoint.

Regardless of licensing, if you send your code to Cursor then, purely privacy-wise, you shouldn't have reservations about OpenAI.

8cvor6j844qw_d6

4 months ago

2 replies

Can anyone share their thoughts on Claude Code vs Codex?

I've just started trying out Claude Code and am not sure how Codex compares on React projects.

From my initial usage, it seems Claude Code's planning mode is superior to its normal(?) mode, and giving it an overall direction to proceed in, rather than just stating a desired feature, seems to produce better results. It also does better if a large task is split into very small sub-tasks.

arthurcolle

4 months ago

claude code for first 3-4 months was a monster. it's been optimized

nowittyusername

4 months ago

I've used Claude Code for about 3 months now. Was a big fan until recent changes lobotomized it. So I switched over to Codex about 2 weeks ago and am loving it so far, way better experience. Today, with the introduction of the new model, I've been refactoring an old Claude Code project all day and so far things are looking good. I am very impressed, OpenAI cooked hard here...

hamish-b

4 months ago

2 replies

My problem _still_ with all of the codex/gpt based offerings is that they think for way too long. After using Claude 4 models through cursor max/ampcode I feel much more effective given its speed. Ironically, Claude Code feels just as slow as codex/gpt (even with my company patching through AWS Bedrock). Only makes me feel more that the consumer modes have perverse incentives.

hmottestad

4 months ago

It’s great for multitasking. I’ve cloned one of the repos I work on into a new folder and use Codex CLI in there. I feed it bug reports that users have submitted, while I work on bigger tasks.

strangescript

4 months ago

I almost never have to reprompt GPT-5-high (now gpt-5-codex-high), where I would be reprompting Claude Code all the time. It feels like it's faster, doing more, but it's taking more of the developer's time by getting things wrong.

foft

4 months ago

1 reply

I've had great results with Codex, though I found ChatGPT 5 was giving much better results than the existing model. So ended up using that directly instead. So very excited to have the model upgraded in Codex itself.

The main issues with Codex now seem to be the very poor stability (it seems to be down almost 50% of the time) and lack of custom containers. Hoping those get solved soon, particularly the stability.

I also wonder where the price will end up, it currently seems unsustainably cheap.

raincole

4 months ago

> I also wonder where the price will end up, it currently seems unsustainably cheap.

JetBrains has a $30/mo subscription (with a GPT-5 backend) and the quota burns so fast.

Assuming JetBrains is at breakeven price, either OpenAI has some secret sauce or they're losing money on Codex.

zapnuk

4 months ago

It would be nice if this model were good enough to update their TypeScript SDK (+agents library) to use, or at least support, zod v4 - they still use v3.

Had to spend quite a long time to figure out a dependency error...

anshumankmr

4 months ago

So is this a new model or just a different checkpoint for coding?

6thbit

4 months ago

Direct link to the pdf

mindwok

4 months ago

Codex with GPT-5-High is extremely good. Like many, I was a bit "meh" about the GPT-5 release; however, once I started using it with Codex it became clear there was a substantial improvement in a capability I wasn't really paying attention to, which is tool calling. Or more specifically, when to call a tool. Ask GPT-5-High a question about your codebase and watch the things it looks for, and the things it searches for (if you use --search). It has very good taste in how to navigate and solve a problem.

esafak

4 months ago

Does OpenAI demand biometrics to use GPT-5-Codex?