Composer: Building a Fast Frontier Model with RL
Posted 2 months ago · Active 2 months ago
cursor.com · Tech story · High profile
supportive / mixed · Debate: 60/100
Key topics: AI Coding Assistants, Cursor, RL Models
Cursor announces Composer, a fast frontier model with RL, sparking discussion about its performance, pricing, and comparisons to other models like Sonnet 4.5.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 13m after posting
- Peak period: 122 comments in 0-6h
- Average per period: 16 comments
- Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
- Story posted: Oct 29, 2025 at 12:04 PM EDT (2 months ago)
- First comment: Oct 29, 2025 at 12:17 PM EDT (13m after posting)
- Peak activity: 122 comments in 0-6h, the hottest window of the conversation
- Latest activity: Nov 1, 2025 at 3:37 AM EDT (2 months ago)
ID: 45748725 · Type: story · Last synced: 11/20/2025, 6:27:41 PM
Looking at the graph, it would appear there's an implicit "today" in that statement, as they do appear poised to equal or surpass Sonnet 4.5 on that same benchmark in the near future.
It's the only coding agent I'm actually motivated to use out of the box, because it really does make me feel more productive, while the others keep messing up the project: everything from way-too-large changes I didn't ask for to constant syntax and request errors.
It’s the only coding agent I’ve used that feels serious about being a product rather than a prototype. Their effort in improving their stack is totally paying off.
Countless times, my requests in the AI chat just hang there for 30+ seconds or more before I can retry them.
When I decided to give Claude Code a try (I thought I didn't need it because I used Claude in Cursor), I couldn't believe how much faster it was, and it was literally 100% reliable.
EDIT: given today's release, I decided to give it a go. The Composer1 model _is_ fast, but on the second new agent I started, I got this:
> Connection failed. If the problem persists, please check your internet connection or VPN
I would be willing to bet money your issue is on your side. I am a daily user since the beginning and cannot recall when I have had issues like you describe unless it was related to my corp network.
(Cursor dev)
Also, somehow magically, I’ve found Cursor’s Auto mode to be significantly faster than the specific models I’ve tried, Claude being among them.
I would agree it is not as good at lengthy work, where it takes a design all the way through implementing a feature in a single shot, but "trivial" is not a good description.
I also don’t think you’re right. 3.5 was recently deprecated and even before then, Cursor has been hitting rate limits with Anthropic. Auto is as much a token cost optimization as it is a rate limit optimization.
Note, later I started using Codex and now Codex is my daily driver, Claude Code for problems where Codex fails (not many), and again Cursor is never used.
They were the first mover but Codex (in my opinion) blows Cursor up into 1000 tiny pieces. It's just so, so much better.
Can't help but notice you haven't tried Zed!
Its generation speed is not the problem or the time sink.
It's wrestling with it to get the right output.
---
And just to clarify, since maybe I misunderstood again: people here are comparing Cursor to Claude Code, Codex, etc., but isn't this whole article all Cursor, just using different models?
Also, didn't realize you worked at Cursor - I'm a fan of your work - they're lucky to have you!
Totally agree that a "smart model" is table stakes for usefulness these days.
Wow, no kidding. It is quite good!
Literally a 30-day-old model and you've moved the "low" goalpost all the way there, haha. Funny how humans work.
Speed of model just isn't the bottleneck for me.
Before it I used Opus 4.1, and before that Opus 4.0 and before that Sonnet 4.0 - which each have been getting slightly better. It's not like Sonnet 4.5 is some crazy step function improvement (but the speed over Opus is definitely nice)
I wonder how much of the methods/systems/data transfers; if they can pull off the same with their agentic coding model, that would be exciting.
Every time I write code myself I find myself racing the AI to get an indentation in before the AI is done... gets annoying
I run Claude Code in the background near constantly for a variety of projects, with --dangerously-skip-permissions, and review progress periodically. Tabbing is only relevant when it's totally failing to make progress and I have to manually intervene, and that to me is a failure scenario that is happening less and less often.
I'm not against YOLO vibe coding, but being against tab completion is just insane to me. At the end of the day, LLMs help you achieve goals quicker. You still need to know what goal you want to achieve, and tab completion basically lets me complete a focused goal nearly as soon as I determine what my goal is.
And it's not remotely "YOLO vibe coding". All the code gets reviewed and tested thoroughly, and it's worked to specs and gated by test suites.
What I don't do is babysit the LLM until its code passes both the test suite and the automated review stages, because that's a waste of time.
Others of these projects are research tasks. While I wrote this comment, Claude unilaterally fixed a number of bugs in a compiler.
I tried to use an appropriate emoji to express the joking nature of this comment, but HN silently filtered it out, so pretend you see a grinning face.
Usually I'll have several Claude Code sessions running in parallel on different projects, and when one of them stops I will review the code for that project and start it again - either moving forwards or re-doing things that have issues.
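The workflow a couple of comments describe here, several unattended Claude Code runs in parallel that get reviewed once they stop, can be approximated with a small driver script. This is a minimal sketch, not anything from the thread beyond the flag the commenter above mentions: it assumes the `claude` CLI is on PATH and that its `-p` (non-interactive prompt) mode and `--dangerously-skip-permissions` flag behave as described; the project paths and prompts are invented.

```python
import subprocess
from pathlib import Path

# Hypothetical projects and the task each background session should work on.
TASKS = {
    Path("~/src/compiler").expanduser(): "Fix the failing optimizer tests and keep the test suite green.",
    Path("~/src/webapp").expanduser(): "Migrate the settings page to the new form components.",
}

def start_sessions():
    """Launch one non-interactive Claude Code run per project and collect the handles."""
    procs = []
    for repo, prompt in TASKS.items():
        log = open(repo / "claude-session.log", "w")
        proc = subprocess.Popen(
            # -p runs a single prompt non-interactively; the skip-permissions flag
            # is the unattended mode mentioned in the comment above.
            ["claude", "-p", prompt, "--dangerously-skip-permissions"],
            cwd=repo,
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        procs.append((repo, proc, log))
    return procs

if __name__ == "__main__":
    sessions = start_sessions()
    # Wait for each run to finish, then review its diff by hand before merging.
    for repo, proc, log in sessions:
        proc.wait()
        log.close()
        print(f"{repo}: exited with {proc.returncode}; review `git diff` before merging.")
```

The point is just the shape of the loop: kick runs off, let the test suite and review gates do the gating, and only step in when a session stalls.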
I actually find myself using the agent mode less now; I like keeping code lean by hand and avoiding technical debt. But I do use the tab completions constantly, and they are fantastic now that they can jump around the file.
I am an ML researcher at Cursor and worked on this project. Would love to hear any feedback you may have on the model, and I can answer questions about the blog post.
Cursor Composer and Windsurf SWE 1.5 are both finetuned versions of GLM.
I don't use these tools that much (I tried Cursor a while ago and rejected it), but having played with GPT-5 Codex (as a paying customer) yesterday in regular VSCode, and having had Composer1 do the exact same things just now, it's night and day.
Composer did everything better, didn't stumble where Codex failed, and most importantly, the speed makes a huge difference. It's extremely comfortable to use, congrats.
Edit: I will therefore reconsider my previous rejection
GPT-5-Codex does more research before tackling a task; that is the biggest reason I'm not using Composer yet.
Could you provide any color on whether ACP (from Zed) will be supported?
other links across the web:
https://x.com/amanrsanger/status/1983581288755032320?s=46
https://x.com/cursor_ai/status/1983567619946147967?s=46
Cursor Cheetah would've been amazing. Reusing the Composer name feels like the reverse OpenAI Codex move, haha.
(Cursor researcher)
[1] https://www.businessinsider.com/no-shoes-policy-in-office-cu...
Do you have to split the plan into parallelizable tasks that can be worked on in parallel in one codebase without breaking or confusing the other agents?
It's the most prominent part of the release post - but it's really hard to understand what exactly it's saying.
As a user, I want to know - when an improvement is claimed - whether it’s relevant to the work I do or not. And whether that claim was tested in a reasonable way.
These products aren't just expensive; they require switching your whole workflow, which is becoming an increasingly big ask in this space.
It's pretty important for me to be able to understand, and subsequently believe, a benchmark; where this information isn't present, I find it really hard not to read it as ad copy.
($1.25 input, $1.25 cache write, $0.13 cache read, and $10 output per million tokens)
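For a rough sense of what those per-million-token rates add up to, here is a small back-of-the-envelope calculator. The rates are the ones quoted above; the example token counts are invented, not from Cursor's docs.

```python
# Composer rates quoted above, in dollars per million tokens.
RATES = {"input": 1.25, "cache_write": 1.25, "cache_read": 0.13, "output": 10.00}

def request_cost(tokens: dict) -> float:
    """Cost in dollars for one request, given token counts per category."""
    return sum(RATES[kind] * count / 1_000_000 for kind, count in tokens.items())

# A made-up agentic turn: a large cached prompt, a modest fresh prompt, and a sizable edit.
example = {"input": 8_000, "cache_write": 2_000, "cache_read": 60_000, "output": 4_000}
print(f"${request_cost(example):.4f}")  # ~$0.0603
```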
> their own internal benchmark that they won't release
If they'd release their internal benchmark suite, it'd make it into the training set of about every LLM, which, from a strictly scientific standpoint, invalidates all conclusions drawn from that benchmark from then on. On the other hand, not releasing the benchmark means they could've hand-picked the data points to favor them. It's a problem that can't be resolved, unfortunately.
https://www.swebench.com/
ARC-AGI-2 keeps a private set of questions to prevent LLM contamination, but they have a public set of training and eval questions so that people can both evaluate their models before submitting to ARC-AGI and evaluate what the benchmark is measuring:
https://github.com/arcprize/ARC-AGI-2
Cursor is not alone in the field in having to deal with issues of benchmark contamination. Cursor is an outlier in sharing so little when proposing a new benchmark while also not showing performance in the industry standard benchmarks. Without a bigger effort to show what the benchmark is and how other models perform, I think the utility of this benchmark is limited at best.
We could have third-party groups with evaluation criteria who don't make models or sell AI: strictly evaluators. Alternatively, they could have a different type of steady income, with the only AI work they do being evaluation.
Then why publish the obscured benchmarks in the first place?
Benchmarks have become less and less useful. We have our own tests that we run whenever a new model comes out. It's a collection of trivial -> medium -> hard tasks that we've gathered, and it's much more useful to us than any published table. And it leads to more interesting finds, such as using cheaper models (5-mini, fast-code-1, etc) on some tasks vs. the big guns on other tasks.
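A harness like the one this commenter describes, a fixed ladder of trivial, medium, and hard tasks re-run against each new model, can stay very small. The sketch below is hypothetical, not their actual suite: the tasks, the checks, and the `run_model` stand-in are placeholders for whatever a team already has.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    difficulty: str                  # "trivial", "medium", or "hard"
    prompt: str
    check: Callable[[str], bool]     # returns True if the model's answer is acceptable

def evaluate(run_model: Callable[[str], str], tasks: list[Task]) -> dict[str, float]:
    """Pass rate per difficulty bucket, so cheaper models can be routed to the buckets they clear."""
    results: dict[str, list[bool]] = {}
    for task in tasks:
        answer = run_model(task.prompt)
        results.setdefault(task.difficulty, []).append(task.check(answer))
    return {d: sum(ok) / len(ok) for d, ok in results.items()}

# Made-up example tasks; run_model would wrap whichever model's API is being tried out.
tasks = [
    Task("fizzbuzz", "trivial", "Write fizzbuzz in Python.", lambda a: "fizzbuzz" in a.lower()),
    Task("iso-dates", "medium", "Regex for ISO 8601 dates.", lambda a: "[0-9]{4}" in a or "\\d{4}" in a),
]
print(evaluate(lambda prompt: "def fizzbuzz(): ...", tasks))
```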
I'm happy to see Cursor iterate, as they were pretty vulnerable to the labs leaving them behind when all of them came out with coding agents. The multi-agent support with built-in git worktrees is another big thing they launched recently. They can use their users as "teacher models" for multiple completions by competing models, and by proxying those calls, they get all the signals. And they can then use those signals to iterate on their own models. Cool stuff. We actually need competing products keeping each other in check, with the end result being more options for us, and sometimes even cheaper usage overall.
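The "proxy the calls, collect the signals" idea in that comment is speculation about Cursor's pipeline, but the general shape is easy to illustrate. The sketch below is purely hypothetical: the backend names, the log format, and the notion that acceptance later becomes a preference label are assumptions, not anything Cursor has published.

```python
import json, random, time
from typing import Callable

def proxied_completion(
    prompt: str,
    backends: dict[str, Callable[[str], str]],
    log_path: str = "preference_signals.jsonl",
) -> str:
    """Fan a request out to competing backends, serve one candidate, and log them all.

    Whether the user keeps, edits, or rejects the served text can later become a
    preference label between the candidates: the kind of signal the comment above
    suggests a proxy can collect."""
    candidates = {name: fn(prompt) for name, fn in backends.items()}
    shown = random.choice(list(candidates))
    with open(log_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "prompt": prompt,
                            "shown": shown, "candidates": candidates}) + "\n")
    return candidates[shown]

# Made-up stand-ins for two competing models.
backends = {"model_a": lambda p: p.upper(), "model_b": lambda p: p[::-1]}
print(proxied_completion("rename this variable", backends))
```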
Right now, it seems free when you are a Cursor Pro user, but I'd love more clarity on how much it will cost (I can't believe it'll be unlimited usage for subscribers)
https://cursor.com/docs/models#model-pricing
It made migrating as simple as install-and-import-settings for everyone using VSCode (probably the single most popular editor) or another VSCode-forked editor (though at the time it was basically all VSCode).
I do not think Cursor would have done nearly as well as it has if it hadn't. So even though it can be subpar in some areas due to VSCode's baggage, it's probably staying that way for a while.
Maybe my complaint is that I wish VSCode had more features like IntelliJ, or that IntelliJ were the open-source baseline a lot of other things could be built on.
IntelliJ is not without its cruft and problems, don't get me wrong. But its git integration, search, navigation, database tools - I could go on - all of these features are just so much nicer than what VSCode offers.
Still not up to Cursor standards though :)
Cursor's tab completion is better, but it doesn't seem to have a concept of not trying to tab complete. IntelliJ is correct half the time for completing the rest of the line and only suggests when it is somewhat confident in its answer.
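The behavior being praised here, suggesting only when the model is reasonably sure, comes down to thresholding the completion's score. Neither editor's actual logic is public, so this is just a toy sketch with a made-up confidence value and threshold.

```python
def maybe_suggest(completion: str, confidence: float, threshold: float = 0.6) -> str | None:
    """Return a suggestion only when the model's confidence clears the bar;
    otherwise stay silent instead of offering a noisy guess."""
    return completion if confidence >= threshold else None

# Made-up example: a high-confidence line completion is shown, a shaky one is suppressed.
print(maybe_suggest("return items[-1]", 0.92))    # shown
print(maybe_suggest("return itms.pop(0)", 0.31))  # None -> no suggestion
```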
I think competition in the space is a good thing, but I'm very skeptical their model will outperform Claude.
8 more comments available on Hacker News