GPT-Oss 120b Runs at 3000 Tokens/sec on Cerebras
Posted 2 months ago | Active about 2 months ago
cerebras.ai | Tech | story
Key topics
AI
Cerebras
GPT-Oss
The GPT-OSS 120B model runs at 3000 tokens/sec on Cerebras, sparking excitement and discussion about its potential applications and performance.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement. First comment 7h after posting; peak of 10 comments in the 6-9h window; average of 4.1 comments per period. Based on 29 loaded comments.
Key moments
- 01 Story posted: Nov 7, 2025 at 10:24 PM EST
- 02 First comment: Nov 8, 2025 at 5:30 AM EST (7h after posting)
- 03 Peak activity: 10 comments in 6-9h, the hottest window of the conversation
- 04 Latest activity: Nov 10, 2025 at 12:04 AM EST
ID: 45853849 | Type: story | Last synced: 11/20/2025, 8:00:11 PM
If you're going after the AI money gravy train then you need to wave the "we have $n registered users" carrot on your PPT slides for the investors because registered user == monetization opportunity.
I'm not defending it. I hate being forced to register for shit when I just want to try it or use the free tier.
But it is what it is.
Exactly this.
If you present me with a form and a submit button then I expect the input to go through and a result to be presented.
If you don't want to present me with results before login, then put the form behind the wall too.
Simple.
They have other options... rate limiting, serving (more) quantized to non-registered etc. etc.
If this was some beat-to-hell, high-mileage used economy car, sure, that would be a pain in the ass, and not worth it. But it's a mistake to place Cerebras into that mental bucket.
You don't even need to use real information to create an account. Just grab a temp-mail disposable address and sign up as fred flintstone or mickey mouse.
If you're a heavy LLM inference user (i.e. if you've ever paid for a $200/mo sub from any of the big AI labs), I can damn near guarantee you will not regret trying out Cerebras.
A week ago I went to a launch party for a product that's supposed to "revolutionize design" (a web app w/ an OAI prompt).
No demo, only like two pictures of the actual product. Founder spent like half an hour giving a speech about the future, etc...
"All of you here will get access to it in a couple weeks."
Couple weeks go by ... I "get access". It's a .dmg (1: what?). I open it, and it's not even an app, it's an installer ... I install it, the app opens up, and it's a giant red button that takes you to a website to create an account ...
These guys are completely lost.
[1] https://www.sec.gov/Archives/edgar/data/2021728/000162828024...
I live in the UAE, whose continuing enthusiasm for AI investment stretches well beyond short-term profit, so having AD on board seems like a plus, not a minus. I'm sure there are specific exceptions, but generally Emirati money has seemed like smart money.
But I'm just reasoning from first principles. I don't have any specific data about them.
Is this incorrect?
I don't think that this is a dupe or anything, and 3000 t/s is really cool; the other post just has more discussion of Cerebras and people's experiences with using GLM 4.6 for software development.