DeepSeek-V3.2-Exp
Posted 3 months ago · Active 3 months ago
github.com · Tech · Story · High profile
Sentiment: excited, positive
Debate: 20/100
Key topics
AI
LLM
Deep Learning
The DeepSeek-V3.2-Exp AI model is released with improved performance and reduced costs, sparking discussion on its potential impact and the future of AI pricing.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
First comment: 49m after posting
Peak period: 18 comments in 6-9h
Avg / period: 6.3
Comment distribution: 50 data points (based on 50 loaded comments)
Key moments
- 01 Story posted: Sep 29, 2025 at 6:26 AM EDT (3 months ago)
- 02 First comment: Sep 29, 2025 at 7:15 AM EDT (49m after posting)
- 03 Peak activity: 18 comments in 6-9h (hottest window of the conversation)
- 04 Latest activity: Sep 30, 2025 at 6:06 PM EDT (3 months ago)
ID: 45412098 · Type: story · Last synced: 11/20/2025, 6:27:41 PM
Input: $0.28/M ($0.028/M cache hit) · Output: $0.42/M
(Inference costs are cheaper for them now as context grows because of the sparse attention mechanism.)
Output: $1.68 per million tokens.
https://api-docs.deepseek.com/news/news250929
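To put those per-million prices in per-request terms, here is a quick back-of-the-envelope calculation. Only the per-token prices come from the announcement linked above; the request sizes are made up for illustration.

```python
# Hypothetical request: 50k prompt tokens, 40k of which hit the cache, 2k output tokens.
# Only the per-million-token prices are from the DeepSeek announcement; the counts are invented.
PRICE_INPUT_MISS = 0.28 / 1_000_000   # $/token, cache miss
PRICE_INPUT_HIT = 0.028 / 1_000_000   # $/token, cache hit
PRICE_OUTPUT = 0.42 / 1_000_000       # $/token

def request_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the listed prices."""
    missed = prompt_tokens - cached_tokens
    return (missed * PRICE_INPUT_MISS
            + cached_tokens * PRICE_INPUT_HIT
            + output_tokens * PRICE_OUTPUT)

print(f"${request_cost(50_000, 40_000, 2_000):.5f}")  # ~ $0.00476 for the whole request
```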
I think this is just as important to the distribution of AI as model intelligence is.
AFAIK there are no fundamental "laws" that prevent price from continuing to fall, at least correlated with Moore's law (or whatever the current AI/Nvidia chip development cycle is called right now). Each new generation of hardware is significantly faster/cheaper than the last, so will we see a ChatGPT-5 model at half the price in a year? (Yes, I know that thinking models cost more, but just on a per-token basis.)
Price deflation is not tied to Moore's law right now, because much of the performance gain comes from model optimization, high-bandwidth-memory supply chains, and electrical capacity build-out, not FLOP density.
Part of me is optimistic that when the AI bubble bursts the excess data center capacity is going to be another force driving the cost of inference down.
Performance gained from model improvements has outpaced performance gained from hardware improvements for decades.
Yeppers, when that bubble bursts - that's hilarious. This is the kinda stuff grandkids won't believe someday.
I believe you, but that's not exactly an unbiased source of information.
This is usually not the case for paid models -- is OpenRouter just marking this model incorrectly, or does DeepSeek actually train on submitted data?
I guess I'll wait for a 3rd-party provider on OpenRouter that doesn't log DS 3.2.
https://openrouter.ai/docs/features/privacy-and-logging#data...
It seems so.
Is it just the API client bindings that are open, while the core routing service is closed?
If they lead the market, they'll extract value in lots of ways that an open company could at least be compelled not to. Plus there won't be competition.
They're probably selling your data to LLM companies and you don't even see what they're doing.
Without competition, they'll raise their rates.
If they were open, you could potentially run the offering on-prem. You could bolt on new providers or use it internally for your own routing.
Lots of reasons.
I think it's just called OpenRouter because the founder previously started OpenSea (an NFT marketplace), and also probably to sound a bit similar to OpenAI. It's like companies calling their products "natural" or "organic" or "artisan" when they can get away with it, just a marketing strategy of using words that conjure up vaguely positive connotations in your mind.
It's a frictionless marketplace connecting inference providers and customers, creating a more competitive market. Or a more open market, if you play a bit fast and loose with terminology.
Input and output costs are peanuts compared to the order-of-magnitude (or more) larger volume of tokens that hit the cache.
At that point you might as well use GPT-5. It will be the same price or cheaper, and more capable.
The DeepSeek API supports caching; stop manufacturing problems where there are none.
https://api-docs.deepseek.com/guides/kv_cache
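The linked guide describes the cache as automatic prefix matching: repeat the same leading context across calls and the overlapping tokens are billed at the cache-hit rate. A minimal sketch of what that looks like with the OpenAI-compatible client follows; the usage field names (`prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`) are taken from that guide and should be treated as assumptions if the docs have changed.

```python
# Sketch: two calls share a long, identical system prompt so the second call can
# reuse the KV cache. Assumes the OpenAI-compatible endpoint described in the
# linked DeepSeek guide; usage field names come from that guide and may change.
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

long_context = "..."  # a large, fixed document or system prompt reused across requests

for question in ["Summarize section 1.", "Summarize section 2."]:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": long_context},  # identical prefix each call
            {"role": "user", "content": question},
        ],
    )
    usage = resp.usage
    # On the second call most of the shared prefix should be billed at the cache-hit rate.
    print(getattr(usage, "prompt_cache_hit_tokens", None),
          getattr(usage, "prompt_cache_miss_tokens", None))
```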
OpenRouter says they might use your data for training.
If you read my post carefully, you will realize that I did not make any contradictory statements.
My wife is Chinese.
DeepSeek supports caching and cache hits are a tenth of the cost.
$0.028/M for cache hit
$0.28/M for cache miss
$0.42/M for output
— https://api-docs.deepseek.com/news/news250929
If they are okay for you, then sure go ahead. Enjoy the caching.
What other provider is going to support it?
Why?
They trained a lightweight "indexer" to mimic the full attention distribution while keeping only the top-k (k=2048) most important tokens, so that as the context window grows, the compute for the main query/key attention step stays roughly constant instead of growing with context length. Cost still grows linearly in their graph because the indexer has to roughly scan the entire context to pick those tokens, but it does so very cheaply, so that scan is O(L) with a small constant rather than full quadratic attention.
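A toy sketch of that idea: a cheap indexer scores every context token, only the top-k survive, and the expensive attention is computed over just those k tokens. This illustrates the general top-k sparse-attention pattern the comment describes, not DeepSeek's actual DSA implementation (which uses a trained lightning indexer on top of MLA); all weights below are random placeholders.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k_idx_w, q_idx_w, k=2048):
    """Attention for one query over a long context, restricted to the top-k
    tokens picked by a cheap indexer. Toy illustration only; the real model
    learns the indexer so its scores mimic the full attention distribution.

    q: (d,) query          K, V: (L, d) keys/values
    k_idx_w, q_idx_w: (d, d_idx) low-dimensional projections used by the indexer
    """
    L, d = K.shape
    # 1) Indexer: cheap O(L) scores in a small d_idx-dimensional space.
    idx_scores = (K @ k_idx_w) @ (q @ q_idx_w)          # (L,)
    # 2) Keep only the k most promising tokens (all of them if L <= k).
    keep = np.argsort(idx_scores)[-min(k, L):]
    # 3) Full attention over just those k tokens instead of all L.
    logits = K[keep] @ q / np.sqrt(d)                   # (k,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V[keep]                            # (d,)

# Usage: 10k-token context, d=64, indexer projects down to 8 dims.
rng = np.random.default_rng(0)
L, d, d_idx = 10_000, 64, 8
out = topk_sparse_attention(rng.normal(size=d), rng.normal(size=(L, d)),
                            rng.normal(size=(L, d)),
                            rng.normal(size=(d, d_idx)), rng.normal(size=(d, d_idx)),
                            k=2048)
print(out.shape)  # (64,)
```

The point of the split is that the per-token work done over the whole context (the indexer scores) is much cheaper than a full attention head, while the expensive part is capped at k tokens no matter how long the context gets.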