Baidu Releases Open-Source Multimodal AI That It Claims Beats GPT-5 and Gemini
Posted about 2 months ago · Active about 2 months ago
venturebeat.com · Tech story
Sentiment: calm, mixed
Debate: 40/100
Key topics
Artificial Intelligence
Open-Source
Multimodal Models
Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an open-source multimodal AI model that it claims outperforms GPT-5 and Gemini, sparking discussion about the model's capabilities and about Chinese companies' contributions to open-source AI.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 4m
Peak period: 4 comments in 0-1h
Avg / period: 3
Key moments
- 01 Story posted: Nov 12, 2025 at 11:21 AM EST (about 2 months ago)
- 02 First comment: Nov 12, 2025 at 11:24 AM EST (4m after posting)
- 03 Peak activity: 4 comments in 0-1h (hottest window of the conversation)
- 04 Latest activity: Nov 12, 2025 at 12:29 PM EST (about 2 months ago)
ID: 45902038 · Type: story · Last synced: 11/17/2025, 6:02:24 AM
No way at so few parameters
The people who are spending billions on AI infra build-outs want you to believe it's necessary, because frontier mega-models are supposedly so much better. China has been showing us otherwise, doing more with less even while handicapped by export controls.
It really hasn't. It's the opposite, actually. The latest RL breakthroughs from the big 4 labs haven't been replicated yet in any open model (including the latest k2-thinking). Even gemini-2.5 still delivers on generalisation in a way that no open model does today, almost a year later. The general consensus was that "open" models were 6-8 months behind SotA, but with the RL stuff we can see they've fallen further behind.
I don't know exactly what it is, whether it's simply RL scale, or data + scale, or better secret sauce (rewards, masking, something else), but the way these new models generalise is leagues ahead of open models, sadly.
Don't be fooled by benchmarks alone. You have to test models on problems that you own and that you can be fairly sure no one is targeting for benchmark scores. Recently there was a Python golfing competition on Kaggle, and I tested some models on that task. While the top 4 models were chugging along, in both agentic and zero-shot regimes, the open models (coding-specific, or older "thinking" models) were really bad at the task. 480B coding-specific models would go in circles, get lost on one example, and so on. Night and day between the open models and gpt5/claude/gemini2.5. Even Grok Fast solved a lot of tasks in agentic mode.
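For anyone who wants to try this kind of private eval themselves, here is a minimal sketch of a zero-shot harness along the lines the commenter describes. It assumes an OpenAI-compatible chat endpoint reached through the openai Python SDK; the task, test cases, scoring rule, and model names are illustrative placeholders, not details from the thread.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY (and optionally a base URL) from the environment

# A private code-golf style task; swap in problems only you know about.
TASK = ("Write the shortest Python function f(n) returning the digit sum "
        "of a non-negative integer n. Reply with only a code block.")
CASES = [(0, 0), (7, 7), (1234, 10), (999, 27)]

def ask_zero_shot(model: str, prompt: str) -> str:
    """Single request, no tools, no retries: the zero-shot regime."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def extract_code(reply: str) -> str:
    """Naive markdown fence stripper; a real harness would be stricter."""
    fence = "`" * 3  # avoids embedding a literal fence in this listing
    if fence in reply:
        reply = reply.split(fence)[1].removeprefix("python").strip()
    return reply

def score(code: str) -> tuple[int, int]:
    """Return (cases passed, golf length). Never exec untrusted code like this outside a sandbox."""
    ns: dict = {}
    try:
        exec(code, ns)  # in practice: subprocess + timeout + no network
        passed = sum(1 for n, want in CASES if ns["f"](n) == want)
    except Exception:
        passed = 0
    return passed, len(code)

if __name__ == "__main__":
    for model in ("gpt-5", "open-coder-480b"):  # illustrative model names
        code = extract_code(ask_zero_shot(model, TASK))
        passed, length = score(code)
        print(f"{model}: {passed}/{len(CASES)} cases passed, {length} chars")
```

The same loop can be pointed at any local or hosted model that speaks the same API, which is what makes a privately held task a cheap sanity check against benchmark targeting.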