'Western Qwen': IBM Wows with Granite 4 LLM Launch and Hybrid Mamba/Transformer
Posted 3 months ago · Active 3 months ago
Source: venturebeat.com · Tech story
Sentiment: excited/mixed · Debate: 60/100
Key topics: LLM, IBM, AI
IBM has launched Granite 4, a new LLM with a hybrid Mamba/Transformer architecture, sparking interest and discussion among the HN community about its performance, potential applications, and comparisons to other models.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion · First comment: 2h after posting · Peak period: 5 comments in 2-4h · Avg per period: 2.5
Based on 25 loaded comments
Key moments
- Story posted: Oct 3, 2025 at 12:26 AM EDT (3 months ago)
- First comment: Oct 3, 2025 at 1:58 AM EDT (2h after posting)
- Peak activity: 5 comments in 2-4h, the hottest window of the conversation
- Latest activity: Oct 4, 2025 at 7:59 AM EDT (3 months ago)
ID: 45458987 · Type: story · Last synced: 11/20/2025, 12:29:33 PM
Original sources:
IBM Granite 4.0: hyper-efficient, high performance hybrid models for enterprise
https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-...
> ISO/IEC 42001 is an international standard that specifies requirements for establishing, implementing, maintaining, and continually improving an Artificial Intelligence Management System (AIMS) within organizations. It is designed for entities providing or utilizing AI-based products or services, ensuring responsible development and use of AI systems.
https://www.iso.org/standard/42001
If anyone has access to ISO standards, I'm really curious what the practical effects of that certification are, i.e. what things Granite has that other models don't, because IBM had to add or do them to fulfill the certification.
The committee was formed in 2017, chaired by an AI expert: https://www.iso.org/committee/6794475.html
https://www.ibm.com/think/topics/mamba-model
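For anyone wondering what the Mamba half of the hybrid buys you: a state-space layer carries a fixed-size recurrent state from token to token instead of a per-token KV cache. A toy scalar sketch of that recurrence (real Mamba uses input-dependent, vectorized parameters; the constants here are purely illustrative):

```python
# Toy diagonal state-space recurrence: the core idea behind Mamba-style
# layers. A fixed-size hidden state is updated once per token, so memory
# stays constant regardless of sequence length (unlike a KV cache).
# Scalar a, b, c per channel here; real Mamba makes these input-dependent.

def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    h = 0.0  # constant-size state, no matter how long xs is
    ys = []
    for x in xs:
        h = a * h + b * x   # state update
        ys.append(c * h)    # readout
    return ys

ys = ssm_scan([1.0, 0.0, 0.0, 0.0])
# impulse response decays exponentially: 0.5, 0.45, 0.405, ...
```

The point of the sketch: the only thing carried forward is `h`, which is why a mostly-Mamba stack needs so much less memory at long context than a pure Transformer.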
No Mamba in the Ollama version though.
Would Granite run with llama.cpp and use Mamba?
EDIT: Looks like Granite 4 hybrid architecture support was added to llama.cpp back in May: https://github.com/ggml-org/llama.cpp/pull/13550
Yes and no. Ollama has written its own "engine" using the GGML libraries directly, but falls back to llama.cpp for models the new engine doesn't yet support.
./llama.cpp/llama-cli -hf unsloth/granite-4.0-h-small-GGUF:UD-Q4_K_XL
Also a support agent finetuning notebook with granite 4: https://colab.research.google.com/github/unslothai/notebooks...
The IBM article has this image showing that it's supposed to be a bit ahead of GPT OSS 120B for at least some tasks (horrible URL but oh well): https://www.ibm.com/content/dam/worldwide-content/creative-a...
So in general it's going to be worse than GPT-5 and Sonnet 4.5, but closer to GPT-5 mini. At least you can run this on prem, which you can't do with any of the others. Pretty good; it could possibly replace Qwen3 for quite a few use cases!
20GB @ 100,000 context.
But for some reason... LM Studio isn't loading it onto the GPU for me?
I just updated to 0.3.28 and it still won't load onto the GPU.
Switched from Vulkan to ROCm, and now it's working properly.
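The ~20GB at 100k-token context figure upthread makes sense once you account for how few attention layers a hybrid stack carries. A back-of-envelope sketch (every layer count, head count, and dimension below is a made-up illustration, not Granite 4's actual config):

```python
# Rough KV-cache estimate: why a hybrid Mamba/Transformer needs far less
# memory at long context than a pure Transformer. All parameters here are
# illustrative assumptions, not Granite 4's real architecture.

def kv_cache_gib(n_attn_layers, n_kv_heads, head_dim, context_len, bytes_per=2):
    # 2x for separate K and V tensors; fp16 = 2 bytes per element
    total = 2 * n_attn_layers * n_kv_heads * head_dim * context_len * bytes_per
    return total / 2**30

# Hypothetical 40-layer model, GQA with 8 KV heads of dim 128, 100k context
full = kv_cache_gib(n_attn_layers=40, n_kv_heads=8, head_dim=128,
                    context_len=100_000)
# Hybrid: suppose only 1 in 10 layers is attention; the Mamba layers keep
# a small constant-size state instead of a cache that grows with context
hybrid = kv_cache_gib(n_attn_layers=4, n_kv_heads=8, head_dim=128,
                      context_len=100_000)
print(f"full attention: {full:.1f} GiB, hybrid: {hybrid:.1f} GiB")
# ~15.3 GiB vs ~1.5 GiB for the cache alone, before weights
```

With the cache shrunk by an order of magnitude, most of that 20GB can go to the quantized weights instead.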
https://docs.unsloth.ai/new/ibm-granite-4.0
Fantastic work from unsloth folks as usual.
Running in Roo Code, it's using more like 26GB of VRAM.
~30 TPS
Roo Code does not work with it.
Kilo Code next. It seems to use about 22GB of VRAM.
Kilo Code works great.
The model, however, didn't one-shot my first benchmark. That's pretty bad news for this model, given that Magistral 2509 or Apriel 15B do better.
Better on pass 2, still not 100%.
Third pass achieved it.
I'm predicting it'll land around 30% on LiveCodeBench and probably around 15% on Aider polyglot. Very disappointed in its coding capability.
I just found:
https://artificialanalysis.ai/models/granite-4-0-h-small
25.1% on LiveCodeBench. Absolutely deserved.
2% on Terminal-Bench.
16% on the coding index. Completely deserved.