news

Granite 4.1

Granite 4.1 is IBM's open-weight LLM) family released in April 2026 under the Apache 2.0 License. It is a deliberate retreat from the MoE) direction taken by Granite 4.0, returning to a decoder-only dense transformer design with no expert routing, no sparse layers, and no extended reasoning chains.

Sebastien Dubois

01 May 2026 — 2 min read

Canonical version: Granite 4.1.

Granite 4.1 is IBM's open-weight LLM family released in April 2026 under the Apache 2.0 License. It is a deliberate retreat from the MoE direction taken by Granite 4.0, returning to a decoder-only dense transformer design with no expert routing, no sparse layers, and no extended reasoning chains. The headline claim is that the 8B dense model matches the prior 32B MoE predecessor.

Architecture

Decoder-only transformer, dense (every parameter active per token)
No MoE routing, no sparse layers, no built-in chain-of-thought / reasoning mode
All sizes share an identical training pipeline and data strategy

Model Lineup

Model	Params	Context Window
Granite 4.1 3B	3B	128K
Granite 4.1 8B	8B	512K
Granite 4.1 30B	30B	512K

Companion embedding models also released (311M and 97M variants) on Hugging Face.

Training

15 trillion tokens across five sequential phases
Phase 1: CommonCrawl 59%, code 20%, math 7%
Phase 2: shifts to math 35%, code 30%
Phases 3–4: chain-of-thought reasoning + instruction data
Phase 5: books 80%, code repositories 20% (context length extension)
Post-training: four sequential RL stages — domain-specific training, RLHF, identity calibration, and a dedicated math recovery stage added after RLHF was found to degrade math benchmarks

Benchmarks (8B)

ArenaHard: 69.0
BFCL V3 (tool calling): 68.3
GSM8K (math): 92.5
IFEval (instruction following): 87.1
EvalPlus (code): 80.2
DeepMind-Math: 80.1

Independent skepticism: community reports note Qwen 3.5 9B outperforming Granite 4.1 30B on several local-coding benchmarks, so the "8B matches 32B MoE" framing is internal and contested.

Deployment

Ollama
HuggingFace (ibm-granite collection), with FP8 quantized variants
vLLM, HuggingFace Transformers
IBM watsonx / IBM proprietary API

Why It Matters

Granite 4.1 is a counter-trend bet: while NVIDIA Nemotron, GLM-5.1, Mistral Large 3, and most 2026 frontier open-weights releases lean into MoE + reasoning modes, IBM doubles down on dense + no reasoning + extreme long-context (512K) for enterprise deployments. The wager is that predictability, easier quantization, and deterministic inference cost matter more than benchmark headlines for regulated/enterprise workloads — IBM's actual customer base.

References

About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.

If you want to follow my work, then become a member and join our community.

Ready to get to the next level?

If you're tired of information overwhelm and ready to build a reliable knowledge system:

📚 KM for Beginners — 10+ hours of structured video lessons
🚀 Obsidian Starter Kit — Ready-made vault with 40+ templates
💼 Knowledge Worker Kit — Complete guides + lifetime community
🦉 1-on-1 Coaching — Personalized guidance
🎯 Join Knowii — Community + ALL courses & tools