Granite 4.1

Granite 4.1 is IBM's open-weight large language model (LLM) family released in April 2026 under the Apache 2.0 License. It is a deliberate retreat from the mixture-of-experts (MoE) direction taken by Granite 4.0, returning to a decoder-only dense transformer design with no expert routing, no sparse layers, and no extended reasoning chains.

Canonical version: Granite 4.1.

The headline claim is that the 8B dense model matches the previous generation's 32B MoE model.

Architecture

  • Decoder-only transformer, dense (every parameter active per token)
  • No MoE routing, no sparse layers, no built-in chain-of-thought / reasoning mode
  • All sizes share an identical training pipeline and data strategy
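To make "every parameter active per token" concrete, here is a minimal sketch comparing per-token active parameters for a dense model and an MoE model. The MoE split (4B shared weights, 16 experts of 1.75B each, 2 experts routed per token) is a hypothetical illustration chosen to total 32B; it is not the published configuration of any Granite model.

```python
def active_params_dense(total_params: float) -> float:
    """A dense model runs every parameter for every token."""
    return total_params


def active_params_moe(shared_params: float, expert_params: float,
                      num_experts: int, top_k: int) -> float:
    """An MoE model runs the shared weights plus only the top_k routed experts.

    expert_params is the size of ONE expert, summed over all layers.
    """
    if not 0 < top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    return shared_params + top_k * expert_params


# Dense 8B model from the lineup: all 8B parameters are active per token.
dense_active = active_params_dense(8e9)

# HYPOTHETICAL 32B MoE split, for illustration only.
moe_total = 4e9 + 16 * 1.75e9
moe_active = active_params_moe(4e9, 1.75e9, num_experts=16, top_k=2)

print(f"dense 8B: {dense_active / 1e9:.1f}B active of 8.0B stored")
print(f"MoE 32B:  {moe_active / 1e9:.1f}B active of {moe_total / 1e9:.1f}B stored")
```

Under these assumed numbers the two models do a similar amount of compute per token, but the MoE model still has to keep four times as many weights in memory.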

Model Lineup

| Model           | Params | Context Window |
|-----------------|--------|----------------|
| Granite 4.1 3B  | 3B     | 128K           |
| Granite 4.1 8B  | 8B     | 512K           |
| Granite 4.1 30B | 30B    | 512K           |

Companion embedding models (311M and 97M variants) were also released on Hugging Face.
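As a rough sizing aid, the sketch below estimates how much memory the weights alone would need for each model in the lineup. It assumes the nominal parameter counts from the table, 2 bytes per parameter for 16-bit weights, and 1 byte per parameter for FP8. KV cache, activations, and runtime overhead are deliberately left out, so real requirements will be higher, especially at long context lengths.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

# Nominal parameter counts taken from the lineup table (assumed exact for this estimate).
LINEUP = {
    "Granite 4.1 3B": 3e9,
    "Granite 4.1 8B": 8e9,
    "Granite 4.1 30B": 30e9,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in LINEUP.items():
    fp16 = weight_memory_gb(params, "fp16")
    fp8 = weight_memory_gb(params, "fp8")
    print(f"{name}: about {fp16:.0f} GB at 16-bit, about {fp8:.0f} GB at FP8")
```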

Training

  • 15 trillion tokens across five sequential phases
  • Phase 1: CommonCrawl 59%, code 20%, math 7%
  • Phase 2: shifts to math 35%, code 30%
  • Phases 3–4: chain-of-thought reasoning + instruction data
  • Phase 5: books 80%, code repositories 20% (context length extension)
  • Post-training: four sequential RL stages (domain-specific training, RLHF, identity calibration, and a dedicated math recovery stage, added after RLHF was found to degrade math benchmarks)
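The percentages listed for each phase do not always add up to 100%, so part of each mix is simply not stated here. The sketch below records only the figures given above and computes the unlisted remainder per phase. Phases 3 and 4 are omitted because no percentages are given for them, and the dictionary keys are labels chosen for this example.

```python
# Data mix percentages exactly as listed above; anything not listed is left out.
PHASE_MIX = {
    "phase_1": {"CommonCrawl": 59, "code": 20, "math": 7},
    "phase_2": {"math": 35, "code": 30},
    "phase_5": {"books": 80, "code repositories": 20},
}


def unlisted_share(mix: dict) -> int:
    """Return the percentage of a phase's data that the listed sources do not cover."""
    total = sum(mix.values())
    if total > 100:
        raise ValueError(f"listed shares add up to {total}%, which exceeds 100%")
    return 100 - total


for phase, mix in PHASE_MIX.items():
    print(f"{phase}: {sum(mix.values())}% listed, {unlisted_share(mix)}% unlisted")
```

Only the Phase 5 figures account for the full mix; Phase 1 and Phase 2 leave 14% and 35% unattributed respectively.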

Benchmarks (8B)

  • ArenaHard: 69.0
  • BFCL V3 (tool calling): 68.3
  • GSM8K (math): 92.5
  • IFEval (instruction following): 87.1
  • EvalPlus (code): 80.2
  • DeepMind-Math: 80.1

Independent skepticism: community reports note Qwen 3.5 9B outperforming Granite 4.1 30B on several local-coding benchmarks, so the "8B matches 32B MoE" framing rests on IBM's own evaluation and is contested.

Deployment

  • Ollama
  • Hugging Face (ibm-granite collection), with FP8 quantized variants
  • vLLM, Hugging Face Transformers
  • IBM watsonx / IBM proprietary API
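For the Ollama route, a local call might look like the sketch below. It uses only the Python standard library and Ollama's local REST endpoint (`/api/generate` on the default port 11434). The model tag `granite4.1:8b` is an assumption made for this example; check the Ollama library for the tag that is actually published.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "granite4.1:8b"  # HYPOTHETICAL tag, replace with the published one


def build_request(prompt: str, model: str = MODEL_TAG,
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize this contract clause: ...")` would return the model's reply as a string once the server is running. Building the request separately from sending it keeps the payload easy to inspect without a live server.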

Why It Matters

Granite 4.1 is a counter-trend bet: while NVIDIA Nemotron, GLM-5.1, Mistral Large 3, and most 2026 frontier open-weight releases lean into MoE + reasoning modes, IBM doubles down on dense + no reasoning + extreme long-context (512K) for enterprise deployments. The wager is that predictability, easier quantization, and deterministic inference cost matter more than benchmark headlines for regulated enterprise workloads, which make up IBM's actual customer base.

About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.
