DiffusionGemma
DiffusionGemma is an experimental open model from Google DeepMind in the Gemma family that generates text by diffusion instead of autoregression. Rather than predicting one token at a time, it denoises whole spans in parallel, built on Gemma 4 and Gemini Diffusion research. The payoff is speed: up to ~4× faster output, exceeding 1,000 tokens/second on a single NVIDIA H100.
How diffusion text generation differs
Standard Large Language Models (LLMs) are autoregressive. They generate strictly sequentially, one token at a time, and are bottlenecked by memory bandwidth. DiffusionGemma instead uses discrete text diffusion with bidirectional attention. It works on a "canvas" of 256 tokens, predicting many tokens per forward pass, and then iteratively refines them. This enables self-correction and better global consistency, and it shifts the bottleneck from memory bandwidth to raw compute.
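To make the canvas idea concrete, here is a minimal sketch of a generic masked-diffusion sampling loop in Python. It is not DiffusionGemma's actual sampler. The names `toy_denoiser` and `MASK` are hypothetical stand-ins. So is the rule "commit the 16 most confident predictions per pass", which was chosen to match the 256-token canvas and the 15–20 tokens per forward pass quoted in this note. Each loop iteration represents one bidirectional forward pass that predicts every position at once.

```python
import random

MASK = -1  # hypothetical sentinel id for a still-masked canvas slot


def toy_denoiser(canvas, rng):
    # Hypothetical stand-in for the diffusion decoder: one "forward pass"
    # returns a (token_id, confidence) guess for every canvas position.
    # A real model would condition on the prompt and on already-committed
    # tokens; this toy ignores the canvas contents entirely.
    return [(rng.randrange(1000), rng.random()) for _ in canvas]


def diffusion_decode(canvas_len=256, tokens_per_step=16, seed=0):
    rng = random.Random(seed)
    canvas = [MASK] * canvas_len          # start from a fully masked canvas
    steps = 0
    while MASK in canvas:
        guesses = toy_denoiser(canvas, rng)   # predicts all positions at once
        steps += 1
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        # Commit the most confident guesses among still-masked positions.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            canvas[i] = guesses[i][0]
    return canvas, steps


canvas, steps = diffusion_decode()
print(steps)  # 16 passes for 256 tokens, vs. 256 passes one token at a time
```

With a 256-token canvas and 16 tokens committed per pass, the loop finishes in 16 passes, where one-token-at-a-time decoding would need 256. This simple variant never revisits a committed token. Samplers that implement the self-correction described above also re-mask and re-predict low-confidence tokens, which the sketch omits.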
Architecture
- ~26B total parameters (25.2B), of which 3.8B are active, via a Mixture of Experts (MoE) design: 8 of 128 experts active, plus 1 shared expert
- Encoder-decoder: an autoregressive encoder caches the prompt context, paired with a diffusion decoder
- Context window of up to 256K tokens; 262K-token vocabulary; sliding window of 1,024 tokens
- 15–20 tokens generated per forward pass (>1100 tok/s on H100 in FP8)
- Multimodal input (text, image, video → text); ~550M vision params; built-in thinking mode
- Supports NVIDIA NVFP4 (4-bit float) on Blackwell GPUs
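The MoE bullet above can be illustrated with a toy router in plain Python. Each input scores all 128 routed experts, only the top 8 are evaluated, and the shared expert always runs. Two details are assumptions based on common MoE designs, not confirmed details of DiffusionGemma: the softmax-over-selected-logits gating, and adding the shared expert's output to the routed sum. `route`, `moe_layer`, and the scalar "experts" are hypothetical.

```python
import math
import random

NUM_EXPERTS = 128  # routed experts, per the bullet above
TOP_K = 8          # routed experts active at once


def route(router_logits, k=TOP_K):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    Gates are a softmax over the selected logits only (an assumed convention).
    """
    top = sorted(range(len(router_logits)),
                 key=lambda e: router_logits[e], reverse=True)[:k]
    m = max(router_logits[e] for e in top)  # subtract max for numerical stability
    exps = {e: math.exp(router_logits[e] - m) for e in top}
    total = sum(exps.values())
    return {e: w / total for e, w in exps.items()}


def moe_layer(x, router_logits, experts, shared_expert):
    """The shared expert always runs; only the k routed experts are evaluated."""
    out = shared_expert(x)
    for e, gate in route(router_logits).items():
        out += gate * experts[e](x)
    return out


def shared_expert(x):
    return x  # toy shared expert: identity


# Toy usage: scalar "experts" that just scale their input by their index.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, s=s: s * x for s in range(NUM_EXPERTS)]
gates = route(logits)
print(len(gates), round(sum(gates.values()), 6))  # 8 1.0
y = moe_layer(1.0, logits, experts, shared_expert)
```

Because only 8 routed experts (plus the shared one) run for a given input, compute per token tracks the ~3.8B active parameters rather than the full 25.2B. All weights must still be held in memory.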
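The "sliding window of 1,024 tokens" entry most likely refers to sliding-window (local) attention, where a position only attends to nearby positions. The sketch below is a hypothetical helper showing which key positions a query can see under a 1,024-token window. The boundary convention, the `causal` flag, and which layers use the window are all assumptions, since the note does not specify them.

```python
def in_window(q: int, k: int, window: int = 1024, causal: bool = True) -> bool:
    """Can query position q attend to key position k under a sliding window?"""
    if causal:
        # Only the current position and the previous window - 1 positions.
        return 0 <= q - k < window
    # Bidirectional variant: neighbors on both sides within the window.
    return abs(q - k) < window


def visible_keys(q: int, seq_len: int, window: int = 1024, causal: bool = True) -> list[int]:
    """All key positions in [0, seq_len) that query q may attend to."""
    return [k for k in range(seq_len) if in_window(q, k, window, causal)]


print(len(visible_keys(5000, 8000)))                # 1024 keys (causal)
print(len(visible_keys(5000, 8000, causal=False)))  # 2047 keys (both sides)
print(len(visible_keys(10, 8000)))                  # 11 keys near the start
```

Whatever the sequence length, each query sees at most 1,024 keys in the causal case, which keeps such layers affordable at a 256K-token context. Earlier Gemma models interleave local layers like this with global-attention layers so that distant tokens remain reachable.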
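Here is a back-of-the-envelope reading of the throughput bullet. If every generated token comes from a decoder pass that yields 15–20 tokens, then >1,100 tokens/s corresponds to roughly 55–73 forward passes per second. This assumes throughput is entirely decoder-bound and ignores prompt encoding and batching. `passes_per_second` is a hypothetical helper.

```python
def passes_per_second(tokens_per_second: float, tokens_per_pass: float) -> float:
    """Forward passes per second implied by a throughput figure."""
    return tokens_per_second / tokens_per_pass


THROUGHPUT = 1100  # tok/s on H100 in FP8 (lower bound from the bullet above)

for tpp in (15, 20):
    print(tpp, round(passes_per_second(THROUGHPUT, tpp), 1))
# 15 73.3
# 20 55.0

# One token per pass (autoregressive decoding) for the same throughput:
print(1, passes_per_second(THROUGHPUT, 1))  # 1 1100.0
```

An autoregressive model emitting one token per pass would need about 1,100 passes per second to match this throughput. That gap is what the diffusion decoder exploits.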
Performance (instruction-tuned)
- MMLU Pro: 77.6% · GPQA Diamond: 73.2% · LiveCodeBench v6: 69.1% · MATH-Vision: 70.5%
Availability
- Apache 2.0 License; an open-weight release
- On Hugging Face (google/diffusiongemma-26B-A4B-it), Kaggle, and Google Vertex AI Model Garden
Related
- Gemma
- Google DeepMind
- Gemini
- Diffusion Models
- Large Language Models (LLMs)
- AI Mixture of Experts (MoE)
- AI Open Weight Models
- Context Window
- Apache 2.0 License
About Sébastien
I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.
I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.