Kimi K2.6, Qwen, and Gemma 4: Local AI Is Catching Up
April 2026 has been a big month for open-weight AI, and it's not over yet. Three releases landed back-to-back, and together they show that the AI models you can run yourself are getting MUCH more capable, MUCH faster than most people expected.
Canonical version: Kimi K2.6, Qwen, and Gemma 4: Local AI Is Catching Up.
April 2026 has been a big month for open-weight AI, and it's not over yet. Three releases landed back-to-back, and together they show that the AI models you can run yourself are getting MUCH more capable, MUCH faster than most people expected.
Kimi K2.6 from Moonshot AI
Moonshot AI (one of the four "AI Tigers" of China, alongside Deepseek, Zhipu AI (Z.ai), and Baichuan) just released Kimi K2.6. It is a 1-trillion-parameter Mixture-of-Experts model (with around 32B active per token), successor to Kimi K2.5, and it is genuinely competitive with Claude Opus 4.7, GPT-5.4, and Gemini 3 on agentic coding benchmarks.
The interesting part is that K2.6 sustained 4,000+ tool calls over 12+ hours in Moonshot's own demos, and scales to 300 sub-agents running 4,000 coordinated steps (i.e., agent swarms). That is not "writes a function"; that is "runs a small engineering team for a shift."
It already plugs into OpenClaw, Hermes, and similar agent harnesses, and Moonshot ships a dedicated Kimi Code coding surface. You can get it and use it today.
Official announcement: https://www.kimi.com/blog/kimi-k2-6
Qwen3.6-35B-A3B from Alibaba
In the same window, Qwen shipped Qwen3.6-35B-A3B: a 35B total / ~3B active MoE in the same naming family. Quantized to Q4_K_S, it sits around 20.9 GB and runs comfortably on a consumer MacBook Pro via LM Studio or Ollama.
The surprise in the community was Simon Willison's "pelican on a bicycle" test: Qwen3.6-35B-A3B produced a more recognizable drawing than Claude Opus 4.7 on multiple attempts. Simon himself cautions not to over-read one absurd benchmark, but it's still interesting to mention ;-)
Qwen3.6 keeps alive the pattern of an open-weight release every few weeks that pulls a few more tasks onto local hardware.
Official announcement: https://qwen.ai/blog?id=qwen3.6-35b-a3b
Gemma 4 runs on phones
The third release: Gemma 4 from Google DeepMind (April 2, 2026). Four variants: E2B, E4B, 26B A4B, and 31B. The E2B/E4B variants are built specifically for on-device use, with Per-Layer Embeddings and multimodal support (text, image, audio) in a model that fits comfortably on a modern phone. Google ships them via the Google AI Edge Gallery, a mobile app that runs Gemma models directly on your phone's hardware.
This is a big leap forward for edge devices. An on-device LLM with built-in reasoning, tool calling, a native system role, and 128K context turns a phone into a self-contained AI workstation. No API key, no network dependency, no data leaving the device. Go into Airplane Mode, and you're still able to use AI.
Note that Gemma 4's small variants are competitive on reasoning and code benchmarks that would have been considered frontier-only two years ago.
Official announcement: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
The pattern
Put these three together and the direction is obvious:
- Kimi K2.6 pushes the ceiling of what open-weight agentic coding can do, given enough hardware (AI Expert Offloading gets 1T MoE models running on a 96GB Mac).
- Qwen3.6 pushes the floor of what runs on a consumer laptop.
- Gemma 4 pushes the floor of what runs on a phone you already carry.
Local models will not match the frontier closed-weight models. That gap will continue to exist for the foreseeable future. Anthropic, OpenAI, and Google DeepMind will continue to maintain a ~6-to-12-month lead on the absolute top end. The interesting fact is not the ceiling, it is what moves below it.
Every quarter, more real tasks cross the line from "needs a frontier AI model" to "runs fine on my machine":
- Last year that was summarization, translation, simple chat.
- This year it is multi-hour agentic coding, long-context document reasoning, multimodal understanding on a phone.
- Next year it will be something we currently think we need Opus for.
If you are leveraging AI, treat the local tier as a first-class option, not only as a fallback. Local models can now do a ton of things. For tasks that do not strictly need the frontier, local models are much cheaper (you pay for electricity/compute), more private, and increasingly, good enough. And "good enough" keeps getting better.
Related
- Kimi K2.6
- Qwen3.6-35B-A3B
- Gemma 4
- Moonshot AI
- Qwen
- Google DeepMind
- Claude Opus 4.7
- GPT-5.4
- Gemini 3
- AI Open Weight Models
- AI Expert Offloading
- Running AI Models Locally
- AI Mixture of Experts (MoE)
- Large Language Models (LLMs)
- Ollama
- LM Studio
- Simon Willison
About Sébastien
I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.
I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.
If you want to follow my work, then become a member and join our community.
Ready to get to the next level?
If you're tired of information overwhelm and ready to build a reliable knowledge system:
- 📚 KM for Beginners — 10+ hours of structured video lessons
- 🚀 Obsidian Starter Kit — Ready-made vault with 40+ templates
- 💼 Knowledge Worker Kit — Complete guides + lifetime community
- 🦉 1-on-1 Coaching — Personalized guidance
- 🎯 Join Knowii — Community + ALL courses & tools
Found this valuable? Share it with someone who needs it.