
Kimi K2.6, Qwen, and Gemma 4: Local AI Is Catching Up

Sebastien Dubois

21 Apr 2026


April 2026 has been a big month for open-weight AI, and it's not over yet. Three releases landed back-to-back, and together they show that the AI models you can run yourself are getting MUCH more capable, MUCH faster than most people expected.

Kimi K2.6 from Moonshot AI

Moonshot AI, one of the four "AI Tigers" of China alongside DeepSeek, Zhipu AI (Z.ai), and Baichuan, just released Kimi K2.6. It is a 1-trillion-parameter Mixture-of-Experts model with around 32B parameters active per token, the successor to Kimi K2.5, and it is genuinely competitive with Claude Opus 4.7, GPT-5.4, and Gemini 3 on agentic coding benchmarks.

The interesting part is endurance: in Moonshot's own demos, K2.6 sustained 4,000+ tool calls over 12+ hours, and it scales to agent swarms of 300 sub-agents running 4,000 coordinated steps. That is not "writes a function"; that is "runs a small engineering team for a shift."

It already plugs into OpenClaw, Hermes, and similar agent harnesses, and Moonshot ships a dedicated Kimi Code coding surface. You can get it and use it today.

Official announcement: https://www.kimi.com/blog/kimi-k2-6

Qwen3.6-35B-A3B from Alibaba

In the same window, Qwen shipped Qwen3.6-35B-A3B. The name follows the Qwen naming convention: a MoE with 35B total parameters and ~3B active per token. Quantized to Q4_K_S, it sits around 20.9 GB and runs comfortably on a consumer MacBook Pro via LM Studio or Ollama.
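To put those sizes in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this article (35B total / ~3B active and 20.9 GB for Qwen, 1T total / ~32B active for Kimi K2.6). The helper names are made up for illustration, and it assumes "GB" means decimal gigabytes and that the whole file is weights, which is not exactly true for real model files.

```python
# Rough arithmetic on the figures quoted in the article; not measured values.
GB = 1e9  # assumption: decimal gigabytes


def bits_per_weight(file_size_gb: float, total_params: float) -> float:
    """Average number of bits stored per parameter in a quantized file."""
    return file_size_gb * GB * 8 / total_params


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a MoE model's parameters that are used for each token."""
    return active_params / total_params


qwen_bpw = bits_per_weight(20.9, 35e9)       # Qwen3.6-35B-A3B at Q4_K_S
qwen_active = active_fraction(3e9, 35e9)     # ~3B active out of 35B
kimi_active = active_fraction(32e9, 1e12)    # ~32B active out of 1T

print(f"Qwen3.6-35B-A3B: ~{qwen_bpw:.2f} bits per weight on disk")
print(f"Qwen3.6-35B-A3B: ~{qwen_active:.1%} of parameters active per token")
print(f"Kimi K2.6:       ~{kimi_active:.1%} of parameters active per token")
```

Under those assumptions, the quantized file averages a bit under 5 bits per weight, and both models touch only a small slice of their parameters for each token. That is a big part of why a MoE can feel fast on modest hardware even when the full set of weights is large.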

The surprise in the community was Simon Willison's "pelican on a bicycle" test: Qwen3.6-35B-A3B produced a more recognizable drawing than Claude Opus 4.7 on multiple attempts. Simon himself cautions not to over-read one absurd benchmark, but it's still interesting to mention ;-)

Qwen3.6 keeps alive the pattern of an open-weight release every few weeks that pulls a few more tasks onto local hardware.

Official announcement: https://qwen.ai/blog?id=qwen3.6-35b-a3b

Gemma 4 runs on phones

The third release: Gemma 4 from Google DeepMind (April 2, 2026). Four variants: E2B, E4B, 26B A4B, and 31B. The E2B/E4B variants are built specifically for on-device use, with Per-Layer Embeddings and multimodal support (text, image, audio) in a model that fits comfortably on a modern phone. Google ships them via the Google AI Edge Gallery, a mobile app that runs Gemma models directly on your phone's hardware.

This is a big leap forward for edge devices. An on-device LLM with built-in reasoning, tool calling, a native system role, and 128K context turns a phone into a self-contained AI workstation. No API key, no network dependency, no data leaving the device. Go into Airplane Mode, and you're still able to use AI.

Note that Gemma 4's small variants are competitive on reasoning and code benchmarks that would have been considered frontier-only two years ago.

Official announcement: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/

The pattern

Put these three together and the direction is obvious:

Kimi K2.6 pushes the ceiling of what open-weight agentic coding can do, given enough hardware (AI Expert Offloading gets 1T MoE models running on a 96GB Mac).
Qwen3.6 pushes the floor of what runs on a consumer laptop.
Gemma 4 pushes the floor of what runs on a phone you already carry.

Local models will not match the frontier closed-weight models, and that gap will persist for the foreseeable future: Anthropic, OpenAI, and Google DeepMind will keep a ~6-to-12-month lead on the absolute top end. The interesting fact is not the ceiling; it is what moves below it.

Every quarter, more real tasks cross the line from "needs a frontier AI model" to "runs fine on my machine":

Last year that was summarization, translation, simple chat.
This year it is multi-hour agentic coding, long-context document reasoning, multimodal understanding on a phone.
Next year it will be something we currently think we need Opus for.

If you use AI in your work, treat the local tier as a first-class option, not only as a fallback. For tasks that do not strictly need the frontier, local models are much cheaper (you pay for electricity and compute), more private, and, increasingly, good enough. And "good enough" keeps getting better.
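One way to picture "local as a first-class option" is a routing rule that defaults to the local tier and escalates only when a task is known to need the frontier. The sketch below is purely illustrative: the Task type, the pick_tier function, and the task category names are hypothetical, not part of any real tool.

```python
from dataclasses import dataclass

# Hypothetical category for tasks you have decided still need a frontier model.
# Everything else defaults to local, which is the "first-class" part.
NEEDS_FRONTIER = {"hardest_reasoning"}


@dataclass
class Task:
    kind: str                # e.g. "summarization", "translation", "agentic_coding"
    sensitive: bool = False  # True if the data should not leave the machine


def pick_tier(task: Task) -> str:
    """Return "local" or "frontier" for a task (illustrative policy only)."""
    if task.sensitive:
        return "local"       # privacy constraint: keep the data on-device
    if task.kind in NEEDS_FRONTIER:
        return "frontier"    # escalate only when the task really needs it
    return "local"           # default tier


for task in (
    Task("summarization"),
    Task("hardest_reasoning"),
    Task("hardest_reasoning", sensitive=True),
):
    print(task.kind, "sensitive" if task.sensitive else "public", "->", pick_tier(task))
```

The point of the design is the default: the list of exceptions is the frontier one, and it should shrink as local models improve.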
