news

GLM-5.2

GLM-5.2 is Z.ai's) flagship open-weight LLM), released as full open weights on June 16, 2026 (coding subscribers got access on June 13). It is the successor to GLM-5.1 and, at release, the leading open-weight model on the Artificial Analysis Intelligence Index. The headline change is a jump from a 2

Sebastien Dubois

23 Jun 2026 — 3 min read

Canonical version: GLM-5.2.

GLM-5.2 is Z.ai's flagship open-weight LLM, released as full open weights on June 16, 2026 (coding subscribers got access on June 13). It is the successor to GLM-5.1 and, at release, the leading open-weight model on the Artificial Analysis Intelligence Index. The headline change is a jump from a 200K to a 1M token Context Window with stable long-horizon performance.

Architecture

AI Mixture of Experts (MoE) architecture: ~753 billion total parameters, ~40 billion active per token
1M token Context Window (up from 200K on GLM-5.1)
Up to 131,072 token maximum output length
IndexShare: reuses the same indexer across every four sparse-attention layers, cutting per-token FLOPs by 2.9× at 1M context
Multiple thinking effort levels (e.g. High, Max) to trade capability against speed and cost
FP8 KV-cache quantization support
Text-only input (Z.ai's vision models are separate and not open-weight)

Note: Artificial Analysis reports 744B total / 40B active; Z.ai's own materials cite 753B. The active-parameter count (~40B) is consistent across sources.

Performance

Artificial Analysis Intelligence Index v4.1: 51 (#1 open-weight at release; ahead of MiniMax-M3 and DeepSeek V4 Pro at 44 each)
SWE-Bench Pro: 62.1 (up from GLM-5.1's 58.4; Claude Opus 4.8 ≈ 69.2)
Terminal-Bench 2.1: 81.0 (up from 63.5; Claude Opus 4.8 ≈ 85.0, GPT-5.5 ≈ 84)
FrontierSWE: 74.4 (up from 30.5; trails Opus 4.8 by ~1%, edges out GPT-5.5 by ~1%)
GDPval-AA v2: 1524 (ahead of MiniMax-M3 and DeepSeek V4 Pro; in line with proprietary models)
Reasoning: AIME 2026 99.2; GPQA-Diamond 91.2; HMMT Feb 2026 92.5; HLE 40% (+12); CritPt 21% (+16)
Agentic: MCP-Atlas 76.8; Tool-Decathlon 48.2
Ranked #2 on Code Arena WebDev, behind only Claude Fable 5

Note: benchmark figures are largely self-reported by Z.ai or sourced from Artificial Analysis as of June 2026.

Key Capabilities

Stable 1M-context support for large-scale implementation, automated research, performance optimization, and complex debugging
Strong long-horizon agentic coding, the biggest gains over GLM-5.1 are in sustained, multi-step tasks rather than one-shot generation
Function calling and reasoning support for tool-augmented agents
Built for Agentic Engineering across long-running development workflows

Token Efficiency

A notable drawback: GLM-5.2 burns ~43k output tokens per Intelligence Index task (up from ~26k on GLM-5.1), above MiniMax-M3 (~24k) and Kimi K2.6 (35k). It still lands on the Pareto frontier of intelligence vs cost per task ($0.46/task) thanks to cheap pricing.

Training Infrastructure

Trained with Z.ai's slime framework using parallel OPD (online policy distillation) training that merges 10+ expert models into the final model; OPD took ~2 days
Long-horizon RL via a critic-based PPO formulation learning from individual rollouts with trajectory compaction
Anti-Reward Hacking safeguards: rule-based filter plus LLM-based judgment during RL

Availability

MIT License: fully permissive, no regional limits
Weights on HuggingFace and ModelScope
Usable via Z.ai chat, ZCode desktop agent, Claude Code, and OpenCode
On Cloudflare Workers AI as @cf/zai-org/glm-5.2 (launched at 262,144 context, expandable toward the 1M max)
GLM Coding Plan quota: 3× peak / 2× off-peak; promotional 1× off-peak through end of September

Pricing

Z.ai first-party API: $1.40 input / $0.26 cache-hit / $4.40 output per 1M tokens
OpenRouter: ~$1.20 input / $4.10 output per 1M tokens
For comparison: GPT-5.5 ≈ $5/$30, Claude Opus ≈ $5/$25, GLM-5.2 is roughly 3–6× cheaper

Deployment Requirements

Full BF16 weights: ~1.51 TB
Q4_K_M (4-bit): ~476 GB, multi-GPU datacenter hardware (2× A100 80GB or 4× RTX 6000 Ada)
2-bit dynamic (Unsloth UD-IQ2_XXS): ~241 GB, runnable on a 256GB+ Mac Studio at 3–9 tokens/sec
1-bit dynamic: ~176 GB but quality degrades too far to be useful
Supported frameworks: transformers, vLLM, SGLang, xLLM, ktransformers
Practical reality: outside a ~$9,500 256GB+ Mac Studio (single-digit tokens/sec at 2-bit), this is a rent-or-API model, not a home-setup model

References

About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.

If you want to follow my work, then become a member and join our community.

Ready to get to the next level?

If you're tired of information overwhelm and ready to build a reliable knowledge system:

📚 KM for Beginners — 10+ hours of structured video lessons
🚀 Obsidian Starter Kit — Ready-made vault with 40+ templates
💼 Knowledge Worker Kit — Complete guides + lifetime community
🦉 1-on-1 Coaching — Personalized guidance
🎯 Join Knowii — Community + ALL courses & tools