GLM-5.2

GLM-5.2 is Z.ai's) flagship open-weight LLM), released as full open weights on June 16, 2026 (coding subscribers got access on June 13). It is the successor to GLM-5.1 and, at release, the leading open-weight model on the Artificial Analysis Intelligence Index. The headline change is a jump from a 2

Canonical version: GLM-5.2.

GLM-5.2 is Z.ai's flagship open-weight LLM, released as full open weights on June 16, 2026 (coding subscribers got access on June 13). It is the successor to GLM-5.1 and, at release, the leading open-weight model on the Artificial Analysis Intelligence Index. The headline change is a jump from a 200K to a 1M token Context Window with stable long-horizon performance.

Architecture

  • AI Mixture of Experts (MoE) architecture: ~753 billion total parameters, ~40 billion active per token
  • 1M token Context Window (up from 200K on GLM-5.1)
  • Up to 131,072 token maximum output length
  • IndexShare: reuses the same indexer across every four sparse-attention layers, cutting per-token FLOPs by 2.9× at 1M context
  • Multiple thinking effort levels (e.g. High, Max) to trade capability against speed and cost
  • FP8 KV-cache quantization support
  • Text-only input (Z.ai's vision models are separate and not open-weight)

Note: Artificial Analysis reports 744B total / 40B active; Z.ai's own materials cite 753B. The active-parameter count (~40B) is consistent across sources.

Performance

  • Artificial Analysis Intelligence Index v4.1: 51 (#1 open-weight at release; ahead of MiniMax-M3 and DeepSeek V4 Pro at 44 each)
  • SWE-Bench Pro: 62.1 (up from GLM-5.1's 58.4; Claude Opus 4.8 ≈ 69.2)
  • Terminal-Bench 2.1: 81.0 (up from 63.5; Claude Opus 4.8 ≈ 85.0, GPT-5.5 ≈ 84)
  • FrontierSWE: 74.4 (up from 30.5; trails Opus 4.8 by ~1%, edges out GPT-5.5 by ~1%)
  • GDPval-AA v2: 1524 (ahead of MiniMax-M3 and DeepSeek V4 Pro; in line with proprietary models)
  • Reasoning: AIME 2026 99.2; GPQA-Diamond 91.2; HMMT Feb 2026 92.5; HLE 40% (+12); CritPt 21% (+16)
  • Agentic: MCP-Atlas 76.8; Tool-Decathlon 48.2
  • Ranked #2 on Code Arena WebDev, behind only Claude Fable 5

Note: benchmark figures are largely self-reported by Z.ai or sourced from Artificial Analysis as of June 2026.

Key Capabilities

  • Stable 1M-context support for large-scale implementation, automated research, performance optimization, and complex debugging
  • Strong long-horizon agentic coding, the biggest gains over GLM-5.1 are in sustained, multi-step tasks rather than one-shot generation
  • Function calling and reasoning support for tool-augmented agents
  • Built for Agentic Engineering across long-running development workflows

Token Efficiency

A notable drawback: GLM-5.2 burns ~43k output tokens per Intelligence Index task (up from ~26k on GLM-5.1), above MiniMax-M3 (~24k) and Kimi K2.6 (35k). It still lands on the Pareto frontier of intelligence vs cost per task ($0.46/task) thanks to cheap pricing.

Training Infrastructure

  • Trained with Z.ai's slime framework using parallel OPD (online policy distillation) training that merges 10+ expert models into the final model; OPD took ~2 days
  • Long-horizon RL via a critic-based PPO formulation learning from individual rollouts with trajectory compaction
  • Anti-Reward Hacking safeguards: rule-based filter plus LLM-based judgment during RL

Availability

  • MIT License: fully permissive, no regional limits
  • Weights on HuggingFace and ModelScope
  • Usable via Z.ai chat, ZCode desktop agent, Claude Code, and OpenCode
  • On Cloudflare Workers AI as @cf/zai-org/glm-5.2 (launched at 262,144 context, expandable toward the 1M max)
  • GLM Coding Plan quota: 3× peak / 2× off-peak; promotional 1× off-peak through end of September

Pricing

  • Z.ai first-party API: $1.40 input / $0.26 cache-hit / $4.40 output per 1M tokens
  • OpenRouter: ~$1.20 input / $4.10 output per 1M tokens
  • For comparison: GPT-5.5 ≈ $5/$30, Claude Opus ≈ $5/$25, GLM-5.2 is roughly 3–6× cheaper

Deployment Requirements

  • Full BF16 weights: ~1.51 TB
  • Q4_K_M (4-bit): ~476 GB, multi-GPU datacenter hardware (2× A100 80GB or 4× RTX 6000 Ada)
  • 2-bit dynamic (Unsloth UD-IQ2_XXS): ~241 GB, runnable on a 256GB+ Mac Studio at 3–9 tokens/sec
  • 1-bit dynamic: ~176 GB but quality degrades too far to be useful
  • Supported frameworks: transformers, vLLM, SGLang, xLLM, ktransformers
  • Practical reality: outside a ~$9,500 256GB+ Mac Studio (single-digit tokens/sec at 2-bit), this is a rent-or-API model, not a home-setup model

References


About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.

If you want to follow my work, then become a member and join our community.

Ready to get to the next level?

If you're tired of information overwhelm and ready to build a reliable knowledge system:

Found this valuable? Share it with someone who needs it.

Join 6,000+ readers. Get practical systems for knowledge & AI. Free.

Subscribe ✨

Free: Knowledge System Checklist

A clear roadmap to building your own knowledge system. Subscribe and get it straight to your inbox.

6,000+ readers. No spam. Unsubscribe anytime.