
DeepSeek v4

Sebastien Dubois

01 May 2026 — 4 min read


Fourth-generation flagship release from DeepSeek (April 24, 2026). It comes in two open-weight variants, V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active). Both use a Mixture-of-Experts (MoE) architecture, ship with a 1M-token context window by default, and fold what used to be the separate R-series reasoning line into a single model with switchable Thinking / Non-Thinking modes.

V4-Pro is the largest open-weights model released to date.
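The total / active split matters because in a Mixture-of-Experts model each token is routed through only a subset of experts: per-token compute scales with the active parameters, while memory must still hold all of them. A rough sketch, assuming the common rule of thumb of ~2 FLOPs per active parameter per token for a forward pass and ignoring attention FLOPs; the helper names are ours, not DeepSeek's:

```python
# Rough per-token compute for MoE vs. a hypothetical dense model of the same total size.
# Assumption: forward pass ~ 2 FLOPs (one multiply + one add) per active parameter per token.
# Attention FLOPs, which grow with context length, are ignored here.

MODELS = {
    # name: (total parameters, active parameters per token), from the release figures
    "V4-Pro": (1.6e12, 49e9),
    "V4-Flash": (284e9, 13e9),
}


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one token (hypothetical helper)."""
    return 2.0 * active_params


def summarize(models: dict) -> dict:
    """Per model: MoE GFLOPs/token, dense-equivalent GFLOPs/token, and the savings factor."""
    out = {}
    for name, (total, active) in models.items():
        moe = forward_flops_per_token(active)
        dense = forward_flops_per_token(total)
        out[name] = {
            "moe_gflops": moe / 1e9,
            "dense_gflops": dense / 1e9,
            "savings": dense / moe,
        }
    return out


if __name__ == "__main__":
    for name, row in summarize(MODELS).items():
        print(f"{name}: ~{row['moe_gflops']:.0f} GFLOPs/token "
              f"(dense equivalent ~{row['dense_gflops']:.0f}), "
              f"~{row['savings']:.1f}x less compute")
```

Under these assumptions V4-Pro does roughly 98 GFLOPs of work per token versus ~3,200 GFLOPs for a dense 1.6T model (about 33× less), but all 1.6T parameters still have to be resident in memory.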

What's actually new

DeepSeek Sparse Attention (DSA) + token-wise compression. This is the headline architectural innovation: a content-based form of sparse attention, in which each query attends only to a selected subset of earlier tokens. At the same context length, V4-Pro uses ~27% of the single-token FLOPs and ~10% of the KV cache size of DeepSeek V3.2. Against vanilla full attention the gap is far larger: early reader notes on the paper estimate ~1% of native attention FLOPs and KV size, with throughput gains of roughly 50×, though these figures still need independent validation. This is an efficiency-first release, not a scale-first one.
KV cache footprint that fits on commodity hardware. A full 1M-token context needs roughly 5.7 GB of KV cache at FP8. For comparison, a Llama-3-405B-class model with native (full) attention would need on the order of 500 GB to hold the same context. That is what makes 1M-token inference economically real rather than merely feasible on paper. Practitioners report running V4-Flash entirely in GPU memory at 1M context on setups that previously had to spill V3.2 into system memory at 256k.
Reasoning is no longer a separate model. The R series is folded into V4 (see AI Reasoning Models). Both Pro and Flash expose a reasoning_effort-style toggle.
Bitwise batch-invariant, deterministic kernels. Same input → same output across batch sizes. Most frontier labs trade reproducibility for throughput; DeepSeek deliberately doesn't.
API surface compatibility. Native support for both the OpenAI Chat Completions and Anthropic API formats out of the box, lowering migration friction.
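To make "content-based sparse attention" concrete: instead of attending to every cached token, each query first scores all positions with a cheap indexer, keeps only the top-k, and runs ordinary softmax attention over that subset. The sketch below is a generic top-k sparse attention in pure Python, not DeepSeek's DSA kernel; the indexer (a dot product over only the first few dimensions) and all function names are hypothetical stand-ins.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cheap_index_scores(query: Vector, keys: List[Vector], dims: int) -> List[float]:
    """Hypothetical lightweight indexer: dot product over the first `dims` dimensions only."""
    return [sum(q * k for q, k in zip(query[:dims], key[:dims])) for key in keys]


def softmax_attend(query: Vector, keys: List[Vector], values: List[Vector],
                   positions: List[int]) -> List[float]:
    """Standard scaled dot-product attention restricted to `positions`."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in positions]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * values[i][j] for w, i in zip(weights, positions)) / total
            for j in range(dim_v)]


def topk_sparse_attention(query: Vector, keys: List[Vector], values: List[Vector],
                          k: int, index_dims: int) -> Tuple[List[float], List[int]]:
    """Pick the k highest-scoring positions with the cheap indexer, then attend only to them."""
    scores = cheap_index_scores(query, keys, index_dims)
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    return softmax_attend(query, keys, values, selected), selected


if __name__ == "__main__":
    keys = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.3]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
    query = [1.0, 0.2, 0.0]
    out, used = topk_sparse_attention(query, keys, values, k=2, index_dims=2)
    print("attended positions:", used, "output:", out)
```

The indexer still touches every position, but it is meant to be far cheaper than full attention; the expensive softmax-weighted sum runs over only k positions. With k equal to the sequence length the result reduces to ordinary dense attention.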
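The ~500 GB comparison figure can be reproduced with the standard KV-cache formula for full attention: a K and a V vector per layer, per KV head, per token. The sketch assumes the published Llama 3.1 405B configuration (126 layers, 8 KV heads via grouped-query attention, head dimension 128) stored in BF16. For V4 it only back-computes the per-token budget implied by the 5.7 GB figure quoted above; it does not model V4's actual cache layout.

```python
# Back-of-the-envelope KV cache sizing for standard (full, GQA) attention.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float) -> float:
    """Bytes needed to cache keys and values for `seq_len` tokens.

    The factor 2 accounts for storing both a K and a V vector per layer and KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


CONTEXT = 1_000_000  # 1M tokens (decimal)

# Assumed Llama 3.1 405B config: 126 layers, 8 KV heads, head_dim 128.
llama_bf16 = kv_cache_bytes(126, 8, 128, CONTEXT, 2)  # BF16 = 2 bytes per element
llama_fp8 = kv_cache_bytes(126, 8, 128, CONTEXT, 1)   # FP8 = 1 byte per element

# Per-token budget implied by the article's ~5.7 GB figure for a 1M-token V4 context.
v4_total = 5.7e9
v4_per_token = v4_total / CONTEXT

if __name__ == "__main__":
    print(f"Llama-405B-class, BF16: {llama_bf16 / 1e9:.0f} GB")
    print(f"Llama-405B-class, FP8:  {llama_fp8 / 1e9:.0f} GB")
    print(f"V4 (implied): {v4_per_token:.0f} bytes/token, "
          f"~{llama_bf16 / v4_total:.0f}x smaller than the BF16 baseline")
```

The full-attention baseline works out to about 516 KB per token, or ~516 GB at 1M tokens in BF16. Even dropped to FP8 it still needs ~258 GB, so cache precision alone accounts for only a factor of two of the gap.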

Pricing (per million tokens, input / output)

Model	Input	Output
DeepSeek V4-Flash	$0.14	$0.28
DeepSeek V4-Pro	$1.74	$3.48
Claude Opus (ref.)	$5	$25
GPT-5.5 (ref.)	$5	$30

V4-Pro is the cheapest of the larger frontier models by a wide margin; V4-Flash undercuts even OpenAI's cheapest tier. DeepSeek has signalled further reductions once Huawei Ascend deployment lands in mid-2026.

Performance positioning

V4-Pro rivals top closed-source frontier models, beats all current open models on math, STEM, and coding benchmarks, and retains stronger world knowledge than other open releases. Independent assessments (Simon Willison, HN practitioners, PicoCreator's reading notes on the paper) consistently place it "between Sonnet and Opus" in feel, roughly 3–6 months behind absolute SOTA. That is close enough that the price gap dominates the decision for most agentic and batch workloads. V4-Flash's reasoning capability is reported to come close to Pro's at a fraction of the cost.

Token-economy caveat. The headline per-token price is misleading on its own. On the Artificial Analysis Intelligence Index, V4-Pro spends ~190M tokens to complete the suite (Kimi K2.6 ~170M) versus ~45M for GPT-5.5 (high). Once you account for this verbosity on hard reasoning tasks, the 5–15× per-token advantage shrinks but does not disappear; in the worst cases the cheaper-per-token model can cost roughly the same in dollars. The current discount on the official DeepSeek API also makes early comparisons look rosier than steady-state pricing will. Because the weights are open, alternative hosts (OpenRouter, Fireworks, etc.) can fill the gap when official capacity is throttled.
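One way to see how verbosity eats into the per-token discount is to price a whole Artificial Analysis run. The sketch uses the list prices from the pricing table and the token counts quoted above, plus a simplifying assumption of our own: every counted token is billed at the output rate (real bills also include input tokens and any discounts). The function names are illustrative only.

```python
# Compare the per-token price advantage with the effective cost of a full benchmark run.
# Prices: USD per million tokens (input, output), from the pricing table.
PRICES = {
    "DeepSeek V4-Pro": (1.74, 3.48),
    "GPT-5.5": (5.00, 30.00),
}

# Tokens spent to complete the Artificial Analysis index, per the figures quoted in the text.
RUN_TOKENS = {
    "DeepSeek V4-Pro": 190e6,
    "GPT-5.5": 45e6,
}


def cost_usd(model: str, input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of a workload at list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1e6


def compare(cheap: str, expensive: str) -> tuple:
    """Return (per-output-token price ratio, effective run-cost ratio).

    Simplifying assumption: all run tokens are billed as output tokens.
    """
    per_token_ratio = PRICES[expensive][1] / PRICES[cheap][1]
    run_ratio = (cost_usd(expensive, 0, RUN_TOKENS[expensive])
                 / cost_usd(cheap, 0, RUN_TOKENS[cheap]))
    return per_token_ratio, run_ratio


if __name__ == "__main__":
    per_token, per_run = compare("DeepSeek V4-Pro", "GPT-5.5")
    print(f"Output price advantage: {per_token:.1f}x; "
          f"effective advantage on the run: {per_run:.1f}x")
```

Under that assumption, V4-Pro's ~8.6× output-price advantage over GPT-5.5 narrows to roughly 2× for the full run (about $661 versus $1,350): smaller, but still an advantage.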

Why this matters

DeepSeek v4 is the clearest signal yet that the frontier is splitting along a cost/quality plane rather than advancing along a single capability axis. An open model that is roughly six months behind and 5–15× cheaper is the right tool for almost everything except the very hardest reasoning steps. DSA and the KV-cache reduction also make ultra-long-context inference economically realistic, not just technically possible: the inference cost curve has shifted.

Early practitioner reports back this up. A non-trivial TypeScript codebase audit (multi-file traversal, type analysis, refactor proposal across two prompts) ran end-to-end on V4-Pro for $0.09; the same task is reported to have cost on the order of $9–$13 on Claude Opus before recent price hikes. A full day of refactor work (many subagents, thousands of changed lines) totalled under $1. The cost ratio narrows sharply on workloads where verbosity bites (see the token-economy caveat above), but on the long tail of "good enough" engineering work it is roughly two orders of magnitude.

The real constraint on day one is operational: V4-Pro was hit hard by timeouts and rate limits at launch (including via OpenRouter at peak hours), so V4-Flash or a third-party host is the more reliable choice for iterative agent loops until capacity catches up.

References

Official announcement: https://api-docs.deepseek.com/news/news260424
Announcement post (X): https://x.com/deepseek_ai/status/2047516922263285776
Model collection: https://huggingface.co/collections/deepseek-ai/deepseek-v4
Technical report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
Simon Willison's writeup: https://simonwillison.net/2026/Apr/24/deepseek-v4/
PicoCreator's raw reading notes on the V4 paper (X): https://x.com/picocreator/status/2047625988125954386
Hacker News, launch-day discussions: https://news.ycombinator.com/item?id=47884971 and https://news.ycombinator.com/item?id=47885014
Hacker News, V4 in practice (cost, token economy, local deployment): https://news.ycombinator.com/item?id=47977026
Artificial Analysis pages: https://artificialanalysis.ai/models/deepseek-v4-pro and https://artificialanalysis.ai/models/deepseek-v4-flash
