news

Claude Opus 4.7

Sebastien Dubois

16 Apr 2026 — 7 min read

Claude Opus 4.7 is the Claude Opus model released by Anthropic on 2026-04-16. It was the frontier model for long-running agentic work until Claude Opus 4.8 (released 2026-05-27) succeeded it, with stronger instruction adherence, self-verification, and cross-session context handling than Opus 4.6.

Positioning

Designed for hard, multi-step engineering and research tasks you hand off rather than supervise line by line
Recommended usage shift: treat it like an engineer you delegate to, not a pair programmer you guide
More agentic; better at long-running work; carries context across sessions; handles ambiguity better than prior Opus
Still less capable than the internal Mythos Preview on some dimensions, but outperforms Opus 4.6 across most benchmarks

Availability

Model ID: claude-opus-4-7
Knowledge cutoff: January 2026
Previous version: Opus 4.6 (released 2026-02-05)
Surfaces: Claude.ai, Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry
Live in Claude Code from day one

Pricing

Input: $5 per million tokens
Output: $25 per million tokens
Same headline rates as Opus 4.6

Capabilities

Software engineering: improvements on complex, long-horizon tasks; strong instruction adherence; verifies its own outputs before reporting
Vision: accepts images up to 2,576 px on the long edge (~3.75 MP), over 3× the previous capacity; useful for screenshots, diagrams, UI analysis
Finance analysis, document reasoning, long-context performance: notable gains
Multimodal design taste: higher aesthetic quality for interfaces, slides, documentation
Slides specifically: external testers call it the strongest LLM they've used for PowerPoint generation, attributed to the vision upgrade
Writing on consulting-style content: characterized as the first Claude release that reads without filler ("no padding or hedging"); on interactive writing workflows it can feel slower and more regimented than 4.6
Reported benchmark gains (Anthropic + launch partners): CursorBench 58% → 70%, Rakuten SWE-Bench ~3× more resolved production tasks vs 4.6, SWE-bench Pro lifts on the hardest slices, LFG coding benchmark clears hardest tests when given detailed briefs

New controls and features

xhigh effort level: a new setting between high and max for reasoning/latency tradeoffs
Task budgets (public beta): guide token allocation per task
/ultrareview command in Claude Code: dedicated code-review session
Claude Code Auto Mode extended to Max users for autonomous decisions
Adaptive thinking now hides reasoning summaries by default; pass "display": "summarized" to restore them

System prompt changes (vs 4.6)

Based on Simon Willison's diff of the leaked 4.7 system prompt against 4.6 (+47 / -17 lines). The changes read as a concrete implementation of the control architecture Perez describes further down.

Branding: "developer platform" renamed to "Claude Platform"
New first-party apps referenced: Claude in Chrome (autonomous web interaction), Claude in Excel (spreadsheets), Claude in Powerpoint (slides). The prompt now expects the model to route to these surfaces where appropriate.
Safety tags and new guardrails:
- Child-safety block wrapped in a dedicated <critical_child_safety_instructions> tag (structural isolation rather than inline prose)
- Disordered-eating guidance: no specific nutrition or exercise numbers
- Screenshot-attack defense: refuse to be coerced into forced yes/no answers on contested topics via screenshots that fabricate prior context
Behavioral shifts:
- Less pushy; honors "end the conversation" without persuading or re-engaging
- More action-first: "when requests have minor unspecified details, people typically want Claude to make a reasonable attempt now, not be interviewed first"
- Must call tool_search before claiming a capability is unavailable (see Control architecture → Latent capability discovery)
- Default toward more concise responses
Removed:
- Directives about avoiding emotes and specific filler words like "genuinely" and "honestly"
- Stale Trump-era political references (obsolete given the January 2026 knowledge cutoff)
Tool surface: 23 tools referenced in the prompt, including bash_tool, web_search, web_fetch, conversation_search, image_search, weather_fetch, tool_search, visualization tools, and app integrations

Safety

Roughly similar safety profile to Opus 4.6
Improvements in honesty and resistance to Prompt injection
Cybersecurity capabilities intentionally reduced vs Mythos Preview; automated safeguards block high-risk requests
Cyber Verification Program available for legitimate security research

Migration notes

New tokenizer: Anthropic documents 1.0–1.35× more input tokens than 4.6. Measured real-world weighted ratio is 1.325×, with CLAUDE.md at 1.445× and technical docs at 1.47×. The top of the documented range is where most Claude Code content actually sits, not the middle; plan around the upper bound.
Ratio varies sharply by content type: CJK prose and emoji ~1.01×, CSV ~1.07×, JSON Schema tool definitions ~1.12×, dense JSON ~1.13×, English prose ~1.20×, code 1.29–1.39×, shell/TypeScript ~1.36–1.39×, technical docs up to 1.47×. Code and CLAUDE.md are hit hardest; structured data and non-Latin scripts barely move.
Chars-per-token drops accordingly: English 4.33 → 3.60; TypeScript 3.66 → 2.69. The vocabulary represents the same text in smaller pieces.
Per-session cost impact on typical Claude Code workloads: ~20–30% more per session (e.g., ~$6.65 → ~$7.86–$8.76 on an 80-turn session) despite unchanged headline rates.
Max plans: 5-hour rate-limit windows close proportionally faster on English- and code-heavy work.
Prompt cache is partitioned per model, so switching from 4.6 to 4.7 invalidates every cached prefix; the cold-start is more expensive because the prefix being written is 1.3–1.45× larger. Cache-bust events (CLAUDE.md edits, tool-list changes, compaction) pay the full ratio on the rewrite.
Same transcript yields a different token count on 4.7; baseline billing and observability will step-change the day you flip the model ID.
Higher effort levels produce more output tokens but improve reliability on hard problems
Worth re-benchmarking prompts and budgets when upgrading from 4.6

Reception and caveats

Community reports (Hacker News and X) flag:
- Perceived quality regressions on some tasks when adaptive thinking chooses not to think
- Frustration with hidden chain-of-thought and unclear communication about intentional vs buggy changes
- Hidden cost escalation from the tokenizer change combined with higher default effort
Measured tradeoff (independent IFEval test, N=20): +5pp on strict instruction-level adherence (86% → 90%), loose evaluation flat. Small, directionally consistent, not the "dramatic improvement" framing from launch partners. Smaller tokens forcing attention over individual words is Anthropic's stated mechanism, but the causal link cannot be isolated from concurrent weight and post-training changes.
Treat internal benchmarks skeptically; evaluate on your own workloads
"Big model smell" (Dan Shipper): awkward on first contact, breaks existing prompts, hidden depth that surfaces only once you adapt your workflow. First impressions undersell it
Loss of unprompted noticing: 4.6 would flag adjacent issues you didn't ask about (Brandon Gell example: 4.6 caught a P&L data error that was misclassifying a product as wildly unprofitable; 4.7, on the same input, returned a clean summary and missed it). The behavior that used to show up on its own now needs an explicit rail
Verdict split by use case: developers (Kieran Klaassen) describe 4.7 as "less showy, very good daily driver" once you go deep; writers (Katie Parrott) find it "too slow and regimented" for a daily writing driver. Treat the upgrade as task-dependent, not blanket

Working with it

Delegate whole tasks with clear acceptance criteria instead of micro-steering
Use xhigh or max when correctness matters more than latency
For code review, prefer /ultrareview over ad hoc review prompts
Lean on its cross-session context when resuming long-running work
Stop assuming inferred intent: 4.7 doesn't do prompt engineering on your behalf. Ask explicitly for the side-checks 4.6 used to volunteer (data sanity, edge cases, secondary risks) instead of expecting them as a bonus
Recalibrate prompts before declaring a regression. Many "4.7 got worse" reports come from prompts implicitly tuned to 4.6's inference habits; rewrite once with explicit acceptance criteria before judging
Match the model to the task: pair it with code, slides, finance/document analysis, and long-horizon work; keep 4.6 (or another model) in the loop for fast interactive writing where 4.7 feels regimented

Control architecture (from the leaked system prompt)

Carlos E. Perez's analysis of the leaked Opus 4.7 system prompt highlights that it reads less like a safety policy and more like a reusable control architecture; a set of prompt design primitives worth stealing.

Search-first epistemic gating: for present-day facts, the model must verify via web search before answering rather than trust its own knowledge. Turns search from an optional helper into a default posture for time-sensitive claims.
Latent capability discovery: the prompt explicitly teaches the model that the visible tool list is not the full tool list, and to look for deferred capabilities before declaring something unavailable. Shifts the default from "I don't have that" to capability-boundary skepticism.
- Mechanism: the tool_search tool (regex and BM25 variants, ids tool_search_tool_regex_20251119 / tool_search_tool_bm25_20251119) paired with defer_loading: true on tool definitions. Deferred tools are never injected into the system-prompt prefix; when tool_search returns a hit, a tool_reference block is appended inline and expanded into the full definition. Prefix stays stable, prompt cache stays valid.
- Scale: catalog up to 10,000 tools, 3–5 results per search, 200-char pattern cap. Anthropic reports ~85% reduction in tool-definition tokens and Opus-4 tool-selection accuracy jumping from 49% → 74% on large libraries.
- Implication: if the model says "I can't do X," it's a prompt-engineering hint to expose X via a deferred tool rather than a hard model limit. Design your own agent systems around this pattern for any toolset past ~30 tools.
Injection resistance over tool-first behavior: instructions found inside files, tool results, or other untrusted content do not override caution. Tool use is subordinate to the security boundary.
Non-submissive error repair: the model is told to own mistakes and fix them, but not to spiral into self-abasement or become more submissive just because it was corrected. Dignified self-correction as a design pattern.
Evenhanded advocacy on contentious topics: structured steelmanning with balance built in, explain the strongest case for each side rather than collapsing into false neutrality or picking a camp.

These primitives (epistemic humility, hidden-tool discovery, injection resistance, dignified self-correction, balanced steelmanning) are reusable outside Anthropic's context and worth encoding into your own agent system prompts.

References

Official announcement: https://www.anthropic.com/news/claude-opus-4-7
Hacker News discussion: https://news.ycombinator.com/item?id=47793411
Cat Wu on workflow shifts: https://x.com/_catwu/status/2044808533905178822
Boris Cherny announcement: https://x.com/bcherny/status/2044802532388774313
Boris Cherny on dogfooding 4.7: https://x.com/bcherny/status/2044847848035156457
Carlos E. Perez on the leaked system prompt as control architecture: https://x.com/IntuitMachine/status/2044888212280160659
Claude Code Camp — measured tokenizer cost and IFEval tradeoff: https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new-tokenizer-here-s-what-it-costs-you
Simon Willison on the 4.7 system prompt diff: https://simonwillison.net/2026/Apr/18/opus-system-prompt/
Raw diff (4.6 → 4.7): https://github.com/simonw/research/commit/888f21161500cd60b7c92367f9410e311ffcff09
Anthropic system-prompts archive: https://platform.claude.com/docs/en/release-notes/system-prompts
Tool search tool reference: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
Anthropic "Advanced tool use" (Nov 2025): https://www.anthropic.com/engineering/advanced-tool-use
Every — Vibe Check on Opus 4.7 (Katie Parrott; Shipper, Gell, Taylor, Klaassen): https://every.to/vibe-check/opus-4-7

Claude Opus 4.7

Sebastien Dubois

Positioning

Availability

Pricing

Capabilities

New controls and features

System prompt changes (vs 4.6)

Safety

Migration notes

Reception and caveats

Working with it

Control architecture (from the leaked system prompt)

References

About Sébastien

Ready to get to the next level?

Free: Knowledge System Checklist

Positioning

Availability

Pricing

Capabilities

New controls and features

System prompt changes (vs 4.6)

Safety

Migration notes

Reception and caveats

Working with it

Control architecture (from the leaked system prompt)

References

Related

About Sébastien

Ready to get to the next level?

Free: Knowledge System Checklist