Claude Opus 4.7
Claude Opus 4.7 is the Claude Opus model released by Anthropic on 2026-04-16. It was the frontier model for long-running agentic work until Claude Opus 4.8 (released 2026-05-27) succeeded it, with stronger instruction adherence, self-verification, and cross-session context handling than Opus 4.6.
Canonical version: Claude Opus 4.7.
Claude Opus 4.7 is the Claude Opus model released by Anthropic on 2026-04-16. It was the frontier model for long-running agentic work until Claude Opus 4.8 (released 2026-05-27) succeeded it, with stronger instruction adherence, self-verification, and cross-session context handling than Opus 4.6.
Positioning
- Designed for hard, multi-step engineering and research tasks you hand off rather than supervise line by line
- Recommended usage shift: treat it like an engineer you delegate to, not a pair programmer you guide
- More agentic; better at long-running work; carries context across sessions; handles ambiguity better than prior Opus
- Still less capable than the internal Mythos Preview on some dimensions, but outperforms Opus 4.6 across most benchmarks
Availability
- Model ID:
claude-opus-4-7 - Knowledge cutoff: January 2026
- Previous version: Opus 4.6 (released 2026-02-05)
- Surfaces: Claude.ai, Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry
- Live in Claude Code from day one
Pricing
- Input: $5 per million tokens
- Output: $25 per million tokens
- Same headline rates as Opus 4.6
Capabilities
- Software engineering: improvements on complex, long-horizon tasks; strong instruction adherence; verifies its own outputs before reporting
- Vision: accepts images up to 2,576 px on the long edge (~3.75 MP), over 3× the previous capacity; useful for screenshots, diagrams, UI analysis
- Finance analysis, document reasoning, long-context performance: notable gains
- Multimodal design taste: higher aesthetic quality for interfaces, slides, documentation
- Slides specifically: external testers call it the strongest LLM they've used for PowerPoint generation, attributed to the vision upgrade
- Writing on consulting-style content: characterized as the first Claude release that reads without filler ("no padding or hedging"); on interactive writing workflows it can feel slower and more regimented than 4.6
- Reported benchmark gains (Anthropic + launch partners): CursorBench 58% → 70%, Rakuten SWE-Bench ~3× more resolved production tasks vs 4.6, SWE-bench Pro lifts on the hardest slices, LFG coding benchmark clears hardest tests when given detailed briefs
New controls and features
xhigheffort level: a new setting betweenhighandmaxfor reasoning/latency tradeoffs- Task budgets (public beta): guide token allocation per task
/ultrareviewcommand in Claude Code: dedicated code-review session- Claude Code Auto Mode extended to Max users for autonomous decisions
- Adaptive thinking now hides reasoning summaries by default; pass
"display": "summarized"to restore them
System prompt changes (vs 4.6)
Based on Simon Willison's diff of the leaked 4.7 system prompt against 4.6 (+47 / -17 lines). The changes read as a concrete implementation of the control architecture Perez describes further down.
- Branding: "developer platform" renamed to "Claude Platform"
- New first-party apps referenced: Claude in Chrome (autonomous web interaction), Claude in Excel (spreadsheets), Claude in Powerpoint (slides). The prompt now expects the model to route to these surfaces where appropriate.
- Safety tags and new guardrails:
- Child-safety block wrapped in a dedicated
<critical_child_safety_instructions>tag (structural isolation rather than inline prose) - Disordered-eating guidance: no specific nutrition or exercise numbers
- Screenshot-attack defense: refuse to be coerced into forced yes/no answers on contested topics via screenshots that fabricate prior context
- Child-safety block wrapped in a dedicated
- Behavioral shifts:
- Less pushy; honors "end the conversation" without persuading or re-engaging
- More action-first: "when requests have minor unspecified details, people typically want Claude to make a reasonable attempt now, not be interviewed first"
- Must call
tool_searchbefore claiming a capability is unavailable (see Control architecture → Latent capability discovery) - Default toward more concise responses
- Removed:
- Directives about avoiding emotes and specific filler words like "genuinely" and "honestly"
- Stale Trump-era political references (obsolete given the January 2026 knowledge cutoff)
- Tool surface: 23 tools referenced in the prompt, including
bash_tool,web_search,web_fetch,conversation_search,image_search,weather_fetch,tool_search, visualization tools, and app integrations
Safety
- Roughly similar safety profile to Opus 4.6
- Improvements in honesty and resistance to Prompt injection
- Cybersecurity capabilities intentionally reduced vs Mythos Preview; automated safeguards block high-risk requests
- Cyber Verification Program available for legitimate security research
Migration notes
- New tokenizer: Anthropic documents 1.0–1.35× more input tokens than 4.6. Measured real-world weighted ratio is 1.325×, with CLAUDE.md at 1.445× and technical docs at 1.47×. The top of the documented range is where most Claude Code content actually sits, not the middle; plan around the upper bound.
- Ratio varies sharply by content type: CJK prose and emoji ~1.01×, CSV ~1.07×, JSON Schema tool definitions ~1.12×, dense JSON ~1.13×, English prose ~1.20×, code 1.29–1.39×, shell/TypeScript ~1.36–1.39×, technical docs up to 1.47×. Code and CLAUDE.md are hit hardest; structured data and non-Latin scripts barely move.
- Chars-per-token drops accordingly: English 4.33 → 3.60; TypeScript 3.66 → 2.69. The vocabulary represents the same text in smaller pieces.
- Per-session cost impact on typical Claude Code workloads: ~20–30% more per session (e.g., ~$6.65 → ~$7.86–$8.76 on an 80-turn session) despite unchanged headline rates.
- Max plans: 5-hour rate-limit windows close proportionally faster on English- and code-heavy work.
- Prompt cache is partitioned per model, so switching from 4.6 to 4.7 invalidates every cached prefix; the cold-start is more expensive because the prefix being written is 1.3–1.45× larger. Cache-bust events (CLAUDE.md edits, tool-list changes, compaction) pay the full ratio on the rewrite.
- Same transcript yields a different token count on 4.7; baseline billing and observability will step-change the day you flip the model ID.
- Higher effort levels produce more output tokens but improve reliability on hard problems
- Worth re-benchmarking prompts and budgets when upgrading from 4.6
Reception and caveats
- Community reports (Hacker News and X) flag:
- Perceived quality regressions on some tasks when adaptive thinking chooses not to think
- Frustration with hidden chain-of-thought and unclear communication about intentional vs buggy changes
- Hidden cost escalation from the tokenizer change combined with higher default effort
- Measured tradeoff (independent IFEval test, N=20): +5pp on strict instruction-level adherence (86% → 90%), loose evaluation flat. Small, directionally consistent, not the "dramatic improvement" framing from launch partners. Smaller tokens forcing attention over individual words is Anthropic's stated mechanism, but the causal link cannot be isolated from concurrent weight and post-training changes.
- Treat internal benchmarks skeptically; evaluate on your own workloads
- "Big model smell" (Dan Shipper): awkward on first contact, breaks existing prompts, hidden depth that surfaces only once you adapt your workflow. First impressions undersell it
- Loss of unprompted noticing: 4.6 would flag adjacent issues you didn't ask about (Brandon Gell example: 4.6 caught a P&L data error that was misclassifying a product as wildly unprofitable; 4.7, on the same input, returned a clean summary and missed it). The behavior that used to show up on its own now needs an explicit rail
- Verdict split by use case: developers (Kieran Klaassen) describe 4.7 as "less showy, very good daily driver" once you go deep; writers (Katie Parrott) find it "too slow and regimented" for a daily writing driver. Treat the upgrade as task-dependent, not blanket
Working with it
- Delegate whole tasks with clear acceptance criteria instead of micro-steering
- Use
xhighormaxwhen correctness matters more than latency - For code review, prefer
/ultrareviewover ad hoc review prompts - Lean on its cross-session context when resuming long-running work
- Stop assuming inferred intent: 4.7 doesn't do prompt engineering on your behalf. Ask explicitly for the side-checks 4.6 used to volunteer (data sanity, edge cases, secondary risks) instead of expecting them as a bonus
- Recalibrate prompts before declaring a regression. Many "4.7 got worse" reports come from prompts implicitly tuned to 4.6's inference habits; rewrite once with explicit acceptance criteria before judging
- Match the model to the task: pair it with code, slides, finance/document analysis, and long-horizon work; keep 4.6 (or another model) in the loop for fast interactive writing where 4.7 feels regimented
Control architecture (from the leaked system prompt)
Carlos E. Perez's analysis of the leaked Opus 4.7 system prompt highlights that it reads less like a safety policy and more like a reusable control architecture; a set of prompt design primitives worth stealing.
- Search-first epistemic gating: for present-day facts, the model must verify via web search before answering rather than trust its own knowledge. Turns search from an optional helper into a default posture for time-sensitive claims.
- Latent capability discovery: the prompt explicitly teaches the model that the visible tool list is not the full tool list, and to look for deferred capabilities before declaring something unavailable. Shifts the default from "I don't have that" to capability-boundary skepticism.
- Mechanism: the
tool_searchtool (regex and BM25 variants, idstool_search_tool_regex_20251119/tool_search_tool_bm25_20251119) paired withdefer_loading: trueon tool definitions. Deferred tools are never injected into the system-prompt prefix; whentool_searchreturns a hit, atool_referenceblock is appended inline and expanded into the full definition. Prefix stays stable, prompt cache stays valid. - Scale: catalog up to 10,000 tools, 3–5 results per search, 200-char pattern cap. Anthropic reports ~85% reduction in tool-definition tokens and Opus-4 tool-selection accuracy jumping from 49% → 74% on large libraries.
- Implication: if the model says "I can't do X," it's a prompt-engineering hint to expose X via a deferred tool rather than a hard model limit. Design your own agent systems around this pattern for any toolset past ~30 tools.
- Mechanism: the
- Injection resistance over tool-first behavior: instructions found inside files, tool results, or other untrusted content do not override caution. Tool use is subordinate to the security boundary.
- Non-submissive error repair: the model is told to own mistakes and fix them, but not to spiral into self-abasement or become more submissive just because it was corrected. Dignified self-correction as a design pattern.
- Evenhanded advocacy on contentious topics: structured steelmanning with balance built in, explain the strongest case for each side rather than collapsing into false neutrality or picking a camp.
These primitives (epistemic humility, hidden-tool discovery, injection resistance, dignified self-correction, balanced steelmanning) are reusable outside Anthropic's context and worth encoding into your own agent system prompts.
References
- Official announcement: https://www.anthropic.com/news/claude-opus-4-7
- Hacker News discussion: https://news.ycombinator.com/item?id=47793411
- Cat Wu on workflow shifts: https://x.com/_catwu/status/2044808533905178822
- Boris Cherny announcement: https://x.com/bcherny/status/2044802532388774313
- Boris Cherny on dogfooding 4.7: https://x.com/bcherny/status/2044847848035156457
- Carlos E. Perez on the leaked system prompt as control architecture: https://x.com/IntuitMachine/status/2044888212280160659
- Claude Code Camp — measured tokenizer cost and IFEval tradeoff: https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new-tokenizer-here-s-what-it-costs-you
- Simon Willison on the 4.7 system prompt diff: https://simonwillison.net/2026/Apr/18/opus-system-prompt/
- Raw diff (4.6 → 4.7): https://github.com/simonw/research/commit/888f21161500cd60b7c92367f9410e311ffcff09
- Anthropic system-prompts archive: https://platform.claude.com/docs/en/release-notes/system-prompts
- Tool search tool reference: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
- Anthropic "Advanced tool use" (Nov 2025): https://www.anthropic.com/engineering/advanced-tool-use
- Every — Vibe Check on Opus 4.7 (Katie Parrott; Shipper, Gell, Taylor, Klaassen): https://every.to/vibe-check/opus-4-7
Related
- Claude
- Anthropic
- Claude Code
- Claude Opus 4.8
- Claude Dynamic Workflows
- Claude Code Auto Mode
- Claude Design
- Large Language Models (LLMs)
- Context Window
- Model routing
About Sébastien
I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.
I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.
If you want to follow my work, then become a member and join our community.
Ready to get to the next level?
If you're tired of information overwhelm and ready to build a reliable knowledge system:
- 📚 KM for Beginners — 10+ hours of structured video lessons
- 🚀 Obsidian Starter Kit — Ready-made vault with 40+ templates
- 💼 Knowledge Worker Kit — Complete guides + lifetime community
- 🦉 1-on-1 Coaching — Personalized guidance
- 🎯 Join Knowii — Community + ALL courses & tools
Found this valuable? Share it with someone who needs it.