Claude Opus 4.7 landed on April 16, 2026 as Anthropic's most capable generally available model, with sharper coding, better vision, and tighter instruction following. It also eats more tokens per task than Opus 4.6, and that shift is reshaping how people budget their subscriptions and API spend.
## Why token usage went up
Two changes drive the higher burn rate. First, Opus 4.7 ships with an updated tokenizer, which changes how raw text gets chopped into tokens before the model sees it. The same prompt can now map to roughly 1.0 to 1.35 times the token count it would have used on Opus 4.6, depending on content type. Code-heavy and structured inputs tend to land on the higher end.
Second, the model thinks more at higher effort levels, especially on later turns in agentic sessions. That produces more output tokens on hard problems, which is part of why reliability improved on long-running tasks but also why sessions end faster on fixed-quota plans.
## Pricing and API costs
Per-token pricing matches Opus 4.6 exactly, so nothing changed at the rate card level. The cost shift comes entirely from token volume per task.
| Metric | Opus 4.7 |
|---|---|
| Input tokens | $5 per million |
| Output tokens | $25 per million |
| Tokenizer change vs 4.6 | 1.0x to 1.35x input tokens |
| Prompt caching savings | Up to 90% |
| Batch processing savings | 50% |
| US-only inference | 1.1x base pricing |
| Model slug | claude-opus-4-7 |
The practical effect: a task that cost $1 on Opus 4.6 can now cost $1 to $1.35 on the input side before accounting for extra thinking tokens on the output side. Anthropic recommends measuring the real difference on your own traffic rather than assuming a flat multiplier.
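As a rough sketch of that rate-card math, here is the input-side spread for a hypothetical task (the per-token prices and the 1.0x-1.35x multiplier come from the table above; the example token counts are made up):

```python
# Rough per-task cost estimate for Opus 4.7 using the rate card above.
# Prices are per million tokens; the tokenizer multiplier range is from
# the table. The example token counts are illustrative, not measured.

INPUT_PRICE = 5.00 / 1_000_000    # $5 per million input tokens
OUTPUT_PRICE = 25.00 / 1_000_000  # $25 per million output tokens

def task_cost(input_tokens_46, output_tokens, tokenizer_multiplier=1.0):
    """Estimate Opus 4.7 cost for a task measured in Opus 4.6 input tokens."""
    input_tokens_47 = input_tokens_46 * tokenizer_multiplier
    return input_tokens_47 * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical task: 150k input tokens (as counted on 4.6), 20k output.
low = task_cost(150_000, 20_000, tokenizer_multiplier=1.0)
high = task_cost(150_000, 20_000, tokenizer_multiplier=1.35)
print(f"${low:.2f} to ${high:.2f}")  # spread before extra thinking tokens
```

The same structure works for measuring the real multiplier: run representative prompts through both models, record token counts, and replace the assumed multiplier with your observed ratio.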
## Effort levels and the new xhigh tier
Effort levels control how much the model reasons before answering. Opus 4.7 adds a new xhigh tier that sits between high and max. In Claude Code, the default jumped to xhigh for every plan, which means users on Pro and Max subscriptions will see quota drain faster than they did on Opus 4.6 unless they change settings.
| Effort level | Relative thinking | Best for |
|---|---|---|
| low | Minimal | Short edits, simple questions |
| medium | Moderate | Standard coding and analysis |
| high | Extended | Complex refactors, multi-step plans |
| xhigh | Longer sustained reasoning | Hard agentic coding (new default in Claude Code) |
| max | Highest | The hardest problems, slowest responses |
One partner noted that low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6, so you can often step down a tier and still come out ahead on quality. Anthropic recommends starting at high or xhigh for coding and agentic work, then adjusting based on results.
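A minimal sketch of how an effort level might be attached to a request. Note the `effort` field name and the payload shape here are assumptions for illustration; check the API reference for the actual parameter before relying on this:

```python
# Sketch of setting an effort level on a request payload. The "effort"
# field name is a hypothetical stand-in, not a confirmed API parameter.
# This builds the dict only; it does not call the API.

def build_request(prompt, effort="high"):
    """Build a request payload dict for Opus 4.7 (no network call)."""
    allowed = {"low", "medium", "high", "xhigh", "max"}
    if effort not in allowed:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",
        "effort": effort,  # hypothetical field name for illustration
        "max_tokens": 8192,
        "messages": [{"role": "user", "content": prompt}],
    }

# Starting point per the guidance above: high for coding work, then tune.
req = build_request("Refactor this module to remove the circular import.")
```

Centralizing the payload in one helper like this also makes the later migration step (swapping the model slug, dropping effort a tier) a one-line change.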
## Ways to control token spend
There are four practical levers for keeping usage in check on Opus 4.7.
Step 1: Lower the effort level. If a task completes well at medium or high, don't leave it at the new xhigh default. This is the single biggest lever for reducing output tokens.
Step 2: Use task budgets, now available in public beta on the API. Task budgets let you cap how many tokens the model can spend across a long run, so it prioritizes the important work instead of burning the whole budget early.
Step 3: Turn on prompt caching for repeated system prompts and context. Caching cuts input token costs by up to 90% on cached segments, which is significant for agents that reuse the same instructions across many turns.
Step 4: Use batch processing for non-urgent jobs. Batch cuts pricing in half for both input and output tokens, and it works well for bulk code reviews, document analysis, and scheduled runs.
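To see how the discounts in steps 3 and 4 stack, here is the input-side arithmetic for a bulk job (the 90% caching and 50% batch figures come from the pricing table; the token counts are illustrative):

```python
# How the caching and batch discounts compound on input cost. The 90%
# caching savings applies only to cached segments; the 50% batch discount
# applies to the whole job. Token counts below are made up for the example.

INPUT_PRICE = 5.00 / 1_000_000  # $5 per million input tokens

def input_cost(cached_tokens, fresh_tokens, use_batch=False):
    """Input-side cost with caching on cached segments, optional batch."""
    cost = cached_tokens * INPUT_PRICE * 0.10 + fresh_tokens * INPUT_PRICE
    return cost * 0.5 if use_batch else cost

# 1M cached system-prompt tokens reused across turns, plus 2M fresh tokens.
base = input_cost(0, 3_000_000)                        # no caching, no batch
both = input_cost(1_000_000, 2_000_000, use_batch=True)
print(f"${base:.2f} -> ${both:.2f}")
```

For agents that replay the same long system prompt every turn, the cached share usually dominates, which is why step 3 pays off before step 4 does.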
## When to use Opus 4.7 vs Opus 4.6
Opus 4.7 is a clear upgrade for coding agents, tool-heavy workflows, and anything involving dense screenshots or diagrams. It leads SWE-bench Verified at 87.6%, SWE-bench Pro at 64.3%, MCP-Atlas at 77.3%, and OSWorld-Verified at 78.0%. Vision resolution tripled to around 3.75 megapixels, which matters for computer-use agents and document extraction.
Opus 4.6 still makes sense in two specific cases. If your workflow leans heavily on web research, note that BrowseComp dropped from 83.7% to 79.3% between the two versions, so agents doing multi-page synthesis may regress. And if you have prompts tuned precisely to Opus 4.6's looser instruction interpretation, Opus 4.7's stricter literal reading can produce unexpected results until you re-tune them.
## Claude Code changes that affect usage
Beyond the model itself, Claude Code got two features that interact with token consumption. The /ultrareview slash command runs a dedicated review pass that reads through changes and flags bugs a careful reviewer would catch. Pro and Max users get three free ultrareviews to try it.
Auto mode, previously limited to earlier tiers, now extends to Max users. It lets Claude make permission decisions on your behalf, which keeps long tasks running without manual approval for each tool call. Combined with xhigh default effort, auto mode can burn tokens quickly if left unattended, so set task budgets when running long autonomous sessions.
## Where Opus 4.7 runs
Opus 4.7 is available on Claude for Pro, Max, Team, and Enterprise users, and through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Developers reference it with the model slug claude-opus-4-7. For workloads that require US-only inference, pricing runs at 1.1x the standard rate.
The context window stays at 1 million tokens, matching Opus 4.6. That ceiling hasn't changed, though independent tests suggest long-context retrieval quality shifted between versions, so verify recall on your own data before migrating long-running conversations.
## Migrating from Opus 4.6
For most production workloads, the upgrade path is straightforward. Swap the model slug, drop effort by one tier as a starting point, and measure token use and task success on representative traffic before committing. Re-tune any prompts that relied on Opus 4.6's looser instruction interpretation, since Opus 4.7 follows instructions literally and may execute parts that older prompts expected the model to skip.
Anthropic reports that net token usage across effort levels improves on internal coding evaluations when the model completes work in fewer turns, so the per-task math can favor Opus 4.7 even with higher per-turn costs. Whether that holds for your workload depends on content type, tool usage, and how far you push effort levels.
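The fewer-turns argument is easy to sanity-check with arithmetic. A sketch, with all numbers invented purely for the comparison:

```python
# Illustrative per-task math for the claim above: costlier turns can
# still produce a cheaper task if the model finishes in fewer turns.
# All token counts and turn counts here are made up, not measured.

def session_tokens(turns, input_per_turn, output_per_turn):
    """Total tokens for an agentic session at a steady per-turn rate."""
    return turns * (input_per_turn + output_per_turn)

# Hypothetical task: Opus 4.6 needs 8 turns; Opus 4.7 uses 1.2x the
# input tokens per turn plus extra thinking output, but finishes in 5.
opus_46 = session_tokens(8, 40_000, 6_000)
opus_47 = session_tokens(5, 48_000, 9_000)
print(opus_46, opus_47)  # 4.7 comes out lower despite costlier turns
```

Flip the turn counts and the conclusion flips too, which is exactly why measuring on representative traffic matters more than any fixed multiplier.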