Claude Opus 4.7 landed on April 16, 2026 as Anthropic's most capable generally available model, with sharper coding, better vision, and tighter instruction following. It also eats more tokens per task than Opus 4.6, and that shift is reshaping how people budget their subscriptions and API spend.
## Why token usage went up
Two changes drive the higher burn rate. First, Opus 4.7 ships with an updated tokenizer, which changes how raw text gets chopped into tokens before the model sees it. The same prompt can now map to roughly 1.0 to 1.35 times the token count it would have used on Opus 4.6, depending on content type. Code-heavy and structured inputs tend to land on the higher end.
Second, the model thinks more at higher effort levels, especially on later turns in agentic sessions. That produces more output tokens on hard problems, which is part of why reliability improved on long-running tasks but also why sessions end faster on fixed-quota plans.
## Pricing and API costs
Per-token pricing matches Opus 4.6 exactly, so nothing changed at the rate card level. The cost shift comes entirely from token volume per task.
| Metric | Opus 4.7 |
|---|---|
| Input tokens | $5 per million |
| Output tokens | $25 per million |
| Tokenizer change vs 4.6 | 1.0x to 1.35x input tokens |
| Prompt caching savings | Up to 90% |
| Batch processing savings | 50% |
| US-only inference | 1.1x base pricing |
| Model slug | claude-opus-4-7 |
The practical effect: a task that cost $1 on Opus 4.6 can now cost $1 to $1.35 on the input side before accounting for extra thinking tokens on the output side. Anthropic recommends measuring the real difference on your own traffic rather than assuming a flat multiplier.
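As a rough sketch of that rate-card math, here is the input-side spread for a hypothetical task (the per-token prices and the 1.0x-1.35x multiplier come from the table above; the example token counts are made up):

```python
# Rough per-task cost estimate for Opus 4.7 using the rate card above.
# Prices are per million tokens; the tokenizer multiplier range is from
# the table. The example token counts are illustrative, not measured.

INPUT_PRICE = 5.00 / 1_000_000    # $5 per million input tokens
OUTPUT_PRICE = 25.00 / 1_000_000  # $25 per million output tokens

def task_cost(input_tokens_46, output_tokens, tokenizer_multiplier=1.0):
    """Estimate Opus 4.7 cost for a task measured in Opus 4.6 input tokens."""
    input_tokens_47 = input_tokens_46 * tokenizer_multiplier
    return input_tokens_47 * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical task: 150k input tokens (as counted on 4.6), 20k output.
low = task_cost(150_000, 20_000, tokenizer_multiplier=1.0)
high = task_cost(150_000, 20_000, tokenizer_multiplier=1.35)
print(f"${low:.2f} to ${high:.2f}")  # spread before extra thinking tokens
```

The same structure works for measuring the real multiplier: run representative prompts through both models, record token counts, and replace the assumed multiplier with your observed ratio.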
## Effort levels and the new xhigh tier
Effort levels control how much the model reasons before answering. Opus 4.7 adds a new xhigh tier that sits between high and max. In Claude Code, the default jumped to xhigh for every plan, which means users on Pro and Max subscriptions will see quota drain faster than they did on Opus 4.6 unless they change settings.
| Effort level | Relative thinking | Best for |
|---|---|---|
| low | Minimal | Short edits, simple questions |
| medium | Moderate | Standard coding and analysis |
| high | Extended | Complex refactors, multi-step plans |
| xhigh | Longer sustained reasoning | Hard agentic coding (new default in Claude Code) |
| max | Highest | The hardest problems, slowest responses |
One partner noted that low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6, so you can often step down a tier and still come out ahead on quality. Anthropic recommends starting at high or xhigh for coding and agentic work, then adjusting based on results.
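A minimal sketch of how an effort level might be attached to a request. Note the `effort` field name and the payload shape here are assumptions for illustration; check the API reference for the actual parameter before relying on this:

```python
# Sketch of setting an effort level on a request payload. The "effort"
# field name is a hypothetical stand-in, not a confirmed API parameter.
# This builds the dict only; it does not call the API.

def build_request(prompt, effort="high"):
    """Build a request payload dict for Opus 4.7 (no network call)."""
    allowed = {"low", "medium", "high", "xhigh", "max"}
    if effort not in allowed:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",
        "effort": effort,  # hypothetical field name for illustration
        "max_tokens": 8192,
        "messages": [{"role": "user", "content": prompt}],
    }

# Starting point per the guidance above: high for coding work, then tune.
req = build_request("Refactor this module to remove the circular import.")
```

Centralizing the payload in one helper like this also makes the later migration step (swapping the model slug, dropping effort a tier) a one-line change.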
## Ways to control token spend
There are four practical levers for keeping usage in check on Opus 4.7.
Step 1: Lower the effort level. If a task completes well at medium or high, don't leave it at the new xhigh default. This is the single biggest lever for reducing output tokens.
Step 2: Use task budgets, now available in public beta on the API. Task budgets let you cap how many tokens the model can spend across a long run, so it prioritizes the important work instead of burning the whole budget early.
Step 3: Turn on prompt caching for repeated system prompts and context. Caching cuts input token costs by up to 90% on cached segments, which is significant for agents that reuse the same instructions across many turns.
Step 4: Use batch processing for non-urgent jobs. Batch cuts pricing in half for both input and output tokens, and it works well for bulk code reviews, document analysis, and scheduled runs.
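To see how the discounts in steps 3 and 4 stack, here is the input-side arithmetic for a bulk job (the 90% caching and 50% batch figures come from the pricing table; the token counts are illustrative):

```python
# How the caching and batch discounts compound on input cost. The 90%
# caching savings applies only to cached segments; the 50% batch discount
# applies to the whole job. Token counts below are made up for the example.

INPUT_PRICE = 5.00 / 1_000_000  # $5 per million input tokens

def input_cost(cached_tokens, fresh_tokens, use_batch=False):
    """Input-side cost with caching on cached segments, optional batch."""
    cost = cached_tokens * INPUT_PRICE * 0.10 + fresh_tokens * INPUT_PRICE
    return cost * 0.5 if use_batch else cost

# 1M cached system-prompt tokens reused across turns, plus 2M fresh tokens.
base = input_cost(0, 3_000_000)                        # no caching, no batch
both = input_cost(1_000_000, 2_000_000, use_batch=True)
print(f"${base:.2f} -> ${both:.2f}")
```

For agents that replay the same long system prompt every turn, the cached share usually dominates, which is why step 3 pays off before step 4 does.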
## When to use Opus 4.7 vs Opus 4.6
Opus 4.7 is a clear upgrade for coding agents, tool-heavy workflows, and anything involving dense screenshots or diagrams. It leads SWE-bench Verified at 87.6%, SWE-bench Pro at 64.3%, MCP-Atlas at 77.3%, and OSWorld-Verified at 78.0%. Vision resolution tripled to around 3.75 megapixels, which matters for computer-use agents and document extraction.
Opus 4.6 still makes sense in two specific cases. If your workflow leans heavily on web research, note that BrowseComp dropped from 83.7% to 79.3% between the two versions, so agents doing multi-page synthesis may regress. And if you have prompts tuned precisely to Opus 4.6's looser instruction interpretation, Opus 4.7's stricter literal reading can produce unexpected results until you re-tune them.
## Claude Code changes that affect usage
Beyond the model itself, Claude Code got two features that interact with token consumption. The /ultrareview slash command runs a dedicated review pass that reads through changes and flags bugs a careful reviewer would catch. Pro and Max users get three free ultrareviews to try it.
Auto mode, previously limited to earlier tiers, now extends to Max users. It lets Claude make permission decisions on your behalf, which keeps long tasks running without manual approval for each tool call. Combined with xhigh default effort, auto mode can burn tokens quickly if left unattended, so set task budgets when running long autonomous sessions.
## Where Opus 4.7 runs
Opus 4.7 is available on Claude for Pro, Max, Team, and Enterprise users, and through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Developers reference it with the model slug claude-opus-4-7. For workloads that require US-only inference, pricing runs at 1.1x the standard rate.
The context window stays at 1 million tokens, matching Opus 4.6. That ceiling hasn't changed, though independent tests suggest long-context retrieval quality shifted between versions, so verify recall on your own data before migrating long-running conversations.
## Migrating from Opus 4.6
For most production workloads, the upgrade path is straightforward. Swap the model slug, drop effort by one tier as a starting point, and measure token use and task success on representative traffic before committing. Re-tune any prompts that relied on Opus 4.6's looser instruction interpretation, since Opus 4.7 follows instructions literally and may execute parts that older prompts expected the model to skip.
Anthropic reports that net token usage across effort levels improves on internal coding evaluations when the model completes work in fewer turns, so the per-task math can favor Opus 4.7 even with higher per-turn costs. Whether that holds for your workload depends on content type, tool usage, and how far you push effort levels.
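The fewer-turns argument is easy to sanity-check with arithmetic. A sketch, with all numbers invented purely for the comparison:

```python
# Illustrative per-task math for the claim above: costlier turns can
# still produce a cheaper task if the model finishes in fewer turns.
# All token counts and turn counts here are made up, not measured.

def session_tokens(turns, input_per_turn, output_per_turn):
    """Total tokens for an agentic session at a steady per-turn rate."""
    return turns * (input_per_turn + output_per_turn)

# Hypothetical task: Opus 4.6 needs 8 turns; Opus 4.7 uses 1.2x the
# input tokens per turn plus extra thinking output, but finishes in 5.
opus_46 = session_tokens(8, 40_000, 6_000)
opus_47 = session_tokens(5, 48_000, 9_000)
print(opus_46, opus_47)  # 4.7 comes out lower despite costlier turns
```

Flip the turn counts and the conclusion flips too, which is exactly why measuring on representative traffic matters more than any fixed multiplier.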