Claude Opus 4.7 launched on April 16, 2026 with a rate card that looks identical to Opus 4.6, yet the number on your invoice may not match. Anthropic swapped in a new tokenizer that can break the same text into up to 35 percent more tokens, so the per-token price stayed flat while the per-request cost can quietly climb.
## Opus 4.7 rate card
Pricing is published in US dollars on the Claude API pricing page. Opus 4.7 sits at the same tier as Opus 4.6 and Opus 4.5, well below the $15/$75 rates that applied to Opus 4 and 4.1.
| Category | Rate |
|---|---|
| Base input tokens | $5 / MTok |
| Output tokens | $25 / MTok |
| 5-minute cache write | $6.25 / MTok |
| 1-hour cache write | $10 / MTok |
| Cache hits and refreshes | $0.50 / MTok |
| Batch input | $2.50 / MTok |
| Batch output | $12.50 / MTok |
| US-only inference multiplier | 1.1x on all categories |
Opus 4.7 is available on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, and ships to Pro, Max, Team, and Enterprise plans on Claude. The API model identifier is `claude-opus-4-7`.
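The rate card above lends itself to a small cost estimator. A minimal sketch, assuming the billing categories are simply additive and the 1.1x US-only multiplier applies uniformly to every category; the helper name `request_cost` is illustrative, not part of any SDK:

```python
# Published Opus 4.7 rates, USD per million tokens.
RATES = {
    "input": 5.00,
    "output": 25.00,
    "cache_write_5m": 6.25,
    "cache_write_1h": 10.00,
    "cache_read": 0.50,
}

def request_cost(input_tokens=0, output_tokens=0, cache_read_tokens=0,
                 cache_write_5m_tokens=0, cache_write_1h_tokens=0,
                 us_only=False):
    """Estimate the USD cost of one request from per-category token counts."""
    cost = (
        input_tokens * RATES["input"]
        + output_tokens * RATES["output"]
        + cache_read_tokens * RATES["cache_read"]
        + cache_write_5m_tokens * RATES["cache_write_5m"]
        + cache_write_1h_tokens * RATES["cache_write_1h"]
    ) / 1_000_000
    # US-only data residency multiplies every category by 1.1x.
    return cost * 1.1 if us_only else cost
```

For example, a request with 50,000 input and 15,000 output tokens comes out to $0.625 before any platform fees.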
## The tokenizer change is the real pricing story
Opus 4.7 uses a new tokenizer that Anthropic links to improvements in accuracy and instruction following. The tradeoff is density. The same paragraph of English, the same Python function, the same JSON payload can split into more tokens than it did on Opus 4.6. Anthropic's own documentation caps the effect at up to 35 percent more tokens for the same fixed text, and early user tests have reported real-world overhead clustering around 30 to 50 percent on some coding and structured-data workloads.
Three practical consequences follow:
- A request that cost $0.10 on Opus 4.6 can cost anywhere from $0.10 to about $0.135 on Opus 4.7, depending on content mix.
- Output tokens matter more than input tokens for total cost, because output is priced five times higher. If 4.7 writes longer responses at the same "effort" setting, the gap widens.
- Cache boundaries shift with the new tokenizer, so existing cache entries miss on the first 4.7 run and have to be rewritten before savings kick in.
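The first consequence is easy to verify numerically. A hedged sketch, assuming the inflation factor applies equally to input and output tokens (in practice it varies by content, and 35 percent is a ceiling):

```python
IN_RATE, OUT_RATE = 5.00, 25.00  # USD per million tokens

def inflated_cost(input_tokens, output_tokens, inflation=0.35):
    """Cost on Opus 4.7 if the new tokenizer splits the same text into
    `inflation` more tokens in both directions."""
    i = input_tokens * (1 + inflation)
    o = output_tokens * (1 + inflation)
    return (i * IN_RATE + o * OUT_RATE) / 1_000_000

# A request that cost $0.10 on Opus 4.6, e.g. 10,000 in / 2,000 out:
base = (10_000 * IN_RATE + 2_000 * OUT_RATE) / 1_000_000  # $0.10
worst = inflated_cost(10_000, 2_000, inflation=0.35)      # $0.135
```

Because cost scales linearly with token count, the worst-case 35 percent token inflation maps directly onto the $0.10-to-$0.135 range quoted above.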
## Opus 4.7 versus the rest of the Claude lineup
Opus is priced as the frontier tier. Sonnet 4.6 remains 40 percent cheaper per token on both input and output, and Haiku 4.5 runs at a fifth of Opus pricing.
| Model | Input ($/MTok) | Output ($/MTok) | Context |
|---|---|---|---|
| Claude Opus 4.7 | $5 | $25 | 1M tokens |
| Claude Opus 4.6 | $5 | $25 | 1M tokens |
| Claude Opus 4.5 | $5 | $25 | 1M tokens |
| Claude Opus 4.1 | $15 | $75 | 200K tokens |
| Claude Sonnet 4.6 | $3 | $15 | 1M tokens |
| Claude Sonnet 4.5 | $3 | $15 | 1M tokens |
| Claude Haiku 4.5 | $1 | $5 | 200K tokens |
| Claude Haiku 3.5 | $0.80 | $4 | 200K tokens |
The 1M token context window on Opus 4.7, Opus 4.6, and Sonnet 4.6 is billed at standard per-token rates across the full window. A 900,000-token request costs the same per token as a 9,000-token request.
## Discount levers that actually move the bill
Two mechanisms do most of the work to offset the tokenizer change.
Prompt caching. A 5-minute cache write costs 1.25x the base input price, and a 1-hour write costs 2x. Cache reads come in at 0.1x the base rate, which works out to $0.50 per million tokens on Opus 4.7. The break-even math is quick: a 5-minute write carries a $1.25/MTok premium over plain input, and each read saves $4.50/MTok versus resending the same tokens, so the cache pays for itself after a single read; a 1-hour write's $5/MTok premium needs two reads. System prompts, tool definitions, and stable document context are the obvious candidates.
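That break-even logic generalizes to any write multiplier. A short sketch (the helper `breakeven_reads` is hypothetical, built only from the multipliers in the rate card):

```python
import math

BASE_IN = 5.00  # $/MTok, Opus 4.7 base input rate

def breakeven_reads(write_multiplier, read_multiplier=0.1):
    """Cache reads needed before caching beats resending the same
    tokens as plain input on every call."""
    premium = (write_multiplier - 1) * BASE_IN          # extra cost of the write
    saving_per_read = (1 - read_multiplier) * BASE_IN   # $4.50/MTok saved per hit
    return math.ceil(premium / saving_per_read)

breakeven_reads(1.25)  # 5-minute cache -> 1
breakeven_reads(2.0)   # 1-hour cache  -> 2
```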
Batch API. Asynchronous requests get a flat 50 percent discount on both input and output. Batch Opus 4.7 works out to $2.50 per million input tokens and $12.50 per million output tokens, and those discounts stack with prompt caching.
Data residency adds a 1.1x multiplier on every category (input, output, cache writes, cache reads) when you specify US-only inference through the `inference_geo` parameter. Global routing stays at the standard rates.
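The levers compose multiplicatively. A minimal sketch, assuming (as the "discounts stack" note implies) that the 50 percent batch discount also halves the cache rates:

```python
def effective_rate(base_rate, batch=False, us_only=False):
    """Effective $/MTok after stacking the 50% batch discount and the
    1.1x US-only data-residency multiplier on a base category rate."""
    rate = base_rate
    if batch:
        rate *= 0.5   # Batch API: flat 50% off
    if us_only:
        rate *= 1.1   # US-only inference premium
    return rate

effective_rate(5.00, batch=True)    # $2.50 batch input
effective_rate(25.00, batch=True)   # $12.50 batch output
```

A batched, US-only cache read would land at $0.275/MTok under this assumption, though it is worth confirming the exact stacking order against your own invoices.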
## Worked cost example
A one-hour coding session on Claude Managed Agents using Opus 4.7, consuming 50,000 input tokens and 15,000 output tokens with no caching:
| Line item | Calculation | Cost |
|---|---|---|
| Input tokens | 50,000 × $5 / 1,000,000 | $0.25 |
| Output tokens | 15,000 × $25 / 1,000,000 | $0.375 |
| Session runtime | 1.0 hour × $0.08 | $0.08 |
| Total | | $0.705 |
With prompt caching on 40,000 of those input tokens as cache reads, the same session drops to $0.525. Session runtime on Managed Agents is metered only while the session is actively running, not while it sits idle waiting for user input.
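The worked example can be reproduced in a few lines. `session_cost` is a hypothetical helper; the $0.08/hour Managed Agents runtime rate is taken from the table above:

```python
IN_RATE, OUT_RATE, CACHE_READ_RATE = 5.00, 25.00, 0.50  # $/MTok
RUNTIME_PER_HOUR = 0.08  # Managed Agents active-session metering

def session_cost(input_tokens, output_tokens, hours, cached_input=0):
    """Session total: uncached input + cache-read input + output + runtime."""
    uncached = input_tokens - cached_input
    return (uncached * IN_RATE
            + cached_input * CACHE_READ_RATE
            + output_tokens * OUT_RATE) / 1_000_000 + hours * RUNTIME_PER_HOUR

session_cost(50_000, 15_000, 1.0)                       # $0.705, no caching
session_cost(50_000, 15_000, 1.0, cached_input=40_000)  # $0.525, with cache hits
```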
## Tool use and server-side pricing
Tool calls bill on top of standard token costs. A few of the most common surcharges:
| Tool or service | Added cost |
|---|---|
| Tool use system prompt (Opus 4.7) | 313–346 input tokens |
| Bash tool | +245 input tokens per call |
| Text editor tool | +700 input tokens per call |
| Computer use tool | +735 input tokens per tool definition, plus 466–499 system prompt tokens |
| Web search | $10 per 1,000 searches, plus tokens for returned content |
| Web fetch | Standard token costs only |
| Code execution (standalone) | 1,550 free hours/month/org, then $0.05 per hour per container |
| Code execution (with web search or web fetch) | No additional container charge |
Web search results returned mid-conversation keep counting as input tokens in later turns, so a multi-turn research session compounds both the per-search fee and ongoing token costs.
## Rate limits and migration
Opus rate limits are pooled across Opus 4.7, 4.6, 4.5, 4.1, and 4. Shifting traffic to 4.7 does not unlock extra quota, so a staged cutover is the sensible path. Anthropic has not published a sunset date for Opus 4.6 at launch, which means mixed traffic across versions is viable during migration.
Before moving a production workload, replay representative traffic through 4.7 and measure the token delta directly. The 35 percent figure is a ceiling, not a fixed rate, and it varies heavily by language, code density, and output length. It is also worth confirming that observability tools report 4.7 as a distinct model rather than rolling it into a generic "Opus" bucket, because the effective cost-per-request shift can otherwise hide inside aggregate dashboards.
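A sketch of that measurement, assuming token counts for each model version come from a token-counting endpoint such as the Anthropic SDK's `client.messages.count_tokens` (hedged: confirm availability for both versions); the delta helper itself is plain arithmetic:

```python
def token_delta_pct(old_tokens, new_tokens):
    """Percent change in token count for the same fixed text."""
    return (new_tokens - old_tokens) / old_tokens * 100

# Hypothetical replay loop: for each saved request, count tokens under
# both model identifiers and record the spread, e.g.
#   old = client.messages.count_tokens(model="claude-opus-4-6", messages=msgs)
#   new = client.messages.count_tokens(model="claude-opus-4-7", messages=msgs)
token_delta_pct(10_000, 13_500)  # 35.0, the documented ceiling
```

Aggregating the per-request deltas (median and p95, not just the mean) gives a far better cost forecast than applying the 35 percent ceiling across the board.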
Regional endpoints on AWS Bedrock and multi-region endpoints on Google Vertex AI carry a 10 percent premium over global endpoints for Sonnet 4.5, Haiku 4.5, and all newer models including Opus 4.7. Earlier models keep their prior pricing on those platforms.
## When to stay, switch, or split
Opus 4.7 earns its premium on hard, long-horizon coding and agent work where quality differences translate directly into fewer retries and less human cleanup. For high-volume inference like classification, retrieval-augmented answers, moderation, or routine content generation, Sonnet 4.6 usually lands the same outcome at 40 percent less per token. Haiku 4.5 handles extraction and routing at a fifth of the Opus rate.
A common split: plan with Opus 4.7, cache the plan aggressively, then hand off execution to Sonnet 4.6 or Haiku 4.5. That pattern contains the tokenizer-driven cost increase to the short, high-value planning phase and keeps the bulk of token volume on cheaper models.