
Claude Opus 4.7 Pricing: Same Rate Card, Bigger Bill

Shivam Malani

Claude Opus 4.7 launched on April 16, 2026, with a rate card that looks identical to Opus 4.6's, yet the number on your invoice may not match. Anthropic swapped in a new tokenizer that can break the same text into up to 35 percent more tokens, so the per-token price stayed flat while the per-request cost can quietly climb.

💡 Quick answer: Opus 4.7 costs $5 per million input tokens and $25 per million output tokens, the same as Opus 4.6. The new tokenizer can produce up to 35% more tokens for the same text, raising effective per-request cost by 0–35%.

Opus 4.7 rate card

Pricing is published in US dollars on the Claude API pricing page. Opus 4.7 sits at the same tier as Opus 4.6 and Opus 4.5, well below the $15/$75 rates that applied to Opus 4 and 4.1.

| Category | Rate |
| --- | --- |
| Base input tokens | $5 / MTok |
| Output tokens | $25 / MTok |
| 5-minute cache write | $6.25 / MTok |
| 1-hour cache write | $10 / MTok |
| Cache hits and refreshes | $0.50 / MTok |
| Batch input | $2.50 / MTok |
| Batch output | $12.50 / MTok |
| US-only inference multiplier | 1.1x on all categories |

Opus 4.7 is available on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, and ships to Pro, Max, Team, and Enterprise plans on Claude. The API model identifier is claude-opus-4-7.


The tokenizer change is the real pricing story

Opus 4.7 uses a new tokenizer that Anthropic links to improvements in accuracy and instruction following. The tradeoff is density. The same paragraph of English, the same Python function, the same JSON payload can split into more tokens than it did on Opus 4.6. Anthropic's own documentation caps the effect at up to 35 percent more tokens for the same fixed text, and early user tests have reported real-world overhead clustering around 30 to 50 percent on some coding and structured-data workloads.

Three practical consequences follow:

  • A request that cost $0.10 on Opus 4.6 can cost anywhere from $0.10 to about $0.135 on Opus 4.7, depending on content mix.
  • Output tokens matter more than input tokens for total cost, because output is priced five times higher. If 4.7 writes longer responses at the same "effort" setting, the gap widens.
  • Cache boundaries shift with the new tokenizer, so existing cache entries miss on the first 4.7 run and have to be rewritten before savings kick in.
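The first bullet's arithmetic generalizes into a short sketch. Rates come from the rate card above; applying the documented worst-case 1.35 inflation factor to both input and output counts is an assumption for illustration:

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_rate=5.0, output_rate=25.0):
    # Opus 4.7 list rates in $/MTok; returns dollars for one request.
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Same request at Opus 4.6-era token counts vs a hypothetical
# worst-case 35% tokenizer inflation on both sides.
base = request_cost_usd(10_000, 2_000)
worst = request_cost_usd(int(10_000 * 1.35), int(2_000 * 1.35))
print(base, worst)  # 0.1 0.135
```

Real inflation varies per request, so the second number is a ceiling, not a forecast.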

Opus 4.7 versus the rest of the Claude lineup

Opus is priced as the frontier tier. Sonnet 4.6 remains 40 percent cheaper per token on both input and output, and Haiku 4.5 runs at a fifth of Opus pricing.

| Model | Input ($/MTok) | Output ($/MTok) | Context |
| --- | --- | --- | --- |
| Claude Opus 4.7 | $5 | $25 | 1M tokens |
| Claude Opus 4.6 | $5 | $25 | 1M tokens |
| Claude Opus 4.5 | $5 | $25 | 1M tokens |
| Claude Opus 4.1 | $15 | $75 | 200K tokens |
| Claude Sonnet 4.6 | $3 | $15 | 1M tokens |
| Claude Sonnet 4.5 | $3 | $15 | 1M tokens |
| Claude Haiku 4.5 | $1 | $5 | 200K tokens |
| Claude Haiku 3.5 | $0.80 | $4 | 200K tokens |

The 1M token context window on Opus 4.7, Opus 4.6, and Sonnet 4.6 is billed at standard per-token rates across the full window. A 900,000-token request costs the same per token as a 9,000-token request.


Discount levers that actually move the bill

Two mechanisms do most of the work to offset the tokenizer change.

Prompt caching. A 5-minute cache write costs 1.25x the base input price, and a 1-hour write costs 2x. Cache reads come in at 0.1x the base rate, which works out to $0.50 per million tokens on Opus 4.7. That math means a 5-minute cache pays for itself after a single read, and a 1-hour cache pays off after two reads. System prompts, tool definitions, and stable document context are the obvious candidates.
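The break-even claims check out in units of "one uncached send of the cached prefix," using the multipliers from the paragraph above:

```python
def cache_cost(write_mult, reads, read_mult=0.1):
    # Relative cost of one cache write plus n cached reads, measured
    # in units of a single uncached send of the same prefix.
    return write_mult + reads * read_mult

# 5-minute cache (1.25x write) vs sending the prefix twice uncached:
assert cache_cost(1.25, reads=1) < 2.0   # pays off after a single read
# 1-hour cache (2x write) vs two and three uncached sends:
assert cache_cost(2.0, reads=1) > 2.0    # one read is not enough
assert cache_cost(2.0, reads=2) < 3.0    # pays off after two reads
print("break-even checks hold")
```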

Batch API. Asynchronous requests get a flat 50 percent discount on both input and output. Batch Opus 4.7 works out to $2.50 per million input tokens and $12.50 per million output tokens, and those discounts stack with prompt caching.

Data residency adds a 1.1x multiplier on every category (input, output, cache writes, cache reads) when you specify US-only inference through the inference_geo parameter. Global routing stays at the standard rates.
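A sketch of the effective rate once the batch discount and the residency multiplier stack. Treating the two as multiplicative is an assumption; the rate card does not spell out the interaction:

```python
def effective_rate(base_per_mtok, batch=False, us_only=False):
    # Batch API: flat 50% discount on input and output.
    # inference_geo="us": 1.1x multiplier on every category.
    # Assumes the two combine multiplicatively (not confirmed by the rate card).
    rate = base_per_mtok
    if batch:
        rate *= 0.5
    if us_only:
        rate *= 1.1
    return rate

print(effective_rate(5.0, batch=True))    # 2.5  ($/MTok batch input)
print(effective_rate(25.0, batch=True))   # 12.5 ($/MTok batch output)
print(effective_rate(5.0, batch=True, us_only=True))
```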


Worked cost example

A one-hour coding session on Claude Managed Agents using Opus 4.7, consuming 50,000 input tokens and 15,000 output tokens with no caching:

| Line item | Calculation | Cost |
| --- | --- | --- |
| Input tokens | 50,000 × $5 / 1,000,000 | $0.25 |
| Output tokens | 15,000 × $25 / 1,000,000 | $0.375 |
| Session runtime | 1.0 hour × $0.08 | $0.08 |
| Total | | $0.705 |

With prompt caching on 40,000 of those input tokens as cache reads, the same session drops to $0.525. Session runtime on Managed Agents is metered only while the session is actively running, not while it sits idle waiting for user input.
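The worked example reduces to a small helper; the rates come from the rate card and the $0.08/hour runtime figure from the line items above:

```python
RATES = {"input": 5.0, "output": 25.0, "cache_read": 0.5}  # $/MTok
RUNTIME_RATE = 0.08  # $/hour while a Managed Agents session is active

def session_cost(input_tokens, output_tokens, hours, cached_reads=0):
    # cached_reads: portion of input served as cache hits at $0.50/MTok.
    fresh = input_tokens - cached_reads
    tokens = (fresh * RATES["input"]
              + cached_reads * RATES["cache_read"]
              + output_tokens * RATES["output"]) / 1_000_000
    return tokens + hours * RUNTIME_RATE

print(round(session_cost(50_000, 15_000, 1.0), 3))                       # 0.705
print(round(session_cost(50_000, 15_000, 1.0, cached_reads=40_000), 3))  # 0.525
```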


Tool use and server-side pricing

Tool calls bill on top of standard token costs. A few of the most common surcharges:

| Tool or service | Added cost |
| --- | --- |
| Tool use system prompt (Opus 4.7) | 313–346 input tokens |
| Bash tool | +245 input tokens per call |
| Text editor tool | +700 input tokens per call |
| Computer use tool | +735 input tokens per tool definition, plus 466–499 system prompt tokens |
| Web search | $10 per 1,000 searches, plus tokens for returned content |
| Web fetch | Standard token costs only |
| Code execution (standalone) | 1,550 free hours/month/org, then $0.05 per hour per container |
| Code execution (with web search or web fetch) | No additional container charge |

Web search results returned mid-conversation keep counting as input tokens in later turns, so a multi-turn research session compounds both the per-search fee and ongoing token costs.
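A toy model of that compounding, with entirely hypothetical per-turn token counts, shows why a research session's cost grows faster than linearly in the number of turns:

```python
SEARCH_FEE = 10 / 1_000              # $10 per 1,000 searches
INPUT_RATE, OUTPUT_RATE = 5.0, 25.0  # Opus 4.7 $/MTok

def research_cost(turns, base_context=2_000, result_tokens=3_000,
                  output_tokens=1_000):
    # Hypothetical counts: each turn runs one search whose results stay
    # in the conversation, so the input context grows every later turn.
    total, context = 0.0, base_context
    for _ in range(turns):
        total += SEARCH_FEE
        total += context * INPUT_RATE / 1_000_000
        total += output_tokens * OUTPUT_RATE / 1_000_000
        context += result_tokens + output_tokens  # results + reply persist
    return total

# Five turns cost well over five times one turn, because of carried context.
print(research_cost(1), research_cost(5))
```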


Rate limits and migration

Opus rate limits are pooled across Opus 4.7, 4.6, 4.5, 4.1, and 4. Shifting traffic to 4.7 does not unlock extra quota, so a staged cutover is the sensible path. Anthropic has not published a sunset date for Opus 4.6 at launch, which means mixed traffic across versions is viable during migration.

Before moving a production workload, replay representative traffic through 4.7 and measure the token delta directly. The 35 percent figure is a ceiling, not a fixed rate, and it varies heavily by language, code density, and output length. It is also worth confirming that observability tools report 4.7 as a distinct model rather than rolling it into a generic "Opus" bucket, because the effective cost-per-request shift can otherwise hide inside aggregate dashboards.
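A minimal way to summarize such a replay, assuming you have already collected token counts for the same prompts under both models (for example via the API's token-counting endpoint); the sample counts below are hypothetical:

```python
def mean_token_delta(old_counts, new_counts):
    # Average per-request token inflation between two tokenizers,
    # given counts for the same prompts under each model.
    deltas = [(new - old) / old for old, new in zip(old_counts, new_counts)]
    return sum(deltas) / len(deltas)

# Hypothetical counts from replaying two representative prompts
# through Opus 4.6 and Opus 4.7:
print(f"{mean_token_delta([1000, 2000], [1250, 2400]):.1%}")  # 22.5%
```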

Regional endpoints on AWS Bedrock and multi-region endpoints on Google Vertex AI carry a 10 percent premium over global endpoints for Sonnet 4.5, Haiku 4.5, and all newer models including Opus 4.7. Earlier models keep their prior pricing on those platforms.


When to stay, switch, or split

Opus 4.7 earns its premium on hard, long-horizon coding and agent work where quality differences translate directly into fewer retries and less human cleanup. For high-volume inference like classification, retrieval-augmented answers, moderation, or routine content generation, Sonnet 4.6 usually lands the same outcome at 40 percent less per token. Haiku 4.5 handles extraction and routing at a fifth of the Opus rate.

A common split: plan with Opus 4.7, cache the plan aggressively, then hand off execution to Sonnet 4.6 or Haiku 4.5. That pattern contains the tokenizer-driven cost increase to the short, high-value planning phase and keeps the bulk of token volume on cheaper models.