
Claude Opus 4.7 adaptive thinking, explained

Shivam Malani

Claude Opus 4.7 ships with a single thinking mode called adaptive thinking, and it changes how reasoning tokens get spent on every request. The model decides how much to think based on the difficulty of the prompt, instead of running against a fixed budget you set up front. It is also off by default on the Messages API, which trips up a lot of people migrating from Opus 4.6.

💡 Quick answer: Adaptive thinking is the only thinking-on mode in Opus 4.7. It is disabled by default. To enable it, set thinking: {"type": "adaptive"} on your request. The old budget_tokens parameter now returns a 400 error.

What adaptive thinking does

Adaptive thinking lets the model allocate reasoning tokens dynamically per request. Hard problems get more internal reasoning, simple prompts get less, and the decision happens at inference time rather than being capped by a number you pass in. Anthropic's internal evaluations showed this approach consistently outperforms the fixed-budget extended thinking used in Opus 4.6.

The tradeoff is control. You no longer tell the model to think for exactly 32,000 tokens. You tell it to think adaptively, and the model picks. If you need tighter control over spend across an entire agentic loop, that now lives in a separate feature called task budgets.


Turning adaptive thinking on

On Opus 4.7, any request without a thinking field runs with thinking off. That is a behavior change from earlier Opus versions, and it catches teams who assumed thinking would carry over. You have to opt in explicitly.


response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=32000,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Refactor this module for testability."}
    ],
)

Enabling adaptive thinking on Opus 4.7

If you also want the reasoning visible in responses, add a display flag. Thinking content is omitted from responses by default, which slightly reduces latency but hides the model's intermediate steps.


thinking = {
    "type": "adaptive",
    "display": "summarized",  # "omitted" is the default
}

Opting back into visible reasoning

If your product streams reasoning to users, the omitted default will look like a long pause before the output starts. Setting display to summarized restores the streaming progress behavior users expect.
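If you do stream, the reasoning summary and the final answer arrive as different delta types and usually belong in different parts of the UI. A minimal sketch of that routing — the "thinking_delta" and "text_delta" event names follow Anthropic's existing streaming API, and whether Opus 4.7 keeps them unchanged is an assumption here:

```python
# Sketch: route streamed content_block_delta payloads to UI channels when
# display is "summarized". Event type names are assumed from the existing API.
def classify_delta(delta: dict) -> str:
    """Return which UI channel a content_block_delta payload belongs to."""
    if delta.get("type") == "thinking_delta":
        return "progress"  # reasoning summary: render as a progress indicator
    return "answer"        # ordinary output text: render as the reply itself
```

Rendering thinking deltas as a lightweight progress indicator, rather than inline with the answer, keeps the pause-then-output problem from reappearing in a different form.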


What was removed from the Messages API

Opus 4.7 drops a handful of parameters that worked on Opus 4.6. These are hard breaking changes on the Messages API, not soft deprecations. Claude Managed Agents users are not affected.

| Parameter | Opus 4.6 behavior | Opus 4.7 behavior |
| --- | --- | --- |
| Extended thinking budget | thinking: {"type": "enabled", "budget_tokens": N} | Returns 400 error; use adaptive instead |
| Sampling: temperature, top_p, top_k | Accepted at any value | Non-default values return 400 error |
| Thinking content in response | Included by default | Omitted by default; opt in with display: "summarized" |
| Tokenizer | Previous tokenizer | New tokenizer, up to 1.35x token count on the same text |

If you were using temperature = 0 for determinism, it never guaranteed identical outputs anyway. The cleanest migration path is to strip sampling parameters entirely and steer behavior through prompting.
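One way to make sure no code path still sends sampling parameters is to filter request kwargs in a single place before they reach the API. A minimal sketch — strip_sampling is a hypothetical helper, not part of the Anthropic SDK:

```python
# Hypothetical migration helper: drop the sampling parameters that
# Opus 4.7 rejects with a 400 when set to non-default values.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")

def strip_sampling(request_kwargs: dict) -> dict:
    """Return a copy of the request kwargs without sampling parameters."""
    return {k: v for k, v in request_kwargs.items() if k not in SAMPLING_KEYS}
```

Routing every call through something like client.messages.create(**strip_sampling(kwargs)) catches stray temperature settings that a grep might miss.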


Effort levels and the new xhigh tier

Adaptive thinking works alongside the effort parameter, which controls the intelligence-versus-cost tradeoff more coarsely. Opus 4.7 adds an xhigh level that sits between high and max, giving you a middle option for hard coding and agentic work where you want more reasoning than high but not the full latency of max.

Anthropic recommends starting at high or xhigh for coding and agentic use cases, and keeping at least high for any intelligence-sensitive work. In Claude Code, the default effort level is now xhigh for all plans. Effort is a Messages API feature; Claude Managed Agents handles it automatically.
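Pairing adaptive thinking with an explicit effort tier might look like the sketch below. The request_with_effort helper is hypothetical, the tier names are limited to the ones this post mentions, and the output_config placement mirrors the task-budget example shown later in the post:

```python
# Hypothetical helper: build Messages API kwargs that pair adaptive
# thinking with an explicit effort tier. Tier names and the output_config
# placement are assumptions based on this post, not a documented schema.
EFFORT_TIERS = ("high", "xhigh", "max")

def request_with_effort(prompt: str, effort: str = "xhigh") -> dict:
    """Assemble request kwargs with adaptive thinking and an effort tier."""
    if effort not in EFFORT_TIERS:
        raise ValueError(f"unknown effort tier: {effort}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 64000,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Validating the tier name client-side turns a typo into an immediate ValueError instead of a round trip that ends in a 400.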


Task budgets versus max_tokens

Because adaptive thinking removed the fine-grained thinking budget, Anthropic shipped task budgets as the new way to cap spend on long-running agents. Task budgets are in public beta and require the header task-budgets-2026-03-13.

A task budget is advisory. It tells the model to target a token total across an entire agentic loop, including thinking, tool calls, tool results, and the final output. The model sees a running countdown and uses it to pace itself, cut low-value steps, and finish cleanly. Minimum is 20,000 tokens. It is not a hard cap — the model can overshoot.


response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=128000,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[
        {"role": "user", "content": "Review the codebase and propose a refactor plan."}
    ],
    betas=["task-budgets-2026-03-13"],
)

Task budget with adaptive thinking

The difference from max_tokens matters. max_tokens is a hard ceiling per request, invisible to the model. task_budget is a suggestion across the whole loop that the model is aware of. Use task_budget for self-moderation, and max_tokens as the per-request fence. For open-ended agentic work where you want the best answer, skip the task budget.


Behavior shifts that come with adaptive thinking

Adaptive thinking is not just an on/off switch. It pairs with a set of behavior changes in Opus 4.7 that affect how prompts land. If you copy old prompts over without adjustments, you will see differences.

  • Literal instruction following. The model no longer silently generalizes instructions from one item to another, especially at lower effort levels.
  • Response length scales with task complexity rather than defaulting to a fixed verbosity.
  • Fewer tool calls by default. The model prefers reasoning over action. Raising effort increases tool usage.
  • More direct tone with less validation-forward phrasing and fewer emoji than Opus 4.6.
  • More regular progress updates during long agentic traces. Remove scaffolding that forced status messages.
  • Fewer subagents spawned by default. Steerable through prompting if you need more.

Prompts that previously contained mitigations like "double-check the slide layout before returning" or "think step by step before answering" can often be simplified. Opus 4.7 handles those patterns natively, and leaving the scaffolding in can cause overcorrection.


Tokenizer change and cost implications

Opus 4.7 uses a new tokenizer. The same text may map to roughly 1.0 to 1.35 times as many tokens compared to Opus 4.6, depending on content type. Per-token pricing is unchanged at $5 per million input tokens and $25 per million output tokens, but effective cost per request can rise.

Two practical effects matter. First, /v1/messages/count_tokens returns different numbers than it did on Opus 4.6, so any compaction triggers tuned to specific thresholds need adjustment. Second, you should raise max_tokens to give the model headroom, especially for responses that were close to the limit before.
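A rough way to resize those limits is to scale by the worst-case inflation factor. This sketch assumes your thresholds were tuned in Opus 4.6 token counts; for accurate numbers, re-measure with count_tokens rather than relying on the multiplier:

```python
# Rough heuristic: scale a token threshold tuned on the Opus 4.6 tokenizer
# by the worst-case 1.35x inflation. Re-measure real prompts with
# /v1/messages/count_tokens for exact values.
def rescale_for_new_tokenizer(old_value: int, inflation: float = 1.35) -> int:
    """Return a threshold with headroom for the new tokenizer's token count."""
    return int(old_value * inflation)
```

For example, a compaction trigger set at 100,000 Opus 4.6 tokens would move to 135,000 under the worst case; content that tokenizes closer to 1.0x just gains harmless headroom.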

The 1M token context window remains available at standard pricing with no long-context premium.


Verifying your migration worked

Step 1: Switch your model ID from claude-opus-4-6 to claude-opus-4-7 and send a test request without a thinking field. Confirm the response succeeds and note that no reasoning is produced.

Step 2: Add thinking: {"type": "adaptive"} to the request. The response should now include thinking blocks in the stream, though their content stays empty unless you opt in with the display field.

Step 3: Remove any temperature, top_p, or top_k parameters from your client code. Send a request with one set to a non-default value and confirm you get a 400 error, which verifies you've caught all code paths.
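If you'd rather sweep programmatically than rely on trial requests, a small scan of outgoing request kwargs can flag every parameter in the table above before it hits the API. The find_removed_params helper is hypothetical:

```python
# Hypothetical audit helper: flag Opus 4.6-era parameters that
# Opus 4.7's Messages API rejects with a 400 error.
def find_removed_params(request_kwargs: dict) -> list:
    """Return the names of removed parameters present in the kwargs."""
    flagged = []
    thinking = request_kwargs.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        flagged.append("thinking.budget_tokens")
    for key in ("temperature", "top_p", "top_k"):
        if key in request_kwargs:
            flagged.append(key)
    return flagged
```

Wiring this into a pre-send hook or a unit test over your request builders gives you the same coverage as Step 3 without burning a live request per code path.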

Step 4: Run /v1/messages/count_tokens against representative prompts from your workload and compare against the Opus 4.6 counts. Adjust max_tokens and any compaction thresholds accordingly.

You'll know adaptive thinking is working correctly when simple prompts return quickly with minimal reasoning overhead, and complex prompts show visibly longer thinking phases — without you changing any parameters between the two requests.


Adaptive thinking is a meaningful shift in how you reason about cost and latency on Opus 4.7. The automatic allocation removes a tuning lever that some developers relied on, but replaces it with behavior that, in practice, tracks task difficulty more closely than fixed budgets ever did. If you need hard cost control, reach for task budgets and max_tokens together. If you need peak quality, leave the task budget off and let the model decide.