Adaptive thinking is Anthropic's newer approach to extended reasoning, letting Claude decide on its own whether a prompt needs deep thought and how much of it to apply. Instead of handing the model a fixed token budget for internal reasoning, you flip a single switch and let Claude scale effort to the complexity of the request. On the latest Opus-class model, it is the only thinking mode the API will accept.
What adaptive thinking actually does
In adaptive mode, reasoning becomes optional from the model's point of view. Claude evaluates each incoming request, judges its complexity, and chooses whether to produce a thinking block at all, plus how long to spend inside it. At the default high effort setting, the model almost always thinks. At lower effort levels, it will skip thinking on trivial queries like a quick factual lookup.
This replaces the older pattern of setting a fixed budget_tokens value and hoping it fit the task. Anthropic reports that adaptive thinking drives better performance than a fixed budget on many workloads, particularly bimodal tasks where some requests need minimal reasoning and others need a lot, and on long-horizon agentic workflows.
Adaptive mode also automatically turns on interleaved thinking, which means Claude can think between tool calls rather than only at the start of a turn. For agent loops that chain search, code execution, and follow-up reasoning, this matters. No beta header is required.
Supported models and availability
Adaptive thinking is not available everywhere. It is specifically tied to the newer Claude 4-series reasoning models, and behavior differs meaningfully across them.
| Model | Adaptive thinking behavior |
|---|---|
| Claude Mythos Preview (`claude-mythos-preview`) | Default mode; applies automatically when `thinking` is unset. `type: "disabled"` is not supported. |
| Claude Opus 4.7 (`claude-opus-4-7`) | Only supported thinking mode. Manual `type: "enabled"` with `budget_tokens` returns a 400 error. Thinking is off unless you explicitly set `type: "adaptive"`. |
| Claude Opus 4.6 (`claude-opus-4-6`) | Recommended mode. Manual `budget_tokens` still works but is deprecated. |
| Claude Sonnet 4.6 (`claude-sonnet-4-6`) | Recommended mode. Manual `budget_tokens` still works but is deprecated. |
| Older models (Sonnet 4.5, Opus 4.5, etc.) | Not supported. Require manual `type: "enabled"` with `budget_tokens`. |
If you are still shipping on Opus 4.6 or Sonnet 4.6 with a fixed budget, your requests will keep working, but plan a migration. Anthropic has marked that configuration for removal in a future model release.
The effort parameter
Effort is how you steer adaptive thinking without micromanaging token counts. It acts as soft guidance to the model about how much reasoning to allocate, and it pairs directly with max_tokens, which remains a hard cap on total output (thinking plus visible response).
| Effort | Behavior | Availability |
|---|---|---|
| `max` | Always thinks, no constraints on depth. | Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6 |
| `xhigh` | Always thinks deeply with extended exploration. | Opus 4.7 only |
| `high` (default) | Always thinks. Deep reasoning on complex tasks. | All adaptive-capable models |
| `medium` | Moderate thinking. May skip for very simple queries. | All adaptive-capable models |
| `low` | Minimal thinking. Skips for simple, speed-sensitive tasks. | All adaptive-capable models |
If you see responses hitting stop_reason: "max_tokens" at high or max effort, either raise max_tokens or dial the effort down. The two controls work together: max_tokens for hard cost ceilings, effort for shaping the model's internal allocation.
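One way to make that interplay concrete is a small retry policy. The helper below is a hypothetical sketch, not SDK functionality: `EFFORT_LADDER`, `adjust_for_truncation`, and the `HARD_CEILING` value are all invented names for illustration, and only the `stop_reason: "max_tokens"` value comes from the text above.

```python
# Hypothetical retry helper illustrating "raise max_tokens or dial
# effort down" after a truncated response. Not part of the SDK.

EFFORT_LADDER = ["low", "medium", "high", "xhigh", "max"]

def adjust_for_truncation(stop_reason: str, max_tokens: int, effort: str):
    """Return an adjusted (max_tokens, effort) pair after a response.

    Prefer giving the model more room; once we hit our own cost
    ceiling, step the effort down one notch instead.
    """
    HARD_CEILING = 32000  # example budget ceiling; set your own limit
    if stop_reason != "max_tokens":
        return max_tokens, effort  # response finished normally
    if max_tokens * 2 <= HARD_CEILING:
        return max_tokens * 2, effort  # cheapest fix: more headroom
    idx = EFFORT_LADDER.index(effort)
    if idx > 0:
        return max_tokens, EFFORT_LADDER[idx - 1]  # dial effort down
    return max_tokens, effort  # out of options; rethink the task
```

The ladder order matters: dropping effort changes answer quality, so the sketch only reaches for it after the token ceiling is exhausted.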
A minimal request
The API call itself is small. Here is the basic shape using the Python SDK:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(response.content[0].text)
```
Adaptive thinking with effort
Streaming works the same way. Thinking content arrives via thinking_delta events inside content_block_delta, exactly as with manual mode. You iterate the stream and handle thinking and text deltas separately.
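A minimal router for those deltas might look like the sketch below. The event and delta type names follow the streaming format described above; the mock events are invented for illustration and stand in for what a live stream would yield.

```python
# Separate thinking deltas from visible text deltas in a Messages
# stream. The mock events below are illustrative stand-ins for a
# real stream's content_block_delta events.

def split_stream(events):
    """Collect thinking and visible text separately from raw events."""
    thinking, text = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # ignore start/stop and other event types
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            thinking.append(delta["thinking"])
        elif delta["type"] == "text_delta":
            text.append(delta["text"])
    return "".join(thinking), "".join(text)

mock_events = [
    {"type": "content_block_start"},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "Recall geography."}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "Paris."}},
    {"type": "message_stop"},
]
```

In production you would stream the text deltas to the user as they arrive rather than buffering, but the same two-way split applies.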
Summarized versus omitted thinking
Thinking output is controlled by a display field on the thinking configuration. It has two accepted values, and the defaults differ by model.
| `display` value | What you get back | Default on |
|---|---|---|
| `"summarized"` | Thinking block contains a summary of Claude's internal reasoning. | Opus 4.6, Sonnet 4.6, earlier Claude 4 models |
| `"omitted"` | Empty `thinking` field. The encrypted `signature` still carries the full reasoning for multi-turn continuity. | Opus 4.7, Mythos Preview |
The switch to "omitted" on Opus 4.7 is a silent change from Opus 4.6. If you were relying on visible summarized thinking and suddenly see empty blocks, set it back explicitly:
```python
thinking = {
    "type": "adaptive",
    "display": "summarized",
}
```
Restore summarized thinking on Opus 4.7
Omitting the summary is a latency optimization, not a cost one. You are still billed for the full internal thinking tokens either way. What changes is time-to-first-text-token when streaming, because the server skips streaming thinking tokens and delivers only the signature before the final text response.
Billing realities
Extended thinking is charged as output tokens for the full internal reasoning, not for what you see in the response. That creates a consistent gotcha: the billed output token count will not match the token count visible in the response body.
- Input tokens: your original request, not thinking tokens from previous turns.
- Billed output tokens: full internal thinking tokens generated by the model.
- Visible output tokens: either the summary (with `"summarized"`) or zero (with `"omitted"`).
- No charge for generating the summary itself.
When you pass thinking blocks back into a multi-turn conversation, the thinking tokens from the last assistant turn count as input tokens. A specialized system prompt is also automatically included when extended thinking is enabled.
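If you reconcile bills against logged responses, it can help to track the gap explicitly. The helper and the usage numbers below are invented for illustration; real values would come from the `usage` object on each response.

```python
# Rough cost-reconciliation sketch. The usage numbers are invented;
# in practice they come from the API response's usage accounting.

def thinking_overhead(usage: dict, visible_output_tokens: int) -> int:
    """Tokens billed as output beyond what the response body shows.

    Under summarized or omitted display, this gap is the internal
    reasoning you paid for but never received verbatim.
    """
    return usage["output_tokens"] - visible_output_tokens

usage = {"input_tokens": 42, "output_tokens": 5300}  # invented example
overhead = thinking_overhead(usage, visible_output_tokens=300)
```

Logging this per request makes it obvious when a prompt change suddenly triples internal reasoning spend.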
Prompt caching behavior
Consecutive requests using the same adaptive thinking configuration preserve prompt cache breakpoints as expected. The wrinkle is mode switching: flipping between adaptive, enabled, and disabled breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.
If cache hit rates suddenly collapse, check whether something in your pipeline is toggling thinking modes between turns of the same conversation.
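One defensive pattern is to pin the first thinking configuration a conversation uses and refuse silent changes afterward. This is a hypothetical helper, not SDK behavior; `pin_thinking_config` and the `conversation_state` dict are names invented for this sketch.

```python
# Guard against accidental mode toggling mid-conversation, which
# breaks message cache breakpoints. Hypothetical helper, not SDK code.

def pin_thinking_config(conversation_state: dict, requested: dict) -> dict:
    """Return the thinking config to use, pinning the first one seen.

    Later requests in the same conversation reuse the pinned config so
    prompt cache breakpoints on messages stay valid.
    """
    pinned = conversation_state.setdefault("thinking_config", dict(requested))
    if pinned != requested:
        # Surface the conflict instead of silently invalidating the cache.
        print(f"warning: ignoring thinking config {requested!r}; keeping {pinned!r}")
    return pinned
```

A logged warning is usually preferable here: the cache-busting request still works, it just costs more, so you want visibility rather than a hard failure.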
Validation and multi-turn rules
Adaptive mode is more forgiving than manual mode on prior turns. Previous assistant turns do not need to start with thinking blocks, which relaxes the stricter requirement manual-thinking requests enforce. This simplifies round-tripping conversation history from mixed sources.
When you do pass thinking blocks back, send them unchanged. The server decrypts the signature field to reconstruct the original reasoning for prompt construction. If you pass an omitted block back with custom text stuffed into the thinking field, that text is ignored. The signature is identical whether display is summarized or omitted, and switching display values between turns is supported.
Thinking blocks only strictly need to be echoed back when you are combining extended thinking with tool use. Otherwise you can drop them from prior turns, or let the API strip them for you.
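That rule can be sketched as a history-trimming pass: keep thinking blocks on assistant turns that also carry `tool_use` blocks, drop them elsewhere. The `trim_history` helper is an assumption of this article's block shapes (plain dicts with a `type` field), not library code.

```python
# Sketch of the trimming rule above: thinking blocks survive only on
# assistant turns that also contain tool_use blocks. Block shapes are
# assumed to be plain dicts with a "type" field.

def trim_history(messages: list[dict]) -> list[dict]:
    trimmed = []
    for msg in messages:
        content = msg.get("content")
        if msg["role"] != "assistant" or not isinstance(content, list):
            trimmed.append(msg)  # user turns and string content pass through
            continue
        if any(block["type"] == "tool_use" for block in content):
            trimmed.append(msg)  # thinking must survive alongside tool use
        else:
            kept = [b for b in content if b["type"] != "thinking"]
            trimmed.append({**msg, "content": kept})
    return trimmed
```

Dropping the blocks yourself mainly saves request bytes; letting the API strip them is functionally equivalent for non-tool turns.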
Interleaved thinking across modes
Interleaved thinking, where Claude reasons between tool calls, is tied to both mode and model.
| Mode / Model | Interleaved thinking |
|---|---|
| Adaptive mode on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6 | Automatically enabled. On Mythos Preview and Opus 4.7, inter-tool reasoning always lives inside thinking blocks. |
| Manual mode on Sonnet 4.6 | Requires the `interleaved-thinking-2025-05-14` beta header. |
| Manual mode on Opus 4.6 | Not available. Use adaptive mode if you need thinking between tool calls. |
For any agent that chains multiple tool calls, adaptive mode is the simpler path on current Opus and Sonnet models.
Steering when Claude thinks too much or too little
Adaptive thinking's triggering behavior is promptable. If the model is over-thinking simple prompts or under-thinking nuanced ones, add explicit guidance to your system prompt. A common pattern looks like this:
```
Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.
```
System prompt nudge
Pushing Claude to think less can reduce quality on tasks that genuinely benefit from reasoning. Before rolling a prompt-level nudge into production, measure against your own evals, and try a lower effort level first.
When to choose adaptive, manual, or disabled
| Mode | Config | Best for |
|---|---|---|
| Adaptive | `thinking: {type: "adaptive"}` | Default choice on Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview. Let Claude decide; steer with effort. |
| Manual | `thinking: {type: "enabled", budget_tokens: N}` | When you need predictable latency or precise control over thinking token spend. Not accepted on Opus 4.7. Deprecated on Opus 4.6 and Sonnet 4.6. |
| Disabled | Omit `thinking` or pass `{type: "disabled"}` | Lowest latency when you do not need extended reasoning. Not supported on Mythos Preview. |
For new projects on Opus 4.6, Sonnet 4.6, or anything newer, start with adaptive at default effort and tune from there. Reach for manual only if your workload has strict latency SLAs or needs exact per-request cost bounds that adaptive's soft guidance cannot guarantee.
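The decision table above reduces to a few lines of dispatch code. This is a sketch using the model IDs named in this article; the `budget_tokens` value for older models is an illustrative placeholder, not a recommendation.

```python
# Pick a default thinking config from the model ID, following the
# table above. Model prefixes are the ones used in this article; the
# manual budget value is an illustrative placeholder.

def default_thinking_config(model: str) -> dict:
    adaptive_models = (
        "claude-mythos-preview",
        "claude-opus-4-7",
        "claude-opus-4-6",
        "claude-sonnet-4-6",
    )
    if model.startswith(adaptive_models):
        return {"type": "adaptive"}
    # Older models (Sonnet 4.5, Opus 4.5, ...) still need a manual budget.
    return {"type": "enabled", "budget_tokens": 8192}
```

Centralizing this in one function also gives you a single place to pin `effort` and `display` defaults as models churn.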
Practical notes on production use
A few things tend to trip up teams moving from fixed budgets to adaptive:
- At
highandmaxeffort, the model can run long and exhaustmax_tokensbefore finishing a visible response. Leave headroom. - Billing dashboards will show higher output token counts than response bodies suggest. This is expected under summarized or omitted display.
- If you simply want faster first-token streaming and do not surface reasoning to users, pass
display: "omitted". Cost does not change. - The
signaturefield is opaque and cross-platform. Values generated on the Claude API work with Amazon Bedrock and Vertex AI. - On Bedrock,
maxeffort is restricted to Opus 4.6. Requests usingmaxon other Bedrock-hosted models return an error.
If you need a full specification or want to compare against the older manual-budget path, the Claude API documentation covers every field, including streaming event shapes and tool-use interactions.
Adaptive thinking is the direction Anthropic is clearly pushing for its frontier models, and on Opus 4.7 the choice has already been made for you. For most workloads, handing that decision to the model, then shaping it with effort and display, produces cleaner code and better results than hand-tuning a token budget per request.