Adaptive thinking is Anthropic's newer approach to extended reasoning, letting Claude decide on its own whether a prompt needs deep thought and how much of it to apply. Instead of handing the model a fixed token budget for internal reasoning, you flip a single switch and let Claude scale effort to the complexity of the request. On the latest Opus-class model, it is the only thinking mode the API will accept.
What adaptive thinking actually does
In adaptive mode, reasoning becomes optional from the model's point of view. Claude evaluates each incoming request, judges its complexity, and chooses whether to produce a thinking block at all, plus how long to spend inside it. At the default high effort setting, the model almost always thinks. At lower effort levels, it will skip thinking on trivial queries like a quick factual lookup.
This replaces the older pattern of setting a fixed budget_tokens value and hoping it fit the task. Anthropic reports that adaptive thinking drives better performance than a fixed budget on many workloads, particularly bimodal tasks where some requests need minimal reasoning and others need a lot, and on long-horizon agentic workflows.
Adaptive mode also automatically turns on interleaved thinking, which means Claude can think between tool calls rather than only at the start of a turn. For agent loops that chain search, code execution, and follow-up reasoning, this matters. No beta header is required.
Supported models and availability
Adaptive thinking is not available everywhere. It is specifically tied to the newer Claude 4-series reasoning models, and behavior differs meaningfully across them.
| Model | Adaptive thinking behavior |
|---|---|
| Claude Mythos Preview (`claude-mythos-preview`) | Default mode; applies automatically when `thinking` is unset. `type: "disabled"` is not supported. |
| Claude Opus 4.7 (`claude-opus-4-7`) | Only supported thinking mode. Manual `type: "enabled"` with `budget_tokens` returns a 400 error. Thinking is off unless you explicitly set `type: "adaptive"`. |
| Claude Opus 4.6 (`claude-opus-4-6`) | Recommended mode. Manual `budget_tokens` still works but is deprecated. |
| Claude Sonnet 4.6 (`claude-sonnet-4-6`) | Recommended mode. Manual `budget_tokens` still works but is deprecated. |
| Older models (Sonnet 4.5, Opus 4.5, etc.) | Not supported. Require manual `type: "enabled"` with `budget_tokens`. |
If you are still shipping on Opus 4.6 or Sonnet 4.6 with a fixed budget, your requests will keep working, but plan a migration. Anthropic has marked that configuration for removal in a future model release.
The effort parameter
Effort is how you steer adaptive thinking without micromanaging token counts. It acts as soft guidance to the model about how much reasoning to allocate, and it pairs directly with max_tokens, which remains a hard cap on total output (thinking plus visible response).
| Effort | Behavior | Availability |
|---|---|---|
| `max` | Always thinks, no constraints on depth. | Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6 |
| `xhigh` | Always thinks deeply with extended exploration. | Opus 4.7 only |
| `high` (default) | Always thinks. Deep reasoning on complex tasks. | All adaptive-capable models |
| `medium` | Moderate thinking. May skip for very simple queries. | All adaptive-capable models |
| `low` | Minimal thinking. Skips for simple, speed-sensitive tasks. | All adaptive-capable models |
If you see responses hitting stop_reason: "max_tokens" at high or max effort, either raise max_tokens or dial the effort down. The two controls work together: max_tokens for hard cost ceilings, effort for shaping the model's internal allocation.
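One way to make that interplay concrete is a small retry policy. The helper below is a hypothetical sketch, not SDK functionality: `EFFORT_LADDER`, `adjust_for_truncation`, and the `HARD_CEILING` value are all invented names for illustration, and only the `stop_reason: "max_tokens"` value comes from the text above.

```python
# Hypothetical retry helper illustrating "raise max_tokens or dial
# effort down" after a truncated response. Not part of the SDK.

EFFORT_LADDER = ["low", "medium", "high", "xhigh", "max"]

def adjust_for_truncation(stop_reason: str, max_tokens: int, effort: str):
    """Return an adjusted (max_tokens, effort) pair after a response.

    Prefer giving the model more room; once we hit our own cost
    ceiling, step the effort down one notch instead.
    """
    HARD_CEILING = 32000  # example budget ceiling; set your own limit
    if stop_reason != "max_tokens":
        return max_tokens, effort  # response finished normally
    if max_tokens * 2 <= HARD_CEILING:
        return max_tokens * 2, effort  # cheapest fix: more headroom
    idx = EFFORT_LADDER.index(effort)
    if idx > 0:
        return max_tokens, EFFORT_LADDER[idx - 1]  # dial effort down
    return max_tokens, effort  # out of options; rethink the task
```

The ladder order matters: dropping effort changes answer quality, so the sketch only reaches for it after the token ceiling is exhausted.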
A minimal request
The API call itself is small. Here is the basic shape using the Python SDK:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(response.content[0].text)
```
Adaptive thinking with effort
Streaming works the same way. Thinking content arrives via thinking_delta events inside content_block_delta, exactly as with manual mode. You iterate the stream and handle thinking and text deltas separately.
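A minimal router for those deltas might look like the sketch below. The event and delta type names follow the streaming format described above; the mock events are invented for illustration and stand in for what a live stream would yield.

```python
# Separate thinking deltas from visible text deltas in a Messages
# stream. The mock events below are illustrative stand-ins for a
# real stream's content_block_delta events.

def split_stream(events):
    """Collect thinking and visible text separately from raw events."""
    thinking, text = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # ignore start/stop and other event types
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            thinking.append(delta["thinking"])
        elif delta["type"] == "text_delta":
            text.append(delta["text"])
    return "".join(thinking), "".join(text)

mock_events = [
    {"type": "content_block_start"},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "Recall geography."}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "Paris."}},
    {"type": "message_stop"},
]
```

In production you would stream the text deltas to the user as they arrive rather than buffering, but the same two-way split applies.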
Summarized versus omitted thinking
Thinking output is controlled by a display field on the thinking configuration. It has two accepted values, and the defaults differ by model.
| `display` value | What you get back | Default on |
|---|---|---|
| `"summarized"` | Thinking block contains a summary of Claude's internal reasoning. | Opus 4.6, Sonnet 4.6, earlier Claude 4 models |
| `"omitted"` | Empty `thinking` field. The encrypted `signature` still carries the full reasoning for multi-turn continuity. | Opus 4.7, Mythos Preview |
The switch to "omitted" on Opus 4.7 is a silent change from Opus 4.6. If you were relying on visible summarized thinking and suddenly see empty blocks, set it back explicitly:
```python
thinking = {
    "type": "adaptive",
    "display": "summarized",
}
```
Restore summarized thinking on Opus 4.7
Omitting the summary is a latency optimization, not a cost one. You are still billed for the full internal thinking tokens either way. What changes is time-to-first-text-token when streaming, because the server skips streaming thinking tokens and delivers only the signature before the final text response.
Billing realities
Extended thinking is charged as output tokens for the full internal reasoning, not for what you see in the response. That creates a consistent gotcha: the billed output token count will not match the token count visible in the response body.
- Input tokens: your original request, not thinking tokens from previous turns.
- Billed output tokens: full internal thinking tokens generated by the model.
- Visible output tokens: either the summary (with `"summarized"`) or zero (with `"omitted"`).
- No charge for generating the summary itself.
When you pass thinking blocks back into a multi-turn conversation, the thinking tokens from the last assistant turn count as input tokens. A specialized system prompt is also automatically included when extended thinking is enabled.
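If you reconcile bills against logged responses, it can help to track the gap explicitly. The helper and the usage numbers below are invented for illustration; real values would come from the `usage` object on each response.

```python
# Rough cost-reconciliation sketch. The usage numbers are invented;
# in practice they come from the API response's usage accounting.

def thinking_overhead(usage: dict, visible_output_tokens: int) -> int:
    """Tokens billed as output beyond what the response body shows.

    Under summarized or omitted display, this gap is the internal
    reasoning you paid for but never received verbatim.
    """
    return usage["output_tokens"] - visible_output_tokens

usage = {"input_tokens": 42, "output_tokens": 5300}  # invented example
overhead = thinking_overhead(usage, visible_output_tokens=300)
```

Logging this per request makes it obvious when a prompt change suddenly triples internal reasoning spend.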
Prompt caching behavior
Consecutive requests using the same adaptive thinking configuration preserve prompt cache breakpoints as expected. The wrinkle is mode switching: flipping between adaptive, enabled, and disabled breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.
If cache hit rates suddenly collapse, check whether something in your pipeline is toggling thinking modes between turns of the same conversation.
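One defensive pattern is to pin the first thinking configuration a conversation uses and refuse silent changes afterward. This is a hypothetical helper, not SDK behavior; `pin_thinking_config` and the `conversation_state` dict are names invented for this sketch.

```python
# Guard against accidental mode toggling mid-conversation, which
# breaks message cache breakpoints. Hypothetical helper, not SDK code.

def pin_thinking_config(conversation_state: dict, requested: dict) -> dict:
    """Return the thinking config to use, pinning the first one seen.

    Later requests in the same conversation reuse the pinned config so
    prompt cache breakpoints on messages stay valid.
    """
    pinned = conversation_state.setdefault("thinking_config", dict(requested))
    if pinned != requested:
        # Surface the conflict instead of silently invalidating the cache.
        print(f"warning: ignoring thinking config {requested!r}; keeping {pinned!r}")
    return pinned
```

A logged warning is usually preferable here: the cache-busting request still works, it just costs more, so you want visibility rather than a hard failure.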
Validation and multi-turn rules
Adaptive mode is more forgiving than manual mode on prior turns. Previous assistant turns do not need to start with thinking blocks, which relaxes the stricter requirement manual-thinking requests enforce. This simplifies round-tripping conversation history from mixed sources.
When you do pass thinking blocks back, send them unchanged. The server decrypts the signature field to reconstruct the original reasoning for prompt construction. If you pass an omitted block back with custom text stuffed into the thinking field, that text is ignored. The signature is identical whether display is summarized or omitted, and switching display values between turns is supported.
Thinking blocks only strictly need to be echoed back when you are combining extended thinking with tool use. Otherwise you can drop them from prior turns, or let the API strip them for you.
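That rule can be sketched as a history-trimming pass: keep thinking blocks on assistant turns that also carry `tool_use` blocks, drop them elsewhere. The `trim_history` helper is an assumption of this article's block shapes (plain dicts with a `type` field), not library code.

```python
# Sketch of the trimming rule above: thinking blocks survive only on
# assistant turns that also contain tool_use blocks. Block shapes are
# assumed to be plain dicts with a "type" field.

def trim_history(messages: list[dict]) -> list[dict]:
    trimmed = []
    for msg in messages:
        content = msg.get("content")
        if msg["role"] != "assistant" or not isinstance(content, list):
            trimmed.append(msg)  # user turns and string content pass through
            continue
        if any(block["type"] == "tool_use" for block in content):
            trimmed.append(msg)  # thinking must survive alongside tool use
        else:
            kept = [b for b in content if b["type"] != "thinking"]
            trimmed.append({**msg, "content": kept})
    return trimmed
```

Dropping the blocks yourself mainly saves request bytes; letting the API strip them is functionally equivalent for non-tool turns.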
Interleaved thinking across modes
Interleaved thinking, where Claude reasons between tool calls, is tied to both mode and model.
| Mode / Model | Interleaved thinking |
|---|---|
| Adaptive mode on Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6 | Automatically enabled. On Mythos Preview and Opus 4.7, inter-tool reasoning always lives inside thinking blocks. |
| Manual mode on Sonnet 4.6 | Requires the `interleaved-thinking-2025-05-14` beta header. |
| Manual mode on Opus 4.6 | Not available. Use adaptive mode if you need thinking between tool calls. |
For any agent that chains multiple tool calls, adaptive mode is the simpler path on current Opus and Sonnet models.
Steering when Claude thinks too much or too little
Adaptive thinking's triggering behavior is promptable. If the model is over-thinking simple prompts or under-thinking nuanced ones, add explicit guidance to your system prompt. A common pattern looks like this:
```
Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.
```
System prompt nudge
Pushing Claude to think less can reduce quality on tasks that genuinely benefit from reasoning. Before rolling a prompt-level nudge into production, measure against your own evals, and try a lower effort level first.
When to choose adaptive, manual, or disabled
| Mode | Config | Best for |
|---|---|---|
| Adaptive | `thinking: {type: "adaptive"}` | Default choice on Opus 4.7, Opus 4.6, Sonnet 4.6, and Mythos Preview. Let Claude decide; steer with effort. |
| Manual | `thinking: {type: "enabled", budget_tokens: N}` | When you need predictable latency or precise control over thinking token spend. Not accepted on Opus 4.7. Deprecated on Opus 4.6 and Sonnet 4.6. |
| Disabled | Omit `thinking` or pass `{type: "disabled"}` | Lowest latency when you do not need extended reasoning. Not supported on Mythos Preview. |
For new projects on Opus 4.6, Sonnet 4.6, or anything newer, start with adaptive at default effort and tune from there. Reach for manual only if your workload has strict latency SLAs or needs exact per-request cost bounds that adaptive's soft guidance cannot guarantee.
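The decision table above reduces to a few lines of dispatch code. This is a sketch using the model IDs named in this article; the `budget_tokens` value for older models is an illustrative placeholder, not a recommendation.

```python
# Pick a default thinking config from the model ID, following the
# table above. Model prefixes are the ones used in this article; the
# manual budget value is an illustrative placeholder.

def default_thinking_config(model: str) -> dict:
    adaptive_models = (
        "claude-mythos-preview",
        "claude-opus-4-7",
        "claude-opus-4-6",
        "claude-sonnet-4-6",
    )
    if model.startswith(adaptive_models):
        return {"type": "adaptive"}
    # Older models (Sonnet 4.5, Opus 4.5, ...) still need a manual budget.
    return {"type": "enabled", "budget_tokens": 8192}
```

Centralizing this in one function also gives you a single place to pin `effort` and `display` defaults as models churn.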
Practical notes on production use
A few things tend to trip up teams moving from fixed budgets to adaptive:
- At
highandmaxeffort, the model can run long and exhaustmax_tokensbefore finishing a visible response. Leave headroom. - Billing dashboards will show higher output token counts than response bodies suggest. This is expected under summarized or omitted display.
- If you simply want faster first-token streaming and do not surface reasoning to users, pass
display: "omitted". Cost does not change. - The
signaturefield is opaque and cross-platform. Values generated on the Claude API work with Amazon Bedrock and Vertex AI. - On Bedrock,
maxeffort is restricted to Opus 4.6. Requests usingmaxon other Bedrock-hosted models return an error.
If you need a full specification or want to compare against the older manual-budget path, the Claude API documentation covers every field, including streaming event shapes and tool-use interactions.
Adaptive thinking is the direction Anthropic is clearly pushing for its frontier models, and on Opus 4.7 the choice has already been made for you. For most workloads, handing that decision to the model, then shaping it with effort and display, produces cleaner code and better results than hand-tuning a token budget per request.