Two very different bets sit at the top of the AI model market right now. Anthropic's Claude Opus 4.7, released April 16, 2026, is a closed, premium-priced flagship aimed at the hardest reasoning and coding work. DeepSeek V4, released April 24, 2026, is a 1.6-trillion-parameter Mixture-of-Experts model with open weights under the MIT License and pricing roughly one-sixth of Opus on a blended basis.
Quick answer: Claude Opus 4.7 leads on most shared benchmarks (GPQA Diamond, SWE-Bench Pro, Humanity's Last Exam, MCP Atlas). DeepSeek V4-Pro trails by single-digit points on several of those tests, nearly ties on Terminal-Bench 2.0, wins on BrowseComp, and costs about $1.74 input / $3.48 output per million tokens versus Opus 4.7's $5.00 / $25.00. Pick Opus for top-tier coding and reasoning quality; pick V4 for near-frontier performance at a fraction of the cost, or for self-hosting.
Pricing and access
The economic gap is the headline. DeepSeek publishes V4 pricing in two tiers, Pro and Flash, both with a 1M-token context window and 384K maximum output. Opus 4.7 is API-only through Anthropic.
| Model | Input ($/M tokens) | Cached input ($/M tokens) | Output ($/M tokens) | Open weights |
|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | Tier-dependent | $25.00 | No |
| DeepSeek V4-Pro | $1.74 | $0.145 | $3.48 | Yes (MIT) |
| DeepSeek V4-Flash | $0.14 | $0.028 | $0.28 | Yes (MIT) |
On a simple one-million-input plus one-million-output blend, V4-Pro lands at $5.22 versus $30.00 for Opus 4.7. With cached input, V4-Pro drops to roughly $3.63, widening the gap to about one-eighth of Opus pricing. V4-Flash is the budget extreme at $0.42 blended, which sits below nearly every commercial model on the market.
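Those blended figures are just arithmetic on the published per-token rates, and it's easy to reproduce them for your own traffic mix. A minimal Python sketch, with rates hardcoded from the table above and a deliberately simplified cache model (real bills depend on your actual cache hit rate and token mix):

```python
# Blended cost for 1M input + 1M output tokens, using the rates in the table above.
# `cached_fraction` of the input is assumed to bill at the cached-input rate,
# the rest at the full input rate. Opus 4.7's cache pricing is tier-dependent,
# so it is left as None and ignored here.

RATES = {  # $ per million tokens: (input, cached input, output)
    "claude-opus-4.7":   (5.00, None, 25.00),
    "deepseek-v4-pro":   (1.74, 0.145, 3.48),
    "deepseek-v4-flash": (0.14, 0.028, 0.28),
}

def blended_cost(model: str, cached_fraction: float = 0.0) -> float:
    """Dollars for 1M input + 1M output tokens under the simplified cache model."""
    inp, cached, out = RATES[model]
    if cached is None or cached_fraction == 0.0:
        return inp + out
    return (1 - cached_fraction) * inp + cached_fraction * cached + out

print(blended_cost("claude-opus-4.7"))                        # 30.00
print(blended_cost("deepseek-v4-pro"))                        # 5.22
print(blended_cost("deepseek-v4-pro", cached_fraction=1.0))   # ~3.63
print(blended_cost("deepseek-v4-flash"))                      # 0.42
```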
V4 weights are downloadable from Hugging Face, with V4-Pro at 865GB and V4-Flash at 160GB. Self-hosting V4-Pro at usable throughput typically requires 8×H100-class infrastructure or equivalent. Opus 4.7 has no self-host path.
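Pulling the open weights is a standard Hugging Face download. A quick sketch using huggingface_hub; the repository name below is a placeholder, since the exact repo IDs aren't pinned here, and you'll want the disk space lined up first:

```python
from huggingface_hub import snapshot_download

# Repo ID is illustrative only; check DeepSeek's Hugging Face organization for
# the actual V4-Pro / V4-Flash repository names before running this.
local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",   # ~160GB download; V4-Pro is ~865GB
    local_dir="./deepseek-v4-flash",
)
print("weights downloaded to", local_dir)
```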
Benchmark head-to-head
On directly comparable evaluations published by both companies, Opus 4.7 holds the lead on academic reasoning and software engineering, while V4-Pro-Max gets close on agentic tasks and edges ahead on web-browsing benchmarks.
| Benchmark | DeepSeek V4-Pro-Max | Claude Opus 4.7 | Lead |
|---|---|---|---|
| GPQA Diamond | 90.1% | 94.2% | Opus 4.7 |
| Humanity's Last Exam (no tools) | 37.7% | 46.9% | Opus 4.7 |
| Humanity's Last Exam (with tools) | 48.2% | 54.7% | Opus 4.7 |
| SWE-Bench Pro | 55.4% | 64.3% | Opus 4.7 |
| Terminal-Bench 2.0 | 67.9% | 69.4% | Opus 4.7 (narrow) |
| MCP Atlas | 73.6% | 79.1% | Opus 4.7 |
| BrowseComp | 83.4% | 79.3% | DeepSeek V4 |
The pattern is consistent: Opus wins the academic and structured coding evaluations by 4 to 9 points, holds only a narrow lead on terminal-style agent work, and loses on agentic web browsing. V4-Pro's published comparisons against older models such as Opus 4.6 and GPT-5.4 xHigh look stronger, but those don't reflect the current Anthropic flagship.
Architecture and context
Both models offer a 1-million-token context window, but they get there differently. V4-Pro is a Mixture-of-Experts model with roughly 1.6 trillion total parameters and around 49 billion active per forward pass. It uses a hybrid attention design combining Compressed Sparse Attention and Heavily Compressed Attention, which DeepSeek says cuts per-token inference FLOPs to about 27% of V3.2's and shrinks the KV cache to about 10% of V3.2's at the 1M-token scale.
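To make the active-parameter figure concrete, here is a generic top-k MoE routing sketch in PyTorch. It is not DeepSeek's implementation, just the standard mechanism: every token is scored against all experts, but only a handful of expert FFNs actually run, which is how a 1.6T-parameter model can touch only ~49B parameters per forward pass.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate, experts, k=2):
    """Generic top-k Mixture-of-Experts routing (illustrative, not DeepSeek's code).

    x:       [tokens, hidden] activations
    gate:    linear layer scoring each token against every expert
    experts: list of feed-forward modules; only k of them run per token
    """
    scores = F.softmax(gate(x), dim=-1)              # [tokens, num_experts]
    topk_scores, topk_idx = scores.topk(k, dim=-1)   # keep the k best experts per token
    topk_scores = topk_scores / topk_scores.sum(-1, keepdim=True)  # renormalize weights

    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = topk_idx[:, slot] == e            # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
    return out
```

With hundreds of experts and a small k, only a few percent of the expert parameters run per token, which is the mechanism behind the 49B-active / 1.6T-total split.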
Opus 4.7's architecture is undisclosed. Anthropic exposes adaptive and extended thinking controls, image input, and a Claude tokenizer. V4-Pro is text-only on the base model; a separate vision variant exists but lags Opus on multimodal breadth.
| Capability | Claude Opus 4.7 | DeepSeek V4-Pro |
|---|---|---|
| Context window | 1M tokens | 1M tokens |
| Max output | 128K tokens | 384K tokens |
| Image input | Yes | No (separate vision variant) |
| Reasoning modes | Adaptive / extended thinking | Non-think, Think High, Think Max |
| Function calling | Yes | Yes |
| License | Proprietary | MIT |
| API compatibility | Anthropic SDK | OpenAI ChatCompletions and Anthropic formats |
V4's 384K maximum output is a meaningful practical edge over Opus 4.7's 128K cap for tasks that need long single-response generations, such as full-document drafts, large refactors, or extended agent traces.
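Because V4 speaks the OpenAI ChatCompletions format, moving an existing pipeline over is mostly a base-URL and model-ID change. A minimal sketch; the base URL is an assumption to verify against DeepSeek's documentation, and the model ID follows the preview naming noted later in this piece:

```python
from openai import OpenAI

# Base URL is an assumption; confirm against DeepSeek's current API docs.
client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",          # preview model ID per the availability note below
    max_tokens=384_000,               # V4's output cap; Opus 4.7 tops out at 128K
    messages=[
        {"role": "system", "content": "You are a careful code-review assistant."},
        {"role": "user", "content": "Refactor this module and explain every change."},
    ],
)
print(resp.choices[0].message.content)
```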
Coding and agentic work
For pure coding quality on the hardest benchmarks, Opus 4.7 is the better tool. Its 64.3% on SWE-Bench Pro is the better of the two verified results, and Anthropic's models have a track record of stronger multi-file reasoning and tighter instruction adherence on complex constraint sets.
V4-Pro is competitive in the same tier as older Opus releases and handles agentic coding pipelines well, particularly with its larger output budget and MIT-licensed weights. It integrates with common agent harnesses including Claude Code and OpenCode, and DeepSeek runs its own internal coding agents on it.
For long-horizon agent loops with 50+ tool calls, Opus 4.7 still shows less drift in practice. For high-volume code review, repository indexing, or CI-attached automation where cost dominates, V4 changes the math meaningfully.
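In practice that math often shows up as a simple tiered router: correctness-critical work goes to the premium model, bulk automation goes to the cheap tier. The task labels and assignments below are purely illustrative, not a recommendation for any particular codebase:

```python
# Illustrative tiered routing: escalate only the tasks where correctness is worth
# premium pricing; everything else defaults to the cheapest tier.
ROUTES = {
    "multi_file_refactor": "claude-opus-4.7",
    "long_agent_loop":     "claude-opus-4.7",
    "code_review":         "deepseek-v4-pro",
    "repo_indexing":       "deepseek-v4-flash",
    "ci_lint_summary":     "deepseek-v4-flash",
}

def pick_model(task_type: str) -> str:
    return ROUTES.get(task_type, "deepseek-v4-flash")
```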
When to pick which
| Workload | Better fit | Why |
|---|---|---|
| Hardest reasoning, math, science Q&A | Claude Opus 4.7 | Leads GPQA Diamond and HLE by 4–9 points |
| Multi-file refactoring, complex SWE tasks | Claude Opus 4.7 | SWE-Bench Pro lead, stronger constraint adherence |
| Image-heavy or multimodal pipelines | Claude Opus 4.7 | Native image input on the flagship model |
| High-volume inference at scale | DeepSeek V4-Pro or Flash | 1/6 to 1/100 the cost per blended token |
| Self-hosting for privacy or compliance | DeepSeek V4-Pro | MIT-licensed open weights, on-prem deployment |
| Long-output generation (200K+ tokens) | DeepSeek V4-Pro | 384K output cap vs. Opus 128K |
| Agentic web browsing | DeepSeek V4-Pro | Beats Opus 4.7 on BrowseComp (83.4% vs. 79.3%) |
| Domain fine-tuning | DeepSeek V4-Pro | Open weights allow custom training; Opus does not |
Most published V4 benchmark scores come from DeepSeek's own technical report. Independent third-party evaluations of V4 are still accumulating, and on tests less prone to gaming, such as ARC-AGI, V4 sits notably below the latest closed frontier models. Treat single-digit benchmark gaps as directional rather than decisive.
Pricing and availability change. Both vendors publish current rates in their official documentation. V4 is currently in preview, with model IDs deepseek-v4-pro and deepseek-v4-flash; the legacy deepseek-chat and deepseek-reasoner endpoints are scheduled for retirement on July 24, 2026.
For most teams the practical answer isn't picking one. Opus 4.7 remains the model to reach for when correctness on a hard task is worth $25 per million output tokens. V4 makes a wider class of automation economically viable, and the open weights give it a deployment surface Anthropic can't match. The competitive pressure that creates is the real story of this release cycle.