“New math” is a loaded phrase. In this case, it refers to a claim that GPT‑5 Pro generated a correct, previously unpublished proof that improved a known bound in a convex optimization result, moving a constant from 1/L to 1.5/L under the same assumptions. The claim comes from an OpenAI researcher’s social post describing how the model “thought” for about 17 minutes and produced the result. It has sparked excitement and skepticism in equal measure.
GPT-5 just casually did new mathematics.

Sebastien Bubeck gave it an open problem from convex optimization, something humans had only partially solved. GPT-5-Pro sat down, reasoned for 17 minutes, and produced a correct proof improving the known bound from 1/L all the way to…

— VraserX e/acc (@VraserX) August 20, 2025
What was allegedly proved?
The task involved a result in convex optimization, a field that studies how to efficiently minimize “well‑behaved” functions. Many guarantees in this area depend on smoothness constants, often written as L. A common theme is proving how large a “safe” step size can be (think of how far you can move with gradient descent and still make reliable progress).
According to the claim, GPT‑5 Pro tightened an existing guarantee by improving the constant factor from 1/L to 1.5/L without changing the underlying assumptions. In plain terms: the model proposed a mathematically valid way to let an algorithm take bigger steps while still staying within the rules. That would qualify as a nontrivial refinement: not a grand unification, but not trivial arithmetic either.
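To make the constants concrete, here is a minimal numerical sketch, not the problem from the claim itself: on the L-smooth quadratic f(x) = (L/2)·x², classical theory guarantees per-step descent for step sizes up to 1/L and convergence for anything below 2/L, so moving from 1/L to 1.5/L means visibly larger steps that still converge. The specific values below (L = 4, twenty iterations) are illustrative choices, not part of the original claim.

```python
# Hedged illustration, not the claimed proof: gradient descent on the
# L-smooth quadratic f(x) = (L/2) * x^2, whose gradient is L * x.
# Each step multiplies x by (1 - step * L), so |1 - step * L| < 1 converges.
L = 4.0                                 # smoothness constant of f
grad = lambda x: L * x                  # gradient of f(x) = (L/2) x^2

def run_gd(step: float, x0: float = 1.0, iters: int = 20) -> float:
    x = x0
    for _ in range(iters):
        x -= step * grad(x)             # standard gradient-descent update
    return x

for label, step in [("1/L", 1 / L), ("1.5/L", 1.5 / L), ("2.1/L", 2.1 / L)]:
    print(f"step {label:>6}: x after 20 iters = {run_gd(step):+.3e}")
# Expected: 1/L and 1.5/L drive x toward 0; 2.1/L diverges (|1 - 2.1| > 1).
```

None of this is the result GPT‑5 Pro allegedly proved; it only shows why constant factors in front of 1/L are the natural currency of such theorems.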
Was it really “new” — and is it the best result?
This is where nuance matters. The improvement to 1.5/L was described as not copied from anywhere and checked by a domain expert. However, subsequent human work reached an even tighter bound of 1.75/L in a later version of the same research line. The timing and framing are part of the debate: some observers note that the stronger human result appeared publicly earlier than the social post implied, while supporters counter that GPT‑5’s proof strategy differed and therefore still counts as novel.
Two key points can both be true: the model’s proof may be correct and original in its technique, and it may not be the strongest known bound. In mathematics, novelty is about whether a specific argument or result was previously written down, not whether it’s the best possible bound forever.
How could an LLM do this?
OpenAI positions GPT‑5 as a major upgrade in reasoning, including math. In its launch materials, the company says GPT‑5 sets new marks on math benchmarks and “knows when to respond quickly and when to think longer,” thanks to a unified system that routes harder problems to a deeper reasoning mode. Those claims are laid out in OpenAI’s GPT‑5 announcement and on the product page, which describe improvements in correctness, reduced hallucinations, and higher scores on contests like AIME and GPQA.
Even with those improvements, a large language model isn’t “doing math” like a human. It’s producing symbolic arguments by recombining patterns it has learned, sometimes with extended “thinking” passes to structure multi‑step reasoning. That can be enough to assemble a valid proof in a specific niche — especially when the model is guided by a recent paper and a clear, constrained ask such as “improve the constant under the same assumptions.”
What makes the claim controversial?
Three things:
- Provenance. It’s straightforward to verify whether a proof is correct; it’s much harder to verify that a model produced it independently, without heavy human steering or retrieval from somewhere obscure.
- Timeline and framing. Critics argue the social post overstated the “frontier‑advancing” nature of the result because humans already had a stronger bound. Supporters say the model’s proof was still new and materially different.
- Reproducibility. Some people report that asking current models to replicate the feat yields wrong or inconsistent answers. One‑offs happen; reliable capability is what ultimately matters.
How to evaluate claims like this
If you’re trying to make sense of similar “AI did new math” announcements, apply the same filters working mathematicians do:
- Check the proof. If it’s public, specialists can verify correctness quickly. Subtle errors are common in automatically generated arguments, so outside review matters.
- Compare to prior art. Was the same bound (or a better one) already proved? If so, is the new proof technique meaningfully different?
- Test reproducibility. Can the result be recovered with the same prompt and model? Does it generalize to similar problems, or is it a one‑off? (A minimal harness for this is sketched after this list.)
- Demand transparency. Useful evidence includes the exact prompt, whether the model had tool access or web search enabled, and the model/version used.
- Separate “novel” from “useful.” A proof can be technically new yet incremental; it can also be weaker than existing results and still be interesting if the method is fresh.
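As a concrete starting point for the reproducibility check above, here is a minimal sketch using the official `openai` Python client. The model name `gpt-5-pro` and the prompt text are placeholders (assumptions, not documented values); the point is the shape of the test: fix the prompt, run it several times, and inspect whether the claimed bound reappears.

```python
# Hedged sketch of a reproducibility probe, assuming the openai Python
# client; "gpt-5-pro" and the prompt text are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Under the same assumptions as [paper], can the step-size constant "
    "in the bound be improved beyond 1/L? Give a complete proof."
)

def probe(n_trials: int = 5, model: str = "gpt-5-pro") -> list[str]:
    """Run the identical prompt n times and collect the answers."""
    answers = []
    for _ in range(n_trials):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        answers.append(resp.choices[0].message.content)
    return answers

for i, ans in enumerate(probe()):
    # Crude signal only; real verification needs a domain expert or a
    # proof assistant, not string matching.
    print(f"trial {i}: mentions 1.5/L -> {'1.5/L' in ans}")
```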
Have AIs contributed to math and algorithms before?
Yes — though often not as pure text‑only LLMs. Google DeepMind has showcased systems that discover or refine algorithms with agentic loops and verification in the mix. For example, the company detailed how a Gemini‑powered agent called AlphaEvolve iteratively designs algorithmic ideas that are then rigorously checked, improved, and selected by a surrounding system; you can read the approach on DeepMind’s official blog. These pipelines look less like a chat transcript and more like a lab: generate ideas, verify, repair, repeat.
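The DeepMind post describes that pipeline at a high level rather than as code, but the control flow it sketches (generate, verify, repair, repeat) looks roughly like the loop below. All three function names (`generate_candidate`, `verify`, `repair`) are hypothetical stand-ins for a proposer model, a rigorous checker, and a revision step; this is a hedged sketch of the pattern, not AlphaEvolve’s implementation.

```python
# Hedged sketch of a generate-verify-repair loop in the AlphaEvolve style.
# generate_candidate, verify, and repair are hypothetical stand-ins for a
# proposer model, a rigorous checker, and a revision step.
from typing import Callable, Optional

def search(
    generate_candidate: Callable[[], str],
    verify: Callable[[str], bool],
    repair: Callable[[str], str],
    max_rounds: int = 10,
    max_repairs: int = 3,
) -> Optional[str]:
    """Return the first candidate that passes verification, else None."""
    for _ in range(max_rounds):
        candidate = generate_candidate()      # propose an idea
        for _ in range(max_repairs):
            if verify(candidate):             # rigorous, automated check
                return candidate              # keep only verified output
            candidate = repair(candidate)     # revise and retry
    return None
```

The design point is that the verifier, not the generator, decides what survives; the model only proposes.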
So, is this a big deal?
If GPT‑5 Pro indeed produced a correct, unpublished strengthening of a known result in minutes, that’s noteworthy — not because it “invented math,” but because it hints at a practical role for LLMs as tireless research assistants that tighten constants, try alternate proof paths, and surface overlooked refinements on demand. OpenAI’s own materials emphasize a push toward better reasoning and fewer hallucinations, with GPT‑5 routing hard tasks into extended thinking and scoring higher on math benchmarks than prior models, as outlined in its launch post.
The caveat is the same as always: one impressive anecdote is not a capability. What will matter is whether independent researchers can routinely elicit comparable, verifiably correct, and clearly novel results — ideally with fully disclosed prompts and settings.
Key terms, briefly explained
Convex optimization
: A branch of optimization where every local minimum of the objective is a global minimum, with no deceptive local minima, enabling strong theoretical guarantees.

L-smoothness (L)
: A condition that bounds how quickly the gradient can change; it constrains safe step sizes for methods like gradient descent.

Bound tightening
: Improving a constant (e.g., from 1/L to 1.5/L) in a theorem while keeping the same assumptions; often delicate, sometimes impactful.

Reasoning mode
: In GPT‑5, a deeper “think longer” path that the system can route to for complex tasks; it’s part of the unified design described by OpenAI in its official GPT‑5 page.
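For readers who want the formal statement behind the L-smoothness entry above, here it is as usually written in textbooks; this is the standard definition and its standard consequence, not anything specific to the GPT‑5 claim.

```latex
% L-smoothness: the gradient of f is L-Lipschitz.
\|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\| \quad \text{for all } x, y.
% A standard consequence (the descent lemma): a gradient step of size
% \eta makes guaranteed progress whenever \eta \le 1/L,
f\!\left(x - \eta \nabla f(x)\right)
  \le f(x) - \eta \left(1 - \tfrac{L\eta}{2}\right) \|\nabla f(x)\|^{2}.
```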
Bottom line: claims that a model did “new math” should be interrogated — but not dismissed out of hand. With GPT‑5, OpenAI is explicitly targeting harder reasoning and math performance in a system that can decide when to spend more compute on a problem. If reliable workflows emerge where a model proposes candidate proofs and humans or tools verify them, tightening constants and exploring proof variants may become a productive human‑AI collaboration — less “replacing mathematicians,” more giving them a faster way to test ideas.