OpenAI unveils o3 reasoning models, early 2025 release targeted

OpenAI unveiled its next generation of reasoning AI models, o3 and o3-mini, on Friday, marking the conclusion of its "12 Days of OpenAI" event. These models build upon the o1 series released earlier this year, introducing enhanced capabilities for complex problem-solving.

The company is initiating safety testing with selected researchers, with plans to release the o3-mini model by the end of January and the full o3 model shortly after. Applications for early access are open until January 10, 2025.

According to OpenAI CEO Sam Altman, the o3 models represent "the beginning of the next phase of AI." He highlighted their ability to tackle increasingly complex tasks that require a high degree of logical reasoning. The new models employ a “private chain of thought,” which allows them to pause and consider various prompts before responding.

"We view this as the beginning of the next phase of AI, where you can use these models to do increasingly complex tasks that require a lot of reasoning."— Sam Altman, CEO of OpenAI

The o3 model family introduces a new feature allowing users to adjust reasoning time, offering low, medium, and high compute settings. The higher the compute setting, the more time the model spends “thinking” through the problem, resulting in better performance on a task. Although this process adds latency, it leads to greater accuracy, especially in areas like physics, science, and math.

The company skipped the "o2" name, according to reports, to avoid trademark conflicts with the British telecom company O2. During a livestream, Altman confirmed the naming issue, joking about OpenAI's "tradition of being truly bad at names."

OpenAI claims the o3 models have shown significant improvements on benchmarks. On the ARC-AGI benchmark, the o3 model achieved a score of 87.5% in high-compute mode, tripling the performance of o1. It also scored 96.7% on the 2024 American Invitational Mathematics Exam, missing only one question. Additionally, o3 achieved 87.7% on GPQA Diamond, a set of graduate-level biology, physics, and chemistry questions. On the Frontier Math benchmark by EpochAI, o3 solved 25.2% of problems, surpassing the previous high score of 2%.

The o3-mini model is a smaller, distilled version designed for specific tasks, and it is also said to be four times faster end-to-end than the o1-mini when accounting for reasoning tokens.

OpenAI also detailed its use of "deliberative alignment" for both the o1 and o3 models. This technique embeds human-written safety specifications into the models, enabling them to reason about these policies before generating responses. This is intended to prevent harmful outputs and improve overall safety, according to the company.

While these results are promising, François Chollet, the creator of the ARC-AGI benchmark, cautioned that o3 is not yet indicative of Artificial General Intelligence (AGI). He noted that the model still fails on some simple tasks. According to Chollet, "You’ll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible."

The release of the o3 models comes amid increased competition in the AI industry. Other companies, such as Google with its Gemini model, are also developing advanced reasoning AI. OpenAI emphasizes its commitment to safety as its models become increasingly powerful. The company plans to continue developing and testing these models with external researchers before wider availability.

According to OpenAI, the application process for researchers closes on January 10, 2025.