Suffice it to say that the whole AI space lit up with excitement when OpenAI demoed Advanced Voice Mode back in May. When the company released its latest flagship model, GPT-4o, it also showcased the model's impressive multimodal capabilities.

However, for months, it remained nothing more than a showcase. Even though the company had promised that Advanced Voice Mode would arrive within a few weeks, it took months before access rolled out (and not even to everyone). Even then, vision capabilities were missing. Now, finally, OpenAI is rolling out Vision in Advanced Voice Mode.

The news comes on day 6 of OpenAI's 12 days of shipping updates, during which Sam Altman and other OpenAI employees have been releasing features big and small while bringing some holiday cheer to the table.

Some notable past releases include the ChatGPT Pro subscription, the full release of the o1 reasoning model, the public release of the Sora video model, and an improved Canvas for all users.

The Vision news comes just a day after Google showcased an enhanced version of its Project Astra and a new agentic prototype, Project Mariner. However, OpenAI pulls ahead again, as Project Astra still hasn't been released to the public.

With Vision capabilities in Advanced Voice Mode, users can now share either a real-time video feed from their camera or their phone's screen with ChatGPT. Users have eagerly awaited this release because of its potential practical applications, especially for people with impaired vision.


In the simple demo shared today, the OpenAI team enlisted ChatGPT's help to make pour-over coffee.

Source: OpenAI

Vision in Advanced Voice Mode will only be available in the ChatGPT mobile app at launch. Hopefully, it'll come to the desktop apps soon, as its absence adds friction when you want to enlist ChatGPT's help while working or coding.

Starting today, the feature is rolling out to all Teams users, as well as to Plus and Pro users everywhere in the world except the EU; the rollout is expected to be complete within the week. Edu and Enterprise users, however, will have to wait a bit longer, with access coming early next year.

ChatGPT's Advanced Voice Mode was undoubtedly good, but without the promised vision capabilities, it was missing a core feature that could turn it from a fun novelty into a genuinely useful assistant. With the release of Vision, I'm excited to see how that changes.


In the spirit of the holidays, OpenAI has also released a new Santa Mode in ChatGPT, available worldwide on all platforms: the mobile app, the desktop app, and ChatGPT on the web.

Santa Mode will be available in both Advanced and Standard Voice Mode. OpenAI will reset your Advanced Voice Mode usage limit the first time you enable Santa Mode, so even if you've exhausted your limit, you'll still be able to indulge in some holiday cheer. And once you run out of Advanced Voice Mode time, there's always standard-voice Santa to turn to.