- OpenAI recently started rolling out the Advanced Voice Mode for ChatGPT-4o in a limited alpha.
- OpenAI also released the GPT-4o System Card, a research document that details the risks discovered during model evaluation and the safety measures implemented.
- The System Card includes some very interesting findings, including the fact that GPT-4o Advanced Voice Mode simulated the voice of the red teamer testing it without being prompted to.
- The company assures, however, that the risk is minimal and there are guardrails in place to prevent that.
OpenAI recently released the System Card for their GPT-4o model, shortly after the Advanced Voice Mode for ChatGPT-4o began rolling out in alpha to a small number of ChatGPT Plus users.
Before releasing the model earlier in May (without the Advanced Voice Mode), OpenAI used a team of external red teamers to assess the risks of the model (as is the norm with AI models) and published the findings in the System Card.
One of the risks identified by OpenAI is unauthorized voice generation. While talking to the red teamer, GPT-4o cloned their voice and started speaking in a voice similar to the red teamer's, without the user even making such a request. In the audio clip shared by OpenAI, GPT-4o can be heard shouting "No!" and then continuing the output in a voice similar to the red teamer's.
OpenAI has guardrails in place to prevent this from happening by allowing only certain pre-approved voices for GPT-4o. Any voice output produced by ChatGPT-4o is matched against the authorized voice sample in the system message, which serves as the base voice.
And to further minimize the risk, the model is instructed to discontinue the conversation if unintended voice generation is detected. OpenAI's voice output classifier has a precision of 0.96 in English and 0.95 in non-English conversations (which is also why ChatGPT-4o may over-refuse voice requests in non-English conversations).
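To make the idea concrete, here is a minimal sketch of what a voice-consistency check of this kind could look like. The embedding model, similarity threshold, and function names below are assumptions for illustration only, not OpenAI's actual implementation.

```python
# Hypothetical sketch of a voice-consistency guardrail: compare the speaker
# embedding of generated audio against the pre-approved base voice, and
# flag (and stop) the conversation if they don't match.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def output_voice_is_approved(output_embedding: np.ndarray,
                             approved_embedding: np.ndarray,
                             threshold: float = 0.8) -> bool:
    """Return True if generated speech matches the pre-approved base voice.

    In the guardrail described in the System Card, a mismatch would lead
    the model to discontinue the conversation.
    """
    return cosine_similarity(output_embedding, approved_embedding) >= threshold

# Example with toy embeddings (a real system would derive these from audio
# with a speaker-embedding model):
approved = np.array([0.9, 0.1, 0.4])    # embedding of the pre-approved voice
generated = np.array([-0.7, 0.6, 0.2])  # embedding of the model's audio output
if not output_voice_is_approved(generated, approved):
    print("Unintended voice detected: discontinue the conversation.")
```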
But the findings from the System Card do go on to show the complexity involved in creating AI chatbots that can simulate someone's voice from just a short sample, with no need for extensive training on that sample. Voice cloning can be used to impersonate someone and perpetrate fraud. OpenAI has found the risk of unauthorized voice generation to be minimal, though.
Even if you set aside the risks of impersonation and fraud because of the security measures in place, it would still be rather unnerving to be talking to a machine and have it start talking back in your voice, out of the blue. A data scientist on X called it "the plot for the next season of Black Mirror," and it certainly feels like that. Another user on X claims it happened to them in the ChatGPT-4o alpha, but there's no way to verify whether that's true.
Still, there's a chance it could happen the next time you're talking to ChatGPT-4o. So consider this a PSA: don't freak out if it does, or at least don't freak out too much.
OpenAI also has guardrails in place to ensure that GPT-4o refuses to identify people and to generate copyrighted content, two other risks discovered during the assessment.
The company placed most of the other risks it found with the model, including cybersecurity, biological threats, and model autonomy, in the low category. For persuasion, however, it found the risk to be medium: some writing samples produced by GPT-4o proved more persuasive than human-written text at swaying people's opinions.