OpenAI launches new voice intelligence features in its API

OpenAI said Thursday that its API now includes several new voice intelligence features designed to help developers build apps that can talk with users and transcribe and translate their conversations.
The company's new GPT-Realtime-2 is a voice model built to produce realistic speech that can interact with users. Unlike its predecessor, GPT-Realtime-1.5, it is built with GPT-5-class reasoning, which OpenAI says is designed to handle more complicated user requests.
The company is also launching GPT-Realtime-Translate, which, as the name suggests, is designed to provide real-time translation that can “keep pace” with the user in conversation. The model supports more than 70 input languages (the languages it can understand) and 13 output languages (the languages it can translate into).
Finally, the company has launched a new transcription capability, GPT-Realtime-Whisper, which provides live speech-to-text, transcribing interactions as they occur.
“Together, the models we are launching move real-time audio from simple call-and-response to voice interfaces that can actually work: listen, reason, translate, transcribe and take action as a conversation unfolds,” the company said.
Who are these updates good for? Companies looking to expand their customer service capabilities are an obvious target. However, OpenAI also notes that the new features will help in a wide range of areas, including education, media, events, and creator platforms, among others.
As useful as these tools seem from a business perspective, it also seems likely that they can be abused. The company said it has built guardrails to prevent its new features from being misused to create spam, fraud or other forms of online abuse. Certain triggers are embedded into the system so that “conversations can be stopped if they are determined to violate our harmful content guidelines,” according to OpenAI.
All of the new voice models are included in OpenAI’s Realtime API. Translate and Whisper are billed per minute, while GPT-Realtime-2 is billed based on token usage.
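For developers budgeting around the two billing schemes, the difference can be sketched in a few lines of Python. This is a rough illustration only: the rates below are made-up placeholders rather than OpenAI's prices, the `Rates` class and both helper functions are hypothetical names, and the sketch assumes per-minute usage is prorated by the second and that all tokens are billed at one flat rate, neither of which the announcement confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder prices for illustration; NOT OpenAI's actual rates."""
    per_minute: float          # per-minute models (Translate, Whisper)
    per_million_tokens: float  # token-billed models (GPT-Realtime-2)


def per_minute_cost(seconds: float, rates: Rates) -> float:
    """Cost of a per-minute-billed session, prorated by the second (assumption)."""
    return seconds / 60 * rates.per_minute


def token_cost(tokens: int, rates: Rates) -> float:
    """Cost of a token-billed session at a single flat rate (assumption)."""
    return tokens / 1_000_000 * rates.per_million_tokens


# Hypothetical rates: 0.01 per minute, 10.0 per million tokens.
rates = Rates(per_minute=0.01, per_million_tokens=10.0)

print(per_minute_cost(90, rates))   # a 90-second transcription session
print(token_cost(50_000, rates))    # a conversation that used 50,000 tokens
```

The practical difference is that per-minute costs scale with how long a session runs, while token-based costs scale with how much is said and generated, so a long but quiet call and a short but dense one can land very differently under each scheme.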