DeepL, known for text translation, now wants to translate your voice

DeepL, a translation company best known for its text tools, today released a voice-to-voice translation suite that covers use cases such as meetings, mobile and web calls, and group conversations for frontline workers through custom apps. The company is also releasing an API that allows third-party developers and companies to build on top of DeepL’s technology for custom use cases, such as call centers.
“After spending so many years in text translation, voice was a logical step for us,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We’ve come a long way when it comes to text translation and document translation. But we didn’t think there was a great product for real-time voice translation.”
Kutylowski said the challenges of creating a real-time translation product center on finding a balance between reducing latency (the delay between someone speaking and the translated audio playing) and maintaining accurate results.
DeepL is releasing add-ons for platforms like Zoom and Microsoft Teams that let listeners hear real-time translations while others speak in their native language, or follow real-time translated text on screen. The program is currently in early access, and the company is inviting organizations to join a waitlist. The company also has a product for mobile and web-based conversations that can take place in person or remotely.
DeepL also supports group conversations in settings such as training sessions or workshops, where participants can join by scanning a QR code.
DeepL said its voice-to-voice technology can also learn and adapt to custom vocabulary, such as industry-specific terms and company and personal names.
Kutylowski said AI is reimagining what customer service will look like in the coming years. He noted that a translation layer helps companies provide support in languages for which qualified staff are scarce and expensive to hire.
The company said it manages the entire voice-to-voice stack. However, the current system converts speech to text, applies translation, and then converts it back to speech. DeepL believes that because it has been working on text translation for years, it has an edge in translation quality. In the future, the company wants to develop an end-to-end voice translation model that completely skips the text step.
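To make the cascaded design described above concrete, here is a minimal sketch of a three-stage speech pipeline in Python. It is not DeepL's implementation or API: the class `CascadedTranslator` and the three stand-in functions are hypothetical, and the "audio" is just UTF-8 bytes so the example can run without any speech models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadedTranslator:
    """Hypothetical cascade: speech -> text -> translated text -> speech."""
    transcribe: Callable[[bytes], str]   # speech recognition stage
    translate: Callable[[str], str]      # text translation stage
    synthesize: Callable[[str], bytes]   # speech synthesis stage

    def run(self, audio: bytes) -> bytes:
        source_text = self.transcribe(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions for illustration only, not real models).
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-by-word lookup in a tiny English-to-German glossary.
    glossary = {"hello": "hallo", "world": "welt"}
    return " ".join(glossary.get(word, word) for word in text.split())


def fake_synthesize(text: str) -> bytes:
    # Pretend encoding the text produces audio.
    return text.encode("utf-8")


pipeline = CascadedTranslator(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"hello world")
print(result)
```

Because each stage waits for the previous one to finish, delays add up across the three steps, which is one reason an end-to-end model that skips the text step is attractive.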
DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which raised $65 million last year from Quadrille Capital and Teleperformance, uses AI to adjust a speaker’s accent in real time – a tool primarily aimed at call center agents.
Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies, including Amazon Web Services, helping them dub and localize video content at scale.
Palabra, backed by Reddit co-founder Alexis Ohanian’s firm Seven Seven Six, is building a real-time speech translation engine designed to preserve both meaning and the speaker’s original voice, putting it in more direct competition with what DeepL is building now.




