Meta returns to open source AI with Omnilingual ASR models that can transcribe 1,600+ languages natively


Meta just released a new multilingual automatic speech recognition (ASR) system that supports more than 1,600 languages – dwarfing OpenAI’s open source Whisper model, which supports only 99.
The architecture also allows developers to extend that support to thousands of others. Through a feature called zero-shot in-context learning, users can provide a few paired audio and text samples in a new language at inference time, allowing the model to transcribe additional utterances in that language without any retraining.
In practice, this expands the potential coverage to more than 5,400 languages – roughly any spoken language with a known script.
It’s a shift from static model options to a flexible framework that communities can adapt themselves. So while the 1,600 languages reflect the official training coverage, the broader figure represents Omnilingual ASR’s ability to generalize on demand, making it the most extensible speech recognition system released to date.
Best of all: it’s open source under a regular Apache 2.0 license – not a restrictive, quasi-open-source Llama license like the company’s previous releases, which limited use by larger enterprises unless they paid a licensing fee – meaning researchers and developers have the freedom to use and implement it immediately, for free and without restrictions, even in commercial and enterprise projects!
Released on November 10 on Meta’s website and GitHub, along with a demo space on Hugging Face and a technical paper, Meta’s Omnilingual ASR suite includes a family of speech recognition models, a multilingual audio representation model with 7 billion parameters, and a massive speech corpus covering more than 350 previously underserved languages.
All resources are available for free under open licenses and the models support out-of-the-box speech-to-text transcription.
“By open sourcing these models and datasets, we aim to break down language barriers, expand digital access, and strengthen communities around the world,” Meta wrote from its @AIatMeta account on X.
Designed for speech-to-text transcription
At its core, Omnilingual ASR is a speech-to-text system.
The models are trained to convert spoken language into written text and support applications such as voice assistants, transcription tools, subtitles, digitization of oral archives, and accessibility features for low-resource languages.
Unlike previous ASR models that required extensive labeled training data, Omnilingual ASR includes a zero-shot variant.
This version can transcribe languages never seen before, using just a few paired samples of audio and associated text.
This dramatically lowers the barrier to adding new or endangered languages, eliminating the need for large corpora or retraining.
Model family and technical design
The Omnilingual ASR suite includes multiple model families trained on more than 4.3 million hours of audio spanning more than 1,600 languages:
- wav2vec 2.0 models for self-supervised learning of speech representations (300M–7B parameters)
- CTC-based ASR models for efficient supervised transcription
- LLM-ASR models that combine a speech encoder with a Transformer-based text decoder for advanced transcription
- LLM-ZeroShot ASR model, which allows inference-time adaptation to unseen languages
All models follow an encoder-decoder design: raw audio is converted into a language-independent representation and then decoded into written text.
Why scale matters
While Whisper and similar models have advanced ASR capabilities for global languages, they fall short when it comes to human language diversity. Whisper supports 99 languages. Meta’s system:
- Directly supports more than 1,600 languages
- Can generalize to more than 5,400 languages using in-context learning
- Achieves character error rates (CER) below 10% in 78% of supported languages
Supported languages include more than 500 languages that have never before been covered by an ASR model, according to Meta’s research paper.
This expansion opens up new possibilities for communities whose languages have often been excluded from digital tools.
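Character error rate, the metric cited above, is the edit (Levenshtein) distance between a model’s transcription and the reference text, normalized by reference length. A minimal sketch of the computation:

```python
def edit_distance(ref: str, hyp: str) -> int:
    # Classic dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edits needed / reference length.
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

print(cer("hello world", "helo world"))  # one deletion over 11 chars, ~0.09
```

A CER below 10% means fewer than one character in ten needs correcting, which is the threshold Meta uses when reporting coverage quality.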
Background: Meta’s AI overhaul and a rebound from Llama 4
The release of Omnilingual ASR comes at a pivotal time in Meta’s AI strategy, following a year marked by organizational turbulence, leadership changes and uneven product execution.
Omnilingual ASR is the first major open source model release since the rollout of Llama 4, Meta’s latest large language model, which debuted in April 2025 to mixed and ultimately poor reviews, with little enterprise adoption compared to open source competitors from Chinese labs.
The failure prompted Meta founder and CEO Mark Zuckerberg to appoint Alexandr Wang, co-founder and former CEO of AI data provider Scale AI, as Chief AI Officer, and to launch an extensive and costly recruitment drive that stunned the AI and business worlds with eye-watering pay packages for top AI researchers.
Omnilingual ASR, on the other hand, represents a strategic and reputational reset. It brings Meta back to an area the company has historically led – multilingual AI – and offers a truly extensible, community-focused stack with minimal barriers to entry.
The system’s support for more than 1,600 languages, and its extensibility to thousands more via zero-shot in-context learning, confirms Meta’s technical credibility in language technology.
Importantly, it does so through a free and permissively licensed release under Apache 2.0, with transparent dataset sourcing and reproducible training protocols.
This shift aligns with broader themes in Meta’s 2025 strategy. The company has reoriented its story around a “personal superintelligence” vision, investing heavily in infrastructure – including a September release of custom AI accelerators and Arm-based inference stacks – while downplaying the metaverse in favor of fundamental AI capabilities. The return to public training data in Europe after a regulatory pause also underlines Meta’s intent to compete globally despite privacy constraints.
Omnilingual ASR, then, is more than a model release: it is a calculated move to reset the narrative, from the fragmented rollout of Llama 4 to a highly usable, research-grounded contribution that aligns with Meta’s long-term AI platform strategy.
Community-based dataset collection
To reach this scale, Meta collaborated with researchers and community organizations in Africa, Asia, and elsewhere to create the Omnilingual ASR Corpus, a 3,350-hour dataset covering 348 low-resource languages. Contributors were compensated local speakers, and the recordings were collected in collaboration with groups such as:
- African Next Voices: a Gates Foundation-supported consortium including Maseno University (Kenya), the University of Pretoria, and Data Science Nigeria
- Mozilla Foundation’s Common Voice, supported through the Open Multilingual Speech Fund
- Lanfrica / NaijaVoices, which created data for 11 African languages, including Igala, Serer, and Urhobo
Data collection focused on natural, unscripted speech. Prompts were designed to be culturally relevant and open-ended, such as “Is it better to have a few close friends or many casual acquaintances? Why?” Transcriptions used established writing systems, with quality assurance built into every step.
Performance and hardware considerations
The largest model in the suite, omniASR_LLM_7B, requires roughly 17GB of GPU memory for inference, making it suitable for deployment on high-end hardware. Smaller models (300M–1B) can run on lower-power devices and deliver real-time transcription speeds.
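The ~17GB figure is roughly what 7 billion parameters occupy in half precision plus runtime overhead. A back-of-envelope sketch – the 30% overhead fraction for activations and decoder state is an assumption for illustration, not a published number:

```python
def inference_memory_gb(params_billions: float,
                        bytes_per_param: int = 2,   # fp16/bf16 weights
                        overhead: float = 0.3) -> float:
    # Weight memory in GiB, plus an assumed fraction for activations,
    # decoder state, and framework buffers during inference.
    weights_gb = params_billions * 1e9 * bytes_per_param / 2**30
    return weights_gb * (1 + overhead)

# A 7B model in half precision: ~13GB of weights, ~17GB with overhead,
# consistent with the figure quoted above.
print(f"{inference_memory_gb(7):.1f} GB")
```

The same arithmetic explains why the 300M–1B variants fit comfortably on consumer GPUs or CPUs.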
Performance benchmarks show strong results even in low-resource scenarios:
- CER below 10% in 95% of high- and medium-resource languages
- CER below 10% in 36% of low-resource languages
- Robustness in noisy conditions and unseen domains, especially with fine-tuning
The zero-shot system, omniASR_LLM_7B_ZS, can transcribe new languages with minimal setup. Users provide a few audio-text example pairs, and the model generates transcriptions for new utterances in the same language.
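Conceptually, the zero-shot workflow amounts to bundling a handful of paired examples with the target audio at inference time. The sketch below is purely illustrative of that structure – the class and function names are assumptions, not the actual omnilingual-asr API:

```python
from dataclasses import dataclass

@dataclass
class ContextExample:
    audio_path: str   # short recording in the target language
    transcript: str   # its ground-truth transcription

def build_zero_shot_request(examples: list, target_audio: str) -> dict:
    # At inference time the model conditions on a few paired (audio, text)
    # examples, then transcribes the new utterance -- no retraining involved.
    if not examples:
        raise ValueError("zero-shot transcription needs at least one paired example")
    return {
        "context": [(e.audio_path, e.transcript) for e in examples],
        "target": target_audio,
    }

request = build_zero_shot_request(
    [ContextExample("greeting.wav", "mwaramutse")],  # hypothetical sample pair
    "new_utterance.wav",
)
print(len(request["context"]))  # 1
```

The key design point is that the language is specified by example rather than by a pretrained language ID, which is what lets coverage extend beyond the training set.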
Open Access and developer tools
All models and the dataset are released under permissive licenses:
- Apache 2.0 for models and code
- CC BY 4.0 for the Omnilingual ASR Corpus on Hugging Face
Installation is supported via PyPI and uv:
pip install omnilingual-asr
Meta also offers:
- A Hugging Face dataset integration
- Pre-built inference pipelines
- Language-code conditioning for improved accuracy
Developers can view the full list of supported languages using the API:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

# supported_langs lists the language codes covered out of the box
print(len(supported_langs))
print(supported_langs)
Broader implications
Omnilingual ASR reframes language coverage in ASR from a fixed list to an extensible framework. It makes the following possible:
- Community-driven inclusion of underrepresented languages
- Digital access for oral and endangered languages
- Research into speech technology in linguistically diverse contexts
Crucially, Meta emphasizes ethical considerations throughout, advocating open source participation and collaboration with native-speaker communities.
“No model can ever anticipate and include all the world’s languages in advance,” the Omnilingual ASR paper states, “but Omnilingual ASR allows communities to extend recognition with their own data.”
Access to the Tools
All resources are now available at:
- Code + models: github.com/facebookresearch/omnilingual-asr
- Dataset: huggingface.co/datasets/facebook/omnilingual-asr-corpus
- Blog post: ai.meta.com/blog/omnilingual-asr
What this means for companies
For business developers, especially those operating in multilingual or international markets, Omnilingual ASR significantly lowers the barrier to deploying speech-to-text systems to a broader range of customers and regions.
Instead of relying on commercial ASR APIs that support only a limited set of mostly high-resource languages, teams can now integrate an open source pipeline that covers more than 1,600 languages out of the box, with the option to expand to thousands of additional languages via zero-shot learning.
This flexibility is especially valuable for companies operating in industries such as voice customer support, transcription services, accessibility, education or civil technology, where local language coverage may be a competitive or regulatory requirement. Because the models are released under the permissive Apache 2.0 license, companies can refine, deploy, or integrate them into proprietary systems without restrictive conditions.
It also represents a shift in the ASR landscape: from centralized, cloud-gated offerings to community-extensible infrastructure. By making multilingual speech recognition more accessible, customizable and cost-effective, Omnilingual ASR opens the door to a new generation of business speech applications built around linguistic inclusion rather than linguistic limitation.




