Baidu unveils proprietary ERNIE 5 beating GPT-5 performance on charts, document understanding and more


Just hours after OpenAI updated its flagship base model GPT-5 to GPT-5.1, promising lower overall token usage and a more pleasant personality with more preset options, Chinese search giant Baidu unveiled its next-generation foundation model, ERNIE 5.0, alongside a range of AI product upgrades and strategic international expansions.

The goal: to position itself as a global contender in the increasingly competitive enterprise AI market.

Announced at the company’s Baidu World 2025 event, ERNIE 5.0 is a proprietary, natively omnimodal model designed to jointly process and generate content across text, images, audio and video.

Unlike Baidu’s recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is open source under an enterprise-friendly and permissive Apache 2.0 license, ERNIE 5.0 is a proprietary model and only available through Baidu’s ERNIE Bot website (I had to manually select it from the model selector dropdown) and the Qianfan cloud platform application programming interface (API) for enterprise customers.

In addition to the model launch, Baidu introduced major updates to its digital human platform, no-code tools and general-purpose AI agents – all aimed at expanding its AI footprint beyond China.

The company also introduced ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, in addition to the general preview model that balances across modalities.

Baidu highlighted that ERNIE 5.0 represents a shift in the way intelligence is deployed at scale, with CEO Robin Li stating: “When you internalize AI, it becomes a native capability, transforming intelligence from a cost center to a source of productivity.”

Where ERNIE 5.0 surpasses GPT-5 and Gemini 2.5 Pro

ERNIE 5.0 benchmark results suggest that Baidu has achieved parity (or near parity) with the best Western foundation models across a broad spectrum of tasks.

In public benchmark slides shared at the Baidu World 2025 event, ERNIE 5.0 Preview outperformed or matched OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding and image-based QA, while also demonstrating strong language modeling and code execution skills.

The company highlighted the model’s ability to process joint inputs and outputs across modalities, rather than relying on post-hoc modality fusion, which it described as a technical differentiator.

On visual tasks, ERNIE 5.0 achieved leading scores on OCRBench, DocVQA and ChartQA, three benchmarks that test document recognition, comprehension and structured data reasoning.

Baidu claims the model beat both GPT-5-High and Gemini 2.5 Pro on these document- and chart-based benchmarks, areas it describes as core to enterprise applications such as automated document processing and financial analysis.

When generating images, ERNIE 5.0 matched or exceeded Google’s Veo3 in categories including semantic alignment and image quality, according to Baidu’s internal GenEval-based evaluation. Baidu claimed that the model’s multimodal integration allows it to generate and interpret visual content with greater contextual awareness than models that rely on modality-specific encoders.

For audio and speech tasks, ERNIE 5.0 demonstrated competitive results on the MM-AU and TUT2017 benchmarks for audio comprehension, as well as on question answering from spoken-language input. The audio performance, while not as heavily emphasized as vision or text, suggests a broad footprint intended to support full-spectrum multimodal applications.

On language tasks, the model showed strong results in following instructions, answering factual questions and mathematical reasoning – key areas that determine the business utility of large language models.

The Preview 1022 variant of ERNIE 5.0, tailored for textual performance, showed even stronger language-specific results in early developer access. While Baidu doesn’t claim broad superiority in general language reasoning, its internal evaluations suggest that ERNIE 5.0 Preview 1022 closes the gap with top English-language models and outperforms them on Chinese-language tasks.

While Baidu has not publicly released full benchmark details or raw scores, its performance positioning suggests a deliberate effort to frame ERNIE 5.0 not as a niche multimodal system, but as a flagship model that competes with the largest closed models in general-purpose reasoning.

Where Baidu claims a clear lead is in structured document understanding, visual chart reasoning, and the integration of multiple modalities into a single, native modeling architecture. Independent verification of these results remains pending, but the breadth of claimed capabilities positions ERNIE 5.0 as a serious alternative in the multimodal foundation model landscape.

Pricing strategy for businesses

ERNIE 5.0 sits at the premium end of Baidu’s model pricing structure. The company has released specific pricing for API usage on its Qianfan platform, bringing costs in line with other top offerings from Chinese competitors like Alibaba.

Model	Input price (per 1K tokens)	Output price (per 1K tokens)	Source
ERNIE 5.0	$0.00085 (¥0.006)	$0.0034 (¥0.024)	Qianfan
ERNIE 4.5 Turbo (e.g.)	$0.00011 (¥0.0008)	$0.00045 (¥0.0032)	Qianfan
Qwen3 (coder example)	$0.00085 (¥0.006)	$0.0034 (¥0.024)	Qianfan

The contrast in cost between ERNIE 5.0 and previous models such as ERNIE 4.5 Turbo underlines Baidu’s strategy to differentiate between high-volume, low-cost models and high-capacity models designed for complex tasks and multimodal reasoning.

Compared with U.S. alternatives, its pricing remains mid-range:

Model	Input (/1 million tokens)	Output (/1 million tokens)	Source
GPT-5.1	$1.25	$10.00	OpenAI
ERNIE 5.0	$0.85	$3.40	Qianfan
ERNIE 4.5 Turbo (e.g.)	$0.11	$0.45	Qianfan
Claude Opus 4.1	$15.00	$75.00	Anthropic
Gemini 2.5 Pro	$1.25 (≤200k) / $2.50 (>200k)	$10.00 (≤200k) / $15.00 (>200k)	Google Vertex AI Pricing
Grok 4 (grok-4-0709)	$3.00	$15.00	xAI API
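
To make the two tables easier to compare, the short Python sketch below converts a per-1K-token price to a per-million-token price and estimates the bill for one workload. The prices are copied from the table above; the workload size (2 million input tokens, 500,000 output tokens) and the function names are hypothetical, chosen only for illustration. The Google entry is left out because its price depends on prompt length.

```python
# Prices in USD per 1 million tokens (input, output), copied from the table above.
PRICES_PER_MILLION = {
    "GPT-5.1": (1.25, 10.00),
    "ERNIE 5.0": (0.85, 3.40),
    "ERNIE 4.5 Turbo": (0.11, 0.45),
    "Claude Opus 4.1": (15.00, 75.00),
    "Grok 4": (3.00, 15.00),
}


def per_1k_to_per_million(price_per_1k: float) -> float:
    """Convert a per-1K-token price into a per-1M-token price."""
    return price_per_1k * 1000


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at the listed prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${workload_cost(name, 2_000_000, 500_000):.2f}")
```

At these list prices, the hypothetical workload comes to about $3.40 on ERNIE 5.0 versus about $7.50 on GPT-5.1; actual bills would depend on caching, discounts and exchange rates, none of which are modeled here.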

Global expansion: products and platforms

Alongside the model release, Baidu is expanding internationally:

GenFlow 3.0, now with more than 20 million users, is the company’s largest general-purpose AI agent and features improved memory and multimodal task handling.
Famou, a self-evolving agent that can solve complex problems dynamically, is now commercially available by invitation.
MeDo, the international version of Baidu’s no-code builder Miaoda, is live worldwide via medo.dev.
Oreate, a productivity workspace with support for documents, slides, images, video and podcasts, has reached more than 1.2 million users worldwide.

Baidu’s digital human platform, which has already been rolled out in Brazil, is also part of the global push. According to company data, 83% of livestreamers at this year’s “Double 11” shopping event in China used Baidu’s digital human technology, contributing to a 91% increase in GMV.

Meanwhile, Baidu’s autonomous taxi service Apollo Go has surpassed 17 million rides, operates self-driving fleets in 22 cities and claims the title of the world’s largest robotaxi network.

Open source vision language model attracts industry attention

Two days before the flagship ERNIE 5.0 event, Baidu also released an open-source multimodal model under the Apache 2.0 license: ERNIE-4.5-VL-28B-A3B-Thinking.

As reported by my colleague Michael Nuñez of VentureBeat, the model activates only 3 billion parameters while retaining a total of 28 billion, using a Mixture-of-Experts (MoE) architecture for efficient inference.

Major technical innovations include:

“Thinking with Images”, which enables dynamic, zoom-based visual analysis
Support for map interpretation, document understanding, visual grounding and temporal awareness in video
Runtime on a single 80 GB GPU, making it accessible to mid-sized organizations
Full compatibility with Transformers, vLLM and Baidu’s FastDeploy toolkits
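
As a rough sanity check on the single-GPU claim, the sketch below estimates how much memory the model weights alone would need. It assumes 16-bit weights (2 bytes per parameter), which is an assumption and not a figure from Baidu, and it ignores the KV cache, activations and framework overhead. The function name is hypothetical.

```python
TOTAL_PARAMS = 28e9   # total parameters, from the model name (28B)
ACTIVE_PARAMS = 3e9   # parameters activated per token (A3B)
BYTES_PER_PARAM = 2   # assumption: 16-bit weights such as bf16
GPU_MEMORY_GB = 80    # single-GPU capacity cited in the list above


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"Weights: {weights_gb:.0f} GB of {GPU_MEMORY_GB} GB")
    print(f"Active per token: {active_fraction:.1%} of parameters")
```

Under that assumption the weights take roughly 56 GB, leaving some headroom on an 80 GB card, and only about 11% of the parameters are active for any given token, which is where the Mixture-of-Experts efficiency comes from.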

This release increases pressure on closed source competitors. With Apache 2.0 licensing, ERNIE-4.5-VL-28B-A3B-Thinking becomes a viable base model for commercial applications without licensing restrictions – something few high-performing models in this class offer.

Community feedback and Baidu’s response

Following the launch of ERNIE 5.0, developer and AI evaluator Lisan al Gaib (@scaling01) posted a mixed review on X. While initially impressed with the model’s benchmark performance, they reported a persistent issue: ERNIE 5.0 repeatedly called tools, even when explicitly told not to, during SVG generation tasks.

“ERNIE 5.0 benchmarks looked crazy until I tested it…unfortunately RL is brain damaged or has a serious problem with their chat platform/system prompt,” Lisan wrote.

Within hours, Baidu’s developer-focused support account, @ErnieforDevs, responded:

“Thanks for the feedback! It’s a known bug; certain syntax can trigger this consistently. We’re working on a fix. You can try rephrasing or changing the prompt to avoid it for now.”

The quick turnaround reflects Baidu’s increasing emphasis on communicating with developers, especially as it courts international users through both proprietary and open source offerings.

Outlook for Baidu and its ERNIE family of foundation models

Baidu’s ERNIE 5.0 marks a strategic escalation in the global foundation model race. With performance claims that put it on par with the most advanced systems from OpenAI and Google, and a mix of premium pricing and open-access alternatives, Baidu is signaling its ambition to become not just a domestic AI leader, but a credible global infrastructure provider.

At a time when enterprise AI users are increasingly demanding multimodal performance, flexible licensing and deployment efficiency, Baidu’s dual-pronged approach – premium hosted APIs and open-source releases – can broaden its appeal within both enterprise and developer communities.

Whether the company’s performance claims hold up to third-party testing remains to be seen. But in a landscape shaped by rising costs, model complexity and computing bottlenecks, ERNIE 5.0 and its supporting ecosystem give Baidu a competitive position in the next wave of AI deployment.
