
Zoom says it aced AI’s hardest exam. Critics say it copied off its neighbors.

Zoom Video Communications, the company best known for keeping remote workers connected during the pandemic, announced last week that it had achieved the highest score ever recorded on one of the most demanding tests of artificial intelligence, a claim that sent a wave of surprise, skepticism and genuine curiosity through the tech industry.

The San Jose-based company said its AI system scored 48.1 percent on Humanity's Last Exam, a benchmark designed by subject-matter experts around the world to stump even the most advanced AI models. That result exceeds Google's Gemini 3 Pro, which held the previous record at 45.8 percent.

“Zoom achieved a new state-of-the-art result on the challenging Humanity’s Last Exam full-set benchmark, scoring 48.1%, which represents a substantial 2.3% improvement over the previous SOTA result,” wrote Xuedong Huang, Zoom’s chief technology officer, in a blog post.

The announcement raises a provocative question that has been on the minds of AI watchers for days: How did a videoconferencing company — one with no public history of training large language models — suddenly leapfrog Google, OpenAI and Anthropic on a benchmark built to measure the limits of machine intelligence?

The answer reveals as much about where AI is going as it does about Zoom’s own technical ambitions. And depending on who you ask, it’s either an ingenious demonstration of practical engineering or an empty claim that takes credit for the work of others.

How Zoom built an AI traffic controller instead of training its own model

Zoom hasn’t trained its own large language model. Instead, the company developed what it calls a “federated AI approach”: a system that routes queries to multiple existing models from OpenAI, Google and Anthropic, then uses proprietary software to select, combine and refine their results.

The core of this system is what Zoom calls its “Z-scorer,” a mechanism that evaluates responses from different models and chooses the best one for a given task. The company couples this with what it describes as an “explore-verify-federate strategy,” an agentic workflow that balances exploratory reasoning with verification across multiple AI systems.

“Our federated approach combines Zoom’s proprietary small language models with advanced open-source and closed-source models,” Huang wrote. The framework “orchestrates diverse models to generate, challenge, and refine reasoning through dialectical collaboration.”

In simpler terms: Zoom built an advanced traffic controller for AI, not the AI itself.
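Zoom has not published implementation details, but the routing-and-scoring idea it describes can be sketched in a few lines of Python. Everything below is hypothetical: the stub model functions, the scoring heuristic, and the `federate` helper are illustrative stand-ins, not Zoom's actual code.

```python
# Hypothetical sketch of a "federated" router: query several models,
# score each response, and return the best one.
from typing import Callable, Dict

# Stand-ins for API calls to different providers (made up for illustration).
MODELS: Dict[str, Callable[[str], str]] = {
    "model_a": lambda q: f"short answer to: {q}",
    "model_b": lambda q: f"a longer, more detailed answer to: {q}",
}

def score(response: str) -> float:
    # Toy heuristic: prefer longer responses. A real scorer would rely on
    # verification, self-consistency checks, or a separate judge model.
    return float(len(response))

def federate(query: str) -> str:
    """Query every model and return the highest-scoring response."""
    responses = {name: model(query) for name, model in MODELS.items()}
    best = max(responses, key=lambda name: score(responses[name]))
    return responses[best]
```

In this toy version the router simply prefers the longer answer; the hard engineering problem in any real system of this kind is the scorer itself.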

This distinction matters enormously in an industry where bragging rights — and billions in valuation — often depend on who can claim the most capable model. The major AI labs spend hundreds of millions of dollars training frontier systems on massive compute clusters. Zoom’s result, by contrast, appears to rest on clever integration of those existing systems.


Why AI researchers are divided on what counts as real innovation

The response from the AI community was swift and sharply divided.

Max Rumpf, an AI engineer who says he has trained state-of-the-art language models, posted sharp criticism on social media. “Zoom tied together API calls to Gemini, GPT, Claude et al and made a small improvement over a benchmark that doesn’t deliver value to their customers,” he wrote. “They then claim SOTA.”

Rumpf did not reject the technical approach itself. Using multiple models for different tasks, he noted, is “actually quite smart and most applications should do this.” He pointed to Sierra, an AI customer service company, as an example of this multi-model strategy executed effectively.

His objection was more specific: “They did not train the model, but glossed over this fact in the tweet. The injustice of claiming the work of others runs deep in people.”

But other observers saw the result differently. Hongcheng Zhu, a developer, gave a more measured assessment: “To top an AI evaluation, you most likely need model federation, like Zoom did. An analogy is that every Kaggle competitor knows that you have to combine models to win a competition.”

The comparison with Kaggle – the competitive data science platform where combining multiple models is standard practice among winning teams – reframes Zoom’s approach as industry best practice rather than sleight of hand. Academic research has long shown that ensemble methods routinely outperform individual models.
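The ensemble intuition behind the Kaggle comparison is easy to demonstrate: a majority vote among several imperfect models can beat any single one, because their errors rarely all land on the same answer. A toy illustration (the “model” answers here are invented):

```python
# Minimal ensemble sketch: combine predictions by majority vote.
from collections import Counter

def majority_vote(predictions):
    """Return the most common prediction among ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

# Three imperfect models answer the same multiple-choice question;
# the lone wrong answer is outvoted by the two that agree.
answers = ["B", "A", "B"]
consensus = majority_vote(answers)
```

This is the simplest possible combination rule; winning Kaggle entries typically use weighted blends or stacked models, but the principle is the same.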

Yet the debate revealed a fault line in how the industry interprets progress. Ryan Pream, founder of Exoria AI, was dismissive: “Zoom just creates a harness around another LLM and reports that. It’s just noise.” Another commenter captured the sheer unexpectedness of the news: “That the videoconferencing app ZOOM developed a SOTA model that achieved 48% HLE was not on my bingo card.”

Perhaps the sharpest criticism concerned priorities. Rumpf argued that Zoom could have focused its resources on the problems its customers actually face. “Retrieving interview transcripts is not ‘solved’ by SOTA LLMs,” he wrote. “I think Zoom users care a lot more about this than they do about HLE.”

The Microsoft veteran who stakes his reputation on a different kind of AI

While Zoom’s benchmark result seemed to come out of nowhere, its chief technology officer did not.


Xuedong Huang joined Zoom from Microsoft, where he spent decades building the company’s AI capabilities. He founded Microsoft’s speech technology group in 1993 and led teams that achieved what the company described as human parity in speech recognition, machine translation, natural language understanding and computer vision.

Huang has a Ph.D. in electrical engineering from the University of Edinburgh. He is an elected member of the National Academy of Engineering and the American Academy of Arts and Sciences, as well as a fellow of both the IEEE and the ACM. His credentials place him among the most accomplished AI leaders in the industry.

His presence at Zoom signals that the company’s AI ambitions are serious, even if its methods differ from those of the research labs that dominate the headlines. In his tweet celebrating the benchmark result, Huang described the achievement as a validation of Zoom’s strategy: “We’ve unlocked stronger capabilities in exploration, reasoning, and collaboration across multiple models, pushing past the performance limits of each individual model.”

That last clause, “pushing past the performance limits of each individual model,” is perhaps the most important. Huang isn’t claiming that Zoom has built a better model. He’s claiming Zoom has built a better system for using models.

Inside the test designed to beat the smartest machines in the world

The benchmark at the heart of this controversy, Humanity’s Last Exam, is designed to be exceptionally difficult. Unlike earlier tests that AI systems learned to game through pattern matching, HLE presents problems that require genuine understanding, multi-step reasoning, and the synthesis of information across complex domains.

The exam draws on questions from experts around the world, ranging from advanced mathematics to philosophy to specialized scientific knowledge. A score of 48.1 percent may sound unimpressive to anyone used to school grades, but in the context of HLE it represents the current ceiling of machine performance.

“Developed by subject matter experts worldwide, this benchmark has become a critical benchmark for measuring AI’s progress toward human-level performance on challenging intellectual tasks,” Zoom’s announcement noted.

The company’s 2.3 percentage point improvement over Google’s previous record may seem modest in itself. But in competitive benchmarking, where gains are often only a fraction of a percent, such a jump attracts attention.

What Zoom’s approach reveals about the future of enterprise AI

Zoom’s approach has implications that extend far beyond benchmark rankings. The company is demonstrating a vision of enterprise AI that differs fundamentally from the model-centric strategies pursued by OpenAI, Anthropic and Google.


Instead of betting everything on building the most capable model, Zoom is positioning itself as an orchestration layer: a company that can integrate the best capabilities from multiple providers and deliver them through products that companies already use every day.

This strategy hedges against a critical uncertainty in the AI market: no one knows which model will be best next month, let alone next year. By building an infrastructure that can switch between providers, Zoom avoids vendor lock-in while theoretically providing customers with the best available AI for any given task.

The announcement of GPT-5.2 from OpenAI the next day underscored this dynamic. OpenAI’s own communications mentioned Zoom as a partner that had evaluated the new model’s performance “across their AI workloads and saw measurable gains across the board.” In other words, Zoom is both a customer of the frontier labs and now a competitor on their benchmarks – using their own technology.

This arrangement may prove sustainable. The major model providers have every incentive to sell API access at scale, even to companies that could bundle their output. The more interesting question is whether Zoom’s orchestration capabilities constitute genuine intellectual property or merely clever engineering that others could replicate.

The real test will come when Zoom’s 300 million users start asking questions

Zoom titled the industry-relations section of its announcement “A Collaborative Future,” and Huang was generous with gratitude. “The future of AI is collaborative, not competitive,” he wrote. “By combining the best innovations from across the sector with our own research breakthroughs, we create solutions that are greater than the sum of their parts.”

This framing positions Zoom as a benevolent integrator, bringing together the industry’s best work for the benefit of enterprise customers. Critics see something different: a company claiming the prestige of an AI lab without doing the fundamental research that earns it.

The debate will likely be settled not by leaderboards but by products. When AI Companion 3.0 reaches Zoom’s hundreds of millions of users in the coming months, they will make their own judgments — not on benchmarks they’ve never heard of, but on whether the meeting summary actually captured what mattered, whether the action items made sense, whether the AI saved or wasted their time.

Ultimately, Zoom’s most provocative claim may not be that it tops the benchmark. It may be the implicit argument that in the age of AI, the best model is not the one you build, but the one you know how to use.
