
Phonely’s new AI agents hit 99% accuracy—and customers can’t tell they’re not human



A three-way partnership between AI phone support company Phonely, inference-optimization platform Maitai, and chipmaker Groq has achieved a breakthrough that tackles one of conversational AI's most persistent problems: the awkward delays that immediately signal to callers that they're talking to a machine.

The collaboration has enabled Phonely to reduce response times by more than 70% while simultaneously raising accuracy from 81.5% to 99.2% across four model iterations, exceeding GPT-4o's 94.7% benchmark by 4.5 percentage points. The improvements stem from Groq's new ability to switch instantly between multiple specialized AI models without added latency, orchestrated through Maitai's optimization platform.

The achievement resolves what industry experts call the "uncanny valley" of voice AI: the subtle cues that make automated conversations feel distinctly non-human. For call centers and customer service operations, the implications could be transformative, as one of Phonely's customers is replacing 350 human agents this month alone.

Why AI phone calls still sound robotic: the four-second problem

Traditional large language models like OpenAI's GPT-4o have long struggled with what seems like a simple challenge: responding quickly enough to maintain a natural conversational flow. While a few seconds of delay barely registers in text-based interactions, the same pause feels interminable during a live phone conversation.

"One of the things that most people don't realize is that major LLM providers, such as OpenAI, Claude and others, have a very high degree of latency variance," said Will Bodewes, founder and CEO of Phonely, in an exclusive interview with VentureBeat. "Four seconds feels like an eternity when you're talking to a voice AI on the phone. This delay is what makes most voice AI today feel non-human."

The problem occurs roughly once in every ten requests, meaning that standard conversations inevitably include at least one or two awkward pauses that immediately reveal the artificial nature of the interaction. For companies considering AI phone agents, these delays have been a significant barrier to adoption.

"This kind of latency is unacceptable for real-time phone support," Bodewes explained. "Aside from latency, conversational accuracy and human-like responses are something legacy LLM providers simply haven't cracked in the voice domain."

How three startups solved AI's biggest conversation problem

The solution emerged from Groq's development of what the company calls "zero-latency LoRA hot-swapping": the ability to switch instantly between multiple specialized AI model variants without any performance penalty. LoRA, or low-rank adaptation, lets developers create lightweight, task-specific modifications to existing models instead of training entirely new ones.
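To make the idea concrete, here is a minimal sketch of the LoRA arithmetic on a single dense layer. The dimensions and values are illustrative, not taken from any of the companies' models; the point is that each task-specific variant adds only two small matrices rather than a full copy of the layer.

```python
import numpy as np

d_in, d_out, r = 4096, 4096, 16            # r << d_in: the "low rank"

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen base weights (shared)
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, starts at 0

def forward(x: np.ndarray) -> np.ndarray:
    # Base path plus low-rank "delta" path. With B = 0 this is exactly the
    # base model, so fine-tuning starts from the original behavior.
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((1, d_in))
print(forward(x).shape)  # (1, 4096)
```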


"Groq's combination of fine-grained, software-controlled architecture, fast on-chip memory, streaming architecture and deterministic execution means it is possible to access multiple hot-swapped LoRAs with no latency penalty," Chelsey Kantor, Groq's chief marketing officer, told VentureBeat. "The LoRAs are stored and managed in SRAM alongside the original model weights."

This infrastructure advance enabled Maitai to build what founder Christian Dalsanto describes as a "proxy-layer orchestration" system that continuously optimizes model performance. "Maitai acts as a thin proxy layer between customers and their model providers," said Dalsanto. "This lets us dynamically select and optimize the best model for every request, automatically applying evaluations, optimizations and resilience strategies such as fallbacks."
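The article doesn't publish Maitai's API, but the proxy-layer pattern Dalsanto describes can be sketched in a few lines. Everything below is hypothetical: the model names, the routing table and the `call_model` helper are stand-ins for whatever the real proxy does per request.

```python
from dataclasses import dataclass

@dataclass
class Route:
    primary: str    # specialized (e.g. LoRA-adapted) model
    fallback: str   # general-purpose model used as a resilience strategy

ROUTES = {
    "appointment_scheduling": Route("scheduling-lora-v4", "general-llm"),
    "lead_qualification":     Route("leads-lora-v4", "general-llm"),
}

def call_model(model: str, prompt: str) -> str:
    # Stand-in for the real inference call (an HTTP request in practice).
    return f"[{model}] {prompt}"

def route_request(task: str, prompt: str) -> str:
    route = ROUTES.get(task, Route("general-llm", "general-llm"))
    try:
        return call_model(route.primary, prompt)
    except Exception:
        # Fall back rather than fail, as the quote above describes.
        return call_model(route.fallback, prompt)

print(route_request("appointment_scheduling", "Book me for Tuesday at 3pm."))
```

Because the proxy sits between the client and the provider, routing, evaluation and fallback logic can all change without the client ever noticing.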

The system works by collecting performance data from every interaction, identifying weak points, and improving the models iteratively without customer intervention. "Because Maitai sits in the middle of the inference flow, we collect strong signals identifying where models underperform," Dalsanto explained. "These 'soft spots' are clustered, labeled and incrementally fine-tuned to address specific weaknesses without causing regressions."
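A schematic version of that feedback loop, under assumed data shapes (the scoring, threshold and intents below are invented for illustration): group low-scoring interactions by intent, and hand the largest weak clusters to the next fine-tuning round.

```python
from collections import defaultdict

# Each record: which intent the call involved and how well the model scored.
interactions = [
    {"intent": "reschedule", "score": 0.42},
    {"intent": "reschedule", "score": 0.48},
    {"intent": "insurance_claim", "score": 0.55},
    {"intent": "pricing", "score": 0.95},
]

soft_spots = defaultdict(list)
for record in interactions:
    if record["score"] < 0.6:  # heuristic threshold for a weak interaction
        soft_spots[record["intent"]].append(record)

# The largest weak clusters become labeled fine-tuning data for the next
# model iteration; strong areas are left alone to avoid regressions.
for intent, items in sorted(soft_spots.items(), key=lambda kv: -len(kv[1])):
    print(intent, len(items))
```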

From 81% to 99% accuracy: the numbers behind AI's human-like breakthrough

The results show significant improvements across multiple performance dimensions. Time to first token (how quickly an AI begins responding) dropped 73.4%, from 661 milliseconds to 176 milliseconds at the 90th percentile. Total completion times fell 74.6%, from 1,446 milliseconds to 339 milliseconds.
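For readers less familiar with percentile latency figures, here is how such numbers are typically computed, on synthetic time-to-first-token samples (the values are made up, not Phonely's data):

```python
import numpy as np

ttft_ms = np.array([150, 162, 170, 176, 181, 320, 140, 158, 175, 660])
p50, p90 = np.percentile(ttft_ms, [50, 90])
print(f"p50 = {p50:.0f} ms, p90 = {p90:.0f} ms")
# Reporting the 90th percentile rather than the mean captures the occasional
# long stall (roughly one call in ten), which is exactly what callers notice.
```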

Perhaps more importantly, accuracy improvements followed a clear upward trajectory across four model iterations, starting at 81.5% and reaching 99.2%, a level that exceeds human performance in many customer service scenarios.

"We've seen around 70%+ of people who call into our AI unable to tell the difference from a person," Bodewes told VentureBeat. "Latency is, or was, the dead giveaway that it was an AI. With a custom fine-tuned model that talks like a person and super-low-latency hardware, there's not much stopping us from crossing the uncanny valley of sounding completely human."

The performance gains translate directly into business results. "One of our biggest customers saw a 32% increase in qualified leads compared to a previous version using earlier state-of-the-art models," Bodewes noted.


350 human agents replaced in a single month: call centers go all-in on AI

The improvements arrive as call centers face mounting pressure to reduce costs while maintaining service quality. Traditional human agents require training, scheduling coordination and considerable overhead costs that AI agents can eliminate.

"Call centers are seeing really enormous benefits from using Phonely to replace human agents," said Bodewes. "One of the call centers we work with is actually replacing 350 human agents completely this month."

The technology shows particular strength in specific use cases. "Phonely really excels in a few areas, including industry-leading performance in appointment scheduling and lead qualification specifically, beyond what legacy providers are capable of," Bodewes explained. The company works with major firms handling insurance, legal and automotive customer interactions.

The hardware edge: why Groq's chips make subsecond AI possible

Groq's specialized AI inference chips, called language processing units (LPUs), provide the hardware foundation that makes the multi-model approach viable. Unlike the general-purpose graphics processors typically used for AI inference, LPUs are optimized specifically for the sequential nature of language processing.

"The LPU architecture is optimized for precisely controlling data movement and computation at a fine-grained level with high speed and predictability, allowing the efficient management of multiple small 'delta' weight sets (the LoRAs) on a shared base model with no additional latency," Kantor said.
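Some rough arithmetic suggests why many "delta" weight sets can sit alongside one base model in fast memory. This is back-of-the-envelope only, assuming a rank-16 adapter on a 4096x4096 layer; it says nothing about Groq's actual SRAM layout.

```python
# A rank-r LoRA over a d-by-d layer stores 2 * r * d values, versus d * d
# for a full copy of the layer.
d, r = 4096, 16
full_layer = d * d            # 16,777,216 values
lora_delta = 2 * r * d        # 131,072 values
print(f"adapter/base size ratio: {lora_delta / full_layer:.3%}")  # ~0.781%
```

At well under 1% of the base layer's size per adapter, keeping dozens of task-specific variants resident next to the shared weights becomes plausible.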

The cloud-based infrastructure also addresses scalability concerns that have historically limited AI deployment. "The beauty of using a cloud-based solution like GroqCloud is that Groq handles the orchestration and dynamic scaling for our customers for any AI model we offer, including fine-tuned LoRA models," Kantor explained.

The economic advantages appear substantial for companies. "The simplicity and efficiency of our system design, the low power consumption and the high performance of our hardware allow Groq to offer customers the lowest cost per token without sacrificing performance as they scale," Kantor said.

Same-day AI deployment: how enterprises skip months of integration

One of the partnership's most compelling aspects is implementation speed. Unlike traditional AI deployments that can require months of integration work, Maitai's approach enables same-day transitions for companies already using general-purpose models.

"For companies already in production using general-purpose models, we typically transition them to Maitai on the same day, with zero disruption," said Dalsanto. "We begin immediate data collection, and within days to a week we can deliver a fine-tuned model that's faster and more reliable than their original setup."


This rapid deployment capability addresses a common enterprise concern about AI projects: lengthy implementation timelines that delay return on investment. The proxy-layer approach means companies can keep their existing API integrations while gaining access to continuously improving performance.
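One way this pattern commonly works in practice, shown here with the OpenAI Python SDK purely as an illustration: many provider SDKs let you repoint the client at a compatible proxy endpoint, so no application code changes. The URL below is a placeholder, not a documented Maitai address, and the article does not confirm this exact mechanism.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # hypothetical proxy endpoint
    api_key="YOUR_KEY",
)

# The application keeps calling the same API; the proxy can transparently
# reroute the request to a fine-tuned, task-specific model behind the scenes.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Can I book for Tuesday at 3?"}],
)
print(resp.choices[0].message.content)
```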

The future of enterprise AI: specialized models replace one-size-fits-all

The collaboration signals a broader shift in enterprise AI architecture, away from monolithic, general-purpose models toward specialized, task-specific systems. "We're observing growing demand from teams breaking their applications into smaller, highly specialized workloads, each benefiting from individual adapters," said Dalsanto.

This trend reflects a maturing understanding of AI deployment challenges. Rather than expecting single models to excel at every task, enterprises increasingly recognize the value of purpose-built solutions that can be continuously refined based on real-world performance data.

"Multi-LoRA hot-swapping lets companies deploy faster, more accurate models customized precisely for their applications, removing traditional cost and complexity barriers," Dalsanto explained. "This fundamentally shifts how enterprise AI gets built and deployed."

The technical foundation also enables more sophisticated applications as the technology matures. Groq's infrastructure can support dozens of specialized models on a single instance, potentially allowing enterprises to create highly customized AI experiences across different customer segments or use cases.

"Multi-LoRA hot-swapping enables low-latency, highly accurate inference tailored to specific tasks," said Dalsanto. "Our roadmap prioritizes further investments in infrastructure, tools and optimization to establish application-specific inference as the new standard."

For the broader conversational AI market, the partnership demonstrates that technical limitations once considered insurmountable can be addressed through specialized infrastructure and careful system design. As more enterprises deploy AI phone agents, the competitive advantages demonstrated by Phonely may set new baseline expectations for performance and responsiveness in automated customer interactions.

The success also validates an emerging model of AI infrastructure companies working together to solve complex deployment challenges. This collaborative approach may accelerate innovation across the enterprise AI sector as specialized capabilities combine to deliver solutions beyond what any single provider could achieve independently. If this partnership is any indication, the era of obviously artificial phone conversations may be ending faster than anyone expected.

