Patronus AI lands $50M to build ‘digital worlds’ that stress-test AI agents

AI agents are becoming increasingly sophisticated. They are evolving from answering questions to autonomously performing complex, multi-step tasks.

But before these agents can be trusted to book travel or perform financial analysis on behalf of users, model providers and the startups that build such agents want to ensure they perform reliably in a wide range of scenarios.

AI labs often use benchmarks to show off their models’ prowess, but a high score, even on an agent-oriented benchmark, doesn’t prove that an AI can correctly perform complex real-world tasks.

Patronus AI, a startup founded in 2023 by former Meta AI researchers Anand Kannappan and Rebecca Qian, helps model builders and companies refine models to do just that by building simulated digital environments in which agent performance can be evaluated.

The San Francisco-based startup is tackling an important problem. Nearly every leading AI lab and many emerging startups are now customers, according to Glenn Solomon, managing director at Notable Capital, who describes demand for the company’s simulated environments as virtually insatiable.

Patronus’s revenue has increased tenfold in the past year, fueling investor interest. On Thursday, the company announced a $50 million Series B round led by Greenfield Partners, with participation from Notable Capital, Lightspeed, Datadog and Samsung. The round brings the company’s total funding to $70 million.

Patronus uses what it calls “digital world models” to create replicas of websites and internal systems. In these environments, agents are stress-tested after training using reinforcement learning, which iteratively rewards successful task completion and penalizes errors.
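
To make the reward-and-penalty idea concrete, here is a minimal sketch in Python. It is an illustration only, not Patronus's implementation: the `ToyBookingSite` class, the `episode_reward` function and the reward values are all hypothetical, standing in for a simulated website and a scoring rule that rewards a completed task and subtracts a penalty for each error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyBookingSite:
    """Hypothetical stand-in for a simulated website (an assumption, not a real API)."""
    target_city: str
    booked_city: Optional[str] = None
    errors: int = 0

    def step(self, action: str, argument: str = "") -> None:
        """Apply one agent action and record any mistake it causes."""
        if action == "book" and self.booked_city is None:
            self.booked_city = argument
        else:
            # Unknown actions and duplicate bookings both count as errors.
            self.errors += 1


def episode_reward(site: ToyBookingSite,
                   success_reward: float = 1.0,
                   error_penalty: float = 0.5) -> float:
    """Reward a correct booking and subtract a penalty per recorded error."""
    reward = success_reward if site.booked_city == site.target_city else 0.0
    return reward - error_penalty * site.errors


# One clean episode and one faulty episode.
good = ToyBookingSite(target_city="Paris")
good.step("book", "Paris")

bad = ToyBookingSite(target_city="Paris")
bad.step("click_random")
bad.step("book", "Rome")

print(episode_reward(good), episode_reward(bad))
```

In a reinforcement learning setup, a score like this would be fed back to the training process after each episode, so that behavior leading to higher scores becomes more likely over many iterations.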

AI labs see great value in these digital simulations because they give agents the chance to try out different, sometimes unpredictable, scenarios. The company compares its approach to how Waymo trained autonomous cars by first building synthetic worlds to test vehicles against rare hazards, such as severe weather or a child chasing a ball.

The difference with AI agents is that they tend to take shortcuts, meaning they do not complete the task correctly. “Patronus is very good at detecting the hacks and making sure they hold the models accountable,” Solomon said.
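
As a rough illustration of what catching a shortcut can look like, consider a coding task that is graded by a test suite. An agent could make the suite "pass" by deleting the tests it cannot satisfy. The sketch below is a hypothetical Python example, not Patronus's method: the `grade_episode` function and the test names are assumptions, and the check simply compares the tests that existed at the start with those left at the end before counting any passes.

```python
def grade_episode(original_tests: set, final_tests: set, passing: set) -> str:
    """Grade a toy coding task as 'hack', 'pass' or 'fail'.

    original_tests: test names present before the agent started
    final_tests:    test names present after the agent finished
    passing:        test names that pass at the end
    """
    # If any original test has disappeared, treat the run as a shortcut,
    # no matter how many of the remaining tests pass.
    if not original_tests <= final_tests:
        return "hack"
    # Otherwise the task counts only if every original test passes.
    if original_tests <= passing:
        return "pass"
    return "fail"


original = {"test_login", "test_checkout"}

# The agent deleted the failing test, so the remaining suite is all green.
print(grade_episode(original, {"test_login"}, {"test_login"}))

# The agent kept every test and fixed the code.
print(grade_episode(original, original, original))
```

The design choice here is to grade against the environment's original state rather than whatever the agent leaves behind, so a run that looks successful on the surface is still flagged.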

Patronus is currently delivering its simulated digital worlds for software engineering and finance, but this is just the beginning, according to Kannappan.

“Today we are very focused on the issues that are verifiable, so the issues that you can immediately check and verify, but there are many more areas that are very unverifiable or very difficult to verify,” he said.

Just because these processes are verifiable does not mean they are simple. “We want to actually be able to create the environment where you can run an agent who can work 10 hours, 10 days or 10 weeks,” Kannappan said.

As for rivals, Patronus believes it is mainly competing against the internal teams that AI labs have already built to evaluate agent behavior. While human data companies like Mercor and Surge help model builders with reinforcement learning, Patronus takes a different approach by evaluating how agents behave without any human intervention.
