
OpenAI launches program to design new ‘domain-specific’ AI benchmarks

Like many AI labs, OpenAI thinks that AI benchmarks are broken. It says it wants to fix them through a new program.

Called the OpenAI Pioneers Program, it will focus on creating evaluations for AI models that “set the bar for what good looks like,” as the company explained in a blog post.

“As the pace of AI adoption accelerates across industries, there is a need to understand and improve its impact on the world,” the company continued in the post. “Creating domain-specific evals is a way to better reflect real-world use cases, allowing teams to assess model performance in practical, high-stakes environments.”

As the recent controversy involving the crowdsourced benchmark LM Arena and Meta’s Maverick model illustrates, it’s tough these days to know exactly what distinguishes one model from another. Many commonly used AI benchmarks measure performance on esoteric tasks, such as solving doctorate-level math problems. Others can be gamed, or are poorly aligned with the preferences of most people.

Through the Pioneers Program, OpenAI hopes to create benchmarks for specific domains such as legal, finance, insurance, healthcare, and accounting. The lab says it will work with “multiple companies” over the coming months to design tailored benchmarks, and ultimately share those benchmarks publicly, along with “industry-specific” evaluations.

“The first cohort will focus on startups that will help lay the foundation of the OpenAI Pioneers Program,” OpenAI wrote in the blog post. “We are selecting a handful of startups for this first cohort, each working on high-quality, applied use cases where AI can drive real-world impact.”


Companies in the program will also have the opportunity to collaborate with OpenAI’s team on model improvements via reinforcement fine-tuning, a technique that optimizes models for a narrow set of tasks, OpenAI says.

The big question is whether the AI community will embrace benchmarks whose creation OpenAI funded. OpenAI has backed benchmarking efforts and designed its own evaluations before. But partnering with customers to release AI tests may be seen as an ethical bridge too far.
