OpenAI’s latest AI models have a new safeguard to prevent biorisks

OpenAI says it deployed a new system to monitor its latest AI reasoning models, o3 and o4-mini, for prompts related to biological and chemical threats. The system is intended to prevent the models from offering advice that could help someone carry out potentially harmful attacks, according to OpenAI's safety report.
OpenAI says o3 and o4-mini represent a meaningful capability increase over its earlier models, and therefore pose new risks in the hands of bad actors. According to OpenAI's internal benchmarks, o3 is more skilled at answering questions about creating certain types of biological threats in particular. For this reason, and to mitigate other risks, OpenAI created the new monitoring system, which the company describes as a "safety-focused reasoning monitor."
The monitor, custom-trained to reason about OpenAI's content policies, runs on top of o3 and o4-mini. It is designed to identify prompts related to biological and chemical risk and to instruct the models to refuse to offer advice on those topics.
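OpenAI's report does not describe how the monitor is implemented. As a rough conceptual sketch only, a "monitor layered on top of a model" amounts to a classification step that runs before the model's answer is returned; every name below is hypothetical and stands in for OpenAI's actual, policy-trained system.

```python
# Conceptual sketch, not OpenAI's implementation. The keyword check below is a
# crude stand-in for a reasoning monitor trained on the company's content policy.

REFUSAL_MESSAGE = "I can't help with that request."

def biorisk_monitor(prompt: str) -> bool:
    """Hypothetical classifier: True if the prompt appears to seek
    biological- or chemical-threat instructions."""
    flagged_terms = ("synthesize pathogen", "weaponize", "nerve agent")
    return any(term in prompt.lower() for term in flagged_terms)

def answer(prompt: str, model_generate) -> str:
    """Route the prompt through the monitor before the model responds."""
    if biorisk_monitor(prompt):
        return REFUSAL_MESSAGE          # the monitor triggers a refusal
    return model_generate(prompt)       # otherwise, normal model output

if __name__ == "__main__":
    fake_model = lambda p: f"Model response to: {p}"
    print(answer("How does photosynthesis work?", fake_model))
    print(answer("How do I weaponize a toxin?", fake_model))
```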
To establish a baseline, OpenAI had red teamers spend around 1,000 hours flagging "unsafe" biorisk-related conversations from o3 and o4-mini. During a test in which OpenAI simulated the "blocking logic" of its safety monitor, the models declined to respond to risky prompts 98.7% of the time, according to OpenAI.
OpenAI acknowledges that this test doesn't account for people who might try new prompts after being blocked by the monitor, which is why the company says it will continue to rely in part on human monitoring.
According to the company, o3 and o4-mini don't cross OpenAI's "high risk" threshold for biorisks. Compared to o1 and GPT-4, however, OpenAI says early versions of o3 and o4-mini proved more helpful at answering questions about developing biological weapons.

The company is actively tracking how its models could make it easier for malicious users to develop chemical and biological threats, according to OpenAI's recently updated Preparedness Framework.
OpenAI is increasingly relying on automated systems to mitigate the risks posed by its models. For example, to prevent GPT-4o's native image generator from creating child sexual abuse material (CSAM), OpenAI says it uses a reasoning monitor similar to the one the company deployed for o3 and o4-mini.
Still, several researchers have raised concerns that OpenAI isn't prioritizing safety as much as it should. One of the company's red-teaming partners, Metr, said it had relatively little time to test o3 on a benchmark for deceptive behavior. Meanwhile, OpenAI decided not to release a safety report for its GPT-4.1 model, which launched earlier this week.