One of Google’s recent Gemini AI models scores worse on safety

A recently released Google AI model scores worse on certain safety tests than its predecessor, according to the company’s internal benchmarking.
In a technical report published this week, Google reveals that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, "text-to-text safety" and "image-to-text safety," Gemini 2.5 Flash regresses 4.1% and 9.6%, respectively.
Text-to-text safety measures how frequently a model violates Google's guidelines given a prompt, while image-to-text safety evaluates how closely the model adheres to those boundaries when prompted using an image. Both tests are automated, not human-supervised.
In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash "performs worse on text-to-text and image-to-text safety."
The surprising benchmark results come as AI companies move to make their models more permissive, in other words, less likely to refuse to respond to controversial or sensitive subjects. For its latest crop of Llama models, Meta said it tuned the models not to endorse "some views over others" and to reply to more "debated" political prompts. OpenAI said earlier this year that it would tweak future models so they do not take an editorial stance and instead offer multiple perspectives on controversial topics.
Those permissiveness efforts have sometimes backfired, though. WAN reported on Monday that the default model powering OpenAI's ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a "bug."
According to Google's technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company claims the regressions can be attributed partly to false positives, but it also admits that Gemini 2.5 Flash sometimes generates "violative content" when explicitly asked to.
"Of course, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected in our evaluations," the report reads.
Scores from SpeechMap, a benchmark that probes how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is far less likely to refuse to answer contentious questions than Gemini 2.0 Flash. WAN's testing of the model via the AI platform OpenRouter found that it will readily write essays in support of replacing human judges with AI, weakening due process protections in the U.S., and implementing widespread warrantless government surveillance programs.
Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google gave in its technical report demonstrate the need for more transparency in model testing.
"There's a trade-off between instruction-following and policy following, because some users may ask for content that would violate policies," Woodside told WAN. "In this case, Google's latest Flash model complies with instructions more while also violating policies more. Google doesn't provide much detail on the specific cases where policies were violated, although it says they are not severe. Without knowing more, it's hard for independent analysts to know whether there's a problem."
Google has previously come under fire for its model safety reporting practices.
It took the company weeks to publish a technical report for its most capable model, Gemini 2.5 Pro. When the report was eventually published, it initially omitted key safety testing details.
On Monday, Google issued a more detailed report with additional safety information.