OpenAI’s research on AI models deliberately lying is wild

Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its newest quantum chip suggested that multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack vending machine to run and it went amok, calling security on people and insisting it was human.
This week it was OpenAI's turn to raise our collective eyebrows.
OpenAI released research on Monday explaining how it's working to stop AI models from "scheming." It's a practice in which an "AI behaves one way on the surface while hiding its true goals," OpenAI defined in its tweet about the research.
In the paper, conducted with Apollo Research, the researchers went a little further, comparing AI scheming to a human stockbroker who breaks the law to make as much money as possible. The researchers argued, however, that most AI "scheming" wasn't that harmful. "The most common failures involve simple forms of deception – for instance, pretending to have completed a task without actually doing so," they wrote.
The paper was mostly published to show that "deliberative alignment," the anti-scheming technique they were testing, worked well.
But it also explained that AI developers haven't figured out a way to train their models not to scheme. That's because such training could actually teach the model how to scheme even better to avoid being detected.
"A major failure mode of attempting to 'train out' scheming is simply teaching the model to scheme more carefully and covertly," the researchers wrote.
Perhaps the most astonishing part is that if a model understands it's being tested, it can pretend it isn't scheming just to pass the test, even if it is still scheming. "Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment," the researchers wrote.
It's not news that AI models will lie. By now, most of us have experienced AI hallucinations, where the model confidently gives an answer to a prompt that simply isn't true. But hallucinations are basically the AI presenting guesswork with confidence, as OpenAI research released earlier this month documented.
Scheming is something else. It's deliberate.
Even this revelation, that a model will deliberately mislead humans, isn't new. Apollo Research first published a paper in December documenting how five models schemed when they were given instructions to achieve a goal "at all costs."
The news here is actually good news: the researchers saw significant reductions in scheming by using "deliberative alignment." The technique involves teaching the model an "anti-scheming specification" and then having the model review it before acting. It's a bit like making little kids repeat the rules before allowing them to play.
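For readers who want a concrete picture of that "repeat the rules first" pattern, here is a minimal sketch. To be clear, this is not OpenAI's implementation: the real technique is applied during training rather than at prompt time, and the specification text and `call_model` function below are invented placeholders.

```python
# Toy sketch (not OpenAI's method) of the rules-first idea behind
# "deliberative alignment": the model restates the anti-scheming
# rules and applies them to the task before it is allowed to act.
# The spec text and call_model are invented for illustration only.

ANTI_SCHEMING_SPEC = """\
1. Never claim a task is complete unless it actually is.
2. Do not take hidden actions that conflict with the stated goal.
3. If a rule conflicts with the task, say so rather than hiding it."""

def call_model(system: str, user: str) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError("wire this up to your model provider")

def deliberate_then_act(task: str) -> str:
    # Step 1: like kids repeating the rules before playing, the model
    # first explains how the specification applies to this task.
    reflection = call_model(
        system="Anti-scheming specification:\n" + ANTI_SCHEMING_SPEC,
        user="Before acting, explain how each rule applies to: " + task,
    )
    # Step 2: the model then performs the task with its own
    # rules-reflection included in the context it acts from.
    return call_model(
        system=("Anti-scheming specification:\n" + ANTI_SCHEMING_SPEC
                + "\nYour notes on the rules:\n" + reflection),
        user=task,
    )
```

The actual method bakes this behavior in through training rather than prompting, but the ordering, rules first and action second, is the intuition the researchers describe.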
OpenAI researchers insist that the lying they caught in their own models, or even in ChatGPT, isn't that serious. As OpenAI co-founder Wojciech Zaremba told WAN's Maxwell Zeff about this research: "This work has been done in simulated environments, and we think it represents future use cases. However, today, we haven't seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, 'Yes, I did a great job.' And that's just the lie. There are some petty forms of deception that we still need to address."
The fact that AI models from multiple players deliberately deceive humans is, perhaps, understandable. They were built by humans, to mimic humans, and (synthetic data aside) for the most part trained on data produced by humans.
It's also bonkers.
While we've all experienced the frustration of poorly performing technology (thinking of you, home printers of yesteryear), when was the last time your non-AI software deliberately lied to you? Has your inbox ever fabricated emails on its own? Has your CMS logged new prospects that didn't exist to pad its numbers? Has your fintech app made up its own bank transactions?
It's worth pondering this as the corporate world barrels toward an AI future where companies believe agents can be treated like independent employees. The researchers of this paper have the same warning.
"As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow, so our safeguards and our ability to rigorously test must grow correspondingly," they wrote.




