Researchers say they’ve discovered a new method of ‘scaling up’ AI, but there’s reason to be skeptical

Have researchers discovered a new AI "scaling law"? That's what some buzz on social media suggests, but experts are skeptical.
AI scaling laws, a somewhat informal concept, describe how the performance of AI models improves as the size of the data sets and computing resources used to train them increases. Until about a year ago, scaling up "pre-training", that is, training ever-larger models on ever-larger data sets, was the dominant law, at least in the sense that most frontier AI labs embraced it.
Pre-training hasn't disappeared, but two additional scaling laws, post-training scaling and test-time scaling, have emerged to complement it. Post-training scaling is essentially tuning a model's behavior, while test-time scaling means applying more computing to inference (that is, to running models) to drive a form of "reasoning" (see: models such as R1).
Researchers from Google and UC Berkeley recently proposed in a paper what some commentators have described online as a fourth law: "inference-time search."
Inference-time search has a model generate many possible answers to a query in parallel and then select the "best" of the bunch. The researchers claim it can boost the performance of a year-old model, such as Google's Gemini 1.5 Pro, to a level that surpasses OpenAI's o1-preview "reasoning" model on science and math benchmarks.
Our paper focuses on this search axis and how it scales. For example, by just randomly sampling 200 responses and self-verifying, Gemini 1.5 (an old model from early 2024!) beats o1-preview and approaches o1. This is without fine-tuning, RL, or ground-truth verifiers. pic.twitter.com/HB5FO7IFNH
– Eric Zhao (@ericzhao28) March 17, 2025
"[B]y just randomly sampling 200 responses and self-verifying, Gemini 1.5, an old model from early 2024, beats o1-preview and approaches o1," wrote Eric Zhao, a Google doctorate fellow and one of the paper's co-authors, in a series of posts on X. "The magic is that self-verification naturally becomes easier at scale! You'd expect that picking out a correct solution becomes harder the larger your pool of solutions is, but the opposite is the case!"
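In rough terms, inference-time search amounts to a sample-then-select loop: draw many candidate answers, have the model score each one itself, and keep the highest-scoring candidate. The sketch below is only an illustration of that idea; the sample_fn and verify_fn callables are hypothetical stand-ins for whatever model API one might use, not the researchers' actual code.

```python
# Minimal sketch of inference-time search: sample many candidate answers,
# let the model self-verify (score) each one, and return the best.
# `sample_fn` and `verify_fn` are placeholders for a real model API call;
# they are assumptions for illustration, not code from the paper.
from typing import Callable

def inference_time_search(
    question: str,
    sample_fn: Callable[[str], str],         # draws one candidate answer
    verify_fn: Callable[[str, str], float],  # model's self-assessed score
    n_samples: int = 200,
) -> str:
    """Generate n_samples candidates and keep the self-verified best one."""
    candidates = [sample_fn(question) for _ in range(n_samples)]
    return max(candidates, key=lambda answer: verify_fn(question, answer))
```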
However, several experts say the results aren't surprising and that inference-time search may not be useful in many scenarios.
Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, said that the approach works best when there's a good "evaluation function", in other words, when the best answer to a question can be easily determined. But most queries aren't that cut-and-dried.
"[I]f we can't write code to define what we want, we can't use [inference-time] search," he said. "We can't do this for something like general language interaction […] It's generally not a great approach to actually solving most problems."
Mike Cook, a researcher at King's College London specializing in AI, agreed with Guzdial's assessment, adding that it highlights the gap between "reasoning" in the AI sense of the word and our own thinking processes.
"[Inference-time search] doesn't elevate the reasoning process of the model," Cook said. "[I]t's just a way of working around the limitations of a technology prone to making very confidently supported mistakes […] Intuitively, if your model makes a mistake 5% of the time, then checking 200 attempts at the same problem should make those mistakes easier to spot."
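As a toy illustration of that intuition (a simulation with assumed numbers, not an experiment from the paper): if a model is wrong 5% of the time on a problem, 200 independent attempts will mostly agree on the right answer, so the occasional mistake stands out.

```python
# Toy simulation of the intuition above (assumed numbers, not from the paper):
# a model that is wrong 5% of the time is sampled 200 times on one problem.
from collections import Counter
import random

random.seed(0)
ERROR_RATE = 0.05
N_SAMPLES = 200

samples = [
    "correct" if random.random() > ERROR_RATE else f"wrong_{random.randint(1, 50)}"
    for _ in range(N_SAMPLES)
]
print(Counter(samples).most_common(3))
# The correct answer typically shows up ~190 times while each wrong answer
# is rare, so simple agreement across attempts makes mistakes easy to spot.
```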
That inference-time search may have limitations is certainly unwelcome news for an AI industry looking to scale up model "reasoning" efficiently. As the paper's co-authors note, today's reasoning models can rack up thousands of dollars in computing on a single math problem.
It seems the search for new scaling techniques will continue.