Inside Ring-1T: Ant engineers solve reinforcement learning bottlenecks at trillion scale
China's Ant Group, an affiliate of Alibaba, has detailed technical information about its new model, Ring-1T, which the company claims is “the first open-source reasoning model with a total of one trillion parameters.”
Ring-1T is positioned to compete with other reasoning models such as OpenAI's GPT-5 and o-series, as well as Google's Gemini 2.5. With the release of its latest model, Ant adds to the geopolitical debate over who will dominate the AI race: China or the US.
Ant Group said Ring-1T is optimized for math and logic problems, code generation and scientific problem solving.
“With approximately 50 billion enabled parameters per token, Ring-1T achieves state-of-the-art performance in multiple challenging benchmarks – despite relying solely on natural language reasoning capabilities,” Ant said in a paper.
Ring-1T, which was first released in preview in September, uses the same architecture as Ling 2.0 and is trained on the Ling-1T base model the company released earlier this month. Ant said this allows the model to support a context window of up to 128,000 tokens.
To train a model as large as Ring-1T, researchers had to develop new methods to scale reinforcement learning (RL).
New training methods
Ant Group developed three “interconnected innovations” to support Ring-1T's RL training, a challenge given the size of the model and the large computing requirements it typically entails. These three are IcePop, C3PO++ and ASystem.
IcePop removes noisy gradient updates to stabilize training without slowing down inference, helping eliminate the catastrophic training-inference misalignment that can arise in RL. The researchers noted that when training models, especially those that use a mixture-of-experts (MoE) architecture such as Ring-1T, there can be a discrepancy between the token probabilities computed by the training engine and those computed by the inference engine.
“This problem is especially pronounced when training MoE models with RL due to the inherent use of the dynamic routing mechanism. Furthermore, in long CoT environments, these discrepancies can gradually accumulate across iterations and be further amplified,” the researchers said.
According to the paper, IcePop “suppresses unstable training updates through double-sided masking calibration.”
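The paper's description suggests a filter that discards gradient contributions from tokens whose training-engine and inference-engine probabilities diverge too far, masked from both sides. A minimal sketch of that idea follows; the threshold values, function names, and loss form are illustrative assumptions, not Ant's actual implementation.

```python
import math

def icepop_keep(logp_train, logp_infer, low=0.5, high=2.0):
    """Double-sided mask: keep a token only if the ratio of the training
    engine's probability to the inference engine's probability falls
    inside [low, high]. Thresholds are illustrative, not Ant's values."""
    ratio = math.exp(logp_train - logp_infer)
    return low <= ratio <= high

def masked_policy_loss(logps_train, logps_infer, advantages,
                       low=0.5, high=2.0):
    """Policy-gradient-style loss that zeroes out contributions from
    tokens with large training/inference probability discrepancies,
    a sketch of 'double-sided masking calibration'."""
    kept = [
        -lt * adv
        for lt, li, adv in zip(logps_train, logps_infer, advantages)
        if icepop_keep(lt, li, low, high)
    ]
    # average over the surviving tokens only
    return sum(kept) / max(len(kept), 1)
```

The intuition: a token whose two engines disagree sharply carries a noisy, potentially destabilizing gradient, so it is simply excluded rather than reweighted.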
The second method, C3PO++, is an improved version of the C3PO system Ant introduced previously. It manages how Ring-1T and other very large models generate and process training samples, known as rollouts, so that GPUs don't sit idle.
It splits rollout work into chunks that can be processed in parallel across two pools: an inference pool, which generates new data, and a training pool, which collects results to update the model. C3PO++ enforces a token budget to control how much data is processed in each iteration so that GPUs are used efficiently.
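A token budget of this kind can be pictured as a scheduler that fills each training iteration with rollouts until the budget is spent and carries the rest over. The sketch below is a toy greedy scheduler under assumed data shapes; the names and budget policy are illustrative, not C3PO++'s actual logic.

```python
from collections import deque

def schedule_rollouts(pending, token_budget):
    """Greedy scheduler sketch: take rollouts from the pending queue
    until the per-iteration token budget is exhausted; rollouts that
    don't fit are deferred to the next iteration. Illustrative only."""
    batch, used = [], 0
    carry = deque()
    while pending:
        rollout = pending.popleft()
        if used + rollout["tokens"] <= token_budget:
            batch.append(rollout)
            used += rollout["tokens"]
        else:
            carry.append(rollout)  # carry over to the next iteration
    pending.extend(carry)
    return batch, used
```

Capping tokens per iteration rather than rollouts per iteration keeps step cost roughly constant even when individual chain-of-thought rollouts vary wildly in length.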
The third method, ASystem, uses a SingleController+SPMD (single program, multiple data) architecture to enable asynchronous RL operations.
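In the SingleController+SPMD pattern, one controller dispatches the same program to many workers, each operating on its own data shard, and gathers results asynchronously. The toy sketch below illustrates only the general pattern; Ant has not published ASystem's internals at this level of detail, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def worker_step(shard):
    """The single program every worker runs on its own data shard (SPMD).
    A trivial stand-in computation for illustration."""
    return sum(shard)

def controller(shards):
    """A single controller submits the same program to all workers
    asynchronously, then gathers results as they complete."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(worker_step, s) for s in shards]
        return [f.result() for f in futures]
```

The asynchrony matters at trillion-parameter scale: the controller need not block on the slowest worker before scheduling further work.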
Benchmark results
Ant evaluated Ring-1T on benchmarks measuring performance in math, coding, logical reasoning and general tasks, comparing it against models including DeepSeek-V3.1-Terminus-Thinking, Qwen3-235B-A22B-Thinking-2507, Gemini 2.5 Pro and GPT-5 Thinking.
In benchmark testing, Ring-1T performed strongly, coming in second to OpenAI's GPT-5 on most benchmarks. Ant said Ring-1T showed the best performance of all the open-weight models tested.
The model achieved a score of 93.4% on the AIME 25 benchmark, second only to GPT-5. On coding tasks, Ring-1T outperformed both DeepSeek and Qwen.
“It indicates that our carefully synthesized data set shapes Ring-1T’s robust performance in programming applications, providing a strong foundation for future efforts in agentic applications,” the company said.
Ring-1T shows how heavily Chinese companies are investing in models
Ring-1T is just the latest model from China that aims to dethrone GPT-5 and Gemini.
Chinese companies have been rapidly releasing impressive models since DeepSeek's surprise launch in January. Ant's affiliate Alibaba recently released Qwen3 Omni, a multimodal model that natively unifies text, image, audio and video. DeepSeek has also continued to improve its models and earlier this month launched DeepSeek-OCR, a new model that rethinks the way models process information.
As Ring-1T and Ant develop new methods to train and scale extra-large models, the battle for AI dominance between the US and China continues to heat up.