Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks


The Allen Institute for AI (Ai2) recently released what it calls its most powerful model family to date, Olmo 3. But the company kept iterating on the models and expanding the RL (reinforcement learning) runs to create Olmo 3.1.
The new Olmo 3.1 models focus on efficiency, transparency and control for enterprises.
Ai2 has updated two of the three versions of Olmo 3: Olmo 3.1 Think 32B, the flagship model optimized for advanced research, and Olmo 3.1 Instruct 32B, designed for instruction following, multi-turn dialogue and tool use.
The third version, Olmo 3-Base, targets programming, comprehension and math, and is also well suited to further fine-tuning.
Ai2 said that in order to upgrade Olmo 3 Think 32B to Olmo 3.1, the researchers extended their best RL run with a longer training schedule.
“Following the original launch of Olmo 3, we resumed our RL training run for Olmo 3 32B Think, training for an additional 21 days on 224 GPUs with additional epochs against our Dolci-Think-RL dataset,” Ai2 said in a blog post. “This delivered Olmo 3.1 32B Think, which delivers substantial gains in math, reasoning and following instructions: improvements of 5+ points on AIME, 4+ points on ZebraLogic, 4+ points on IFEval and 20+ points on IFBench, in addition to stronger performance in coding and complex multi-step tasks.”
To create Olmo 3.1 Instruct, Ai2 said its researchers applied the recipe behind the smaller 7B Instruct model to the larger model.
Olmo 3.1 Instruct 32B is “optimized for chat, tool use and multi-turn dialogue, making it a much more performant brother of Olmo 3 Instruct 7B and ready for real-world applications,” Ai2 said in a message on X.
For now, the new checkpoints are available on the Ai2 Playground or Hugging Face, with API access coming soon.
Better performance on benchmarks
The Olmo 3.1 models performed well in benchmark tests, as expected outperforming their Olmo 3 predecessors.
Olmo 3.1 Think outperformed the Qwen 3 32B models on the AIME 2025 benchmark and came close to Gemma 27B.
Olmo 3.1 Instruct performed strongly against its open-source peers, even beating models like Gemma 3 on math benchmarks.
“As for Olmo 3.1 32B Instruct, it is a larger-scale instruction-tuned model built for chat, tool use, and multi-turn dialogue. Olmo 3.1 32B Instruct is our most capable fully open chat model yet and – in our evaluations – the strongest fully open 32B scale instruct model,” the company said.
Ai2 has also upgraded its RL-Zero 7B models for math and coding. The company said on X that both models benefited from longer and more stable training runs.
Commitment to transparency and open source
Ai2 previously told VentureBeat that it designed the Olmo 3 family of models to provide companies and research labs with greater control and understanding of the data and training incorporated into the model.
Organizations can also expand the model’s data mix and retrain it so the model learns from the added data.
This has long been a commitment for Ai2, which also offers a tool called OlmoTrace that tracks how LLM output matches the training data.
“Together, Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B demonstrate that openness and performance can move forward together. By extending the same model flow, we continue to improve capabilities while maintaining end-to-end transparency over data, code, and training decisions,” Ai2 said.