
Allen AI’s Tülu 3 Just Became DeepSeek’s Unexpected Rival

The headlines keep coming. The DeepSeek models are topping benchmarks, setting new standards and making a lot of noise. But something else has happened in the AI research scene that deserves your attention.

Allen AI quietly released its new Tülu 3 family of models, and the 405B-parameter version doesn't just compete with DeepSeek: it matches or beats it on important benchmarks.

Let’s put this in perspective.

The 405B Tülu 3 model goes head-to-head with top performers such as DeepSeek V3 across a range of tasks, showing similar or superior performance in areas such as math problems, coding challenges and precise instruction following. And it does so with a fully open approach.

Allen AI has released the complete training pipeline, the code and even its new reinforcement learning method, Reinforcement Learning with Verifiable Rewards (RLVR), that made this possible.

Developments like these genuinely change how top-tier AI development happens. When a fully open-source model can match the best closed models out there, it opens up possibilities that were previously locked behind corporate walls.

The technical battle

What makes Tülu 3 stand out? It comes down to a distinctive four-phase training process that goes beyond traditional approaches.

Let’s see how Allen AI built this model:

Phase 1: Strategic data selection

The team knew that model quality starts with data quality. They combined established datasets such as OpenAssistant with custom-built content. But here is the key insight: they didn’t just aggregate data; they built targeted datasets for specific skills such as mathematical reasoning and coding.

Phase 2: Building better responses

In the second phase, Allen AI focused on teaching the model specific skills. They created different sets of training data: some for mathematics, others for coding, and more for general tasks. By repeatedly testing these combinations, they could see exactly where the model excelled and where it needed work. This iterative process revealed the true potential of what Tülu 3 could achieve in each area.

Phase 3: Learning from comparisons

This is where Allen AI got creative. They built a system that could directly compare Tülu 3’s responses with those of other top models. They also tackled a persistent problem in AI: the tendency of models to write long responses purely for the sake of length. Their approach, length-normalized Direct Preference Optimization (DPO), meant that the model learned to value quality over quantity. The result? Responses that are both accurate and to the point.


When AI models learn from preferences (which answer is better, A or B?), they tend to develop a frustrating bias: they start to assume that longer answers are always better. It is as if they are trying to win by saying more instead of saying things well.

Length-normalized DPO fixes this by adjusting how the model learns from preferences. Instead of just looking at which response was preferred, it also takes the length of each response into account. Think of it as judging answers by their quality per word, not just their total impact.

Why does this matter? Because it helps Tülu 3 learn to be precise and efficient. Instead of padding responses with extra words to seem more thorough, it learns to deliver value at whatever length is actually needed.

This may seem like a small detail, but it is crucial for building AI that communicates naturally. The best human experts know when to be concise and when to elaborate, and that is exactly what length-normalized DPO helps teach the model.
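
To make the idea concrete, here is a minimal sketch of a length-normalized DPO loss in PyTorch-style Python. The function name and tensor arguments are illustrative assumptions, not Allen AI's actual implementation; the key idea is simply dividing each summed log-probability by its response length before computing the usual DPO margin.

    import torch
    import torch.nn.functional as F

    def length_normalized_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                                   ref_chosen_logps, ref_rejected_logps,
                                   chosen_lens, rejected_lens, beta=0.1):
        # Normalize each summed log-probability by its response length, so a long
        # response cannot win simply by accumulating more log-probability mass.
        pi_logratio = policy_chosen_logps / chosen_lens - policy_rejected_logps / rejected_lens
        ref_logratio = ref_chosen_logps / chosen_lens - ref_rejected_logps / rejected_lens
        # Standard DPO sigmoid loss, applied to the length-normalized margin.
        logits = beta * (pi_logratio - ref_logratio)
        return -F.logsigmoid(logits).mean()

In standard DPO the division by length is absent, which is exactly what lets verbosity masquerade as quality during preference learning.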

Phase 4: The RLVR innovation

This is the technical breakthrough that deserves attention. RLVR replaces subjective reward models with concrete verification.

Most AI models learn through a complex system of reward models: essentially trained guesses about what makes a good response. But Allen AI took a different path with RLVR.

Think about how we currently train AI models. We usually need other AI models (called reward models) to judge whether a response is good or not. That is subjective, complex and often inconsistent. Some responses may look good yet contain subtle errors that slip through.

RLVR turns this approach on its head. Instead of relying on subjective judgments, it uses concrete, verifiable outcomes. When the model attempts a math problem, there is no gray area: the answer is either right or wrong. When it writes code, that code either runs correctly or it doesn't.

Here is where it gets interesting (a minimal sketch of such a reward follows the list):

  • The model receives immediate, binary feedback: 10 points for correct answers, 0 for incorrect
  • There is no room for partial credit or fuzzy evaluation
  • Learning is focused and accurate
  • The model learns to prioritize accuracy over plausible-sounding but incorrect responses
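
As a minimal sketch, here is what a verifiable reward could look like for a math task with a known ground-truth answer. This is purely illustrative, not Allen AI's actual reward code; the answer-extraction rule and scoring values are assumptions chosen to mirror the binary feedback described above.

    import re

    def verifiable_reward(ground_truth: str, model_output: str) -> float:
        # Treat the last number in the model's output as its final answer.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
        if not numbers:
            return 0.0
        # Binary, verifiable feedback: full credit for an exact match, nothing otherwise.
        return 10.0 if numbers[-1] == ground_truth.strip() else 0.0

    # Example with a GSM8K-style problem whose ground-truth answer is "72"
    print(verifiable_reward("72", "Each box holds 12 eggs, so 6 boxes hold 72."))  # 10.0
    print(verifiable_reward("72", "The answer is probably around 70."))            # 0.0

Because the check is an exact verification rather than a learned judgment, there is nothing for the model to game: it either produced the right answer or it didn't.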

[Figure: RLVR training (Allen AI)]

The results? Tülu 3 showed significant improvements on tasks where accuracy matters most. Performance on mathematical reasoning (the GSM8K benchmark) and coding challenges rose in particular. Even its instruction following became more precise, because the model learned to value concrete correctness over approximate responses.


What makes this particularly exciting is how it changes the game for open-source AI. Previous approaches often struggled to match the precision of closed models on technical tasks. RLVR shows that with the right training approach, open-source models can reach the same level of reliability.

A look at the numbers

The 405B-parameter version of Tülu 3 competes directly with top models in the field. Let’s look at where it excels and what this means for open-source AI.

Mathematics

Tülu 3 excels at complex mathematical reasoning. On benchmarks such as GSM8K and MATH, it matches DeepSeek’s performance. The model handles multi-step problems and shows strong mathematical reasoning capabilities.

Code

The coding results are equally impressive. Thanks to RLVR training, Tülu 3 writes code that effectively solves problems. Its strength lies in understanding coding instructions and producing functional solutions.

Precise instruction following

The model’s ability to follow instructions stands out as a core strength. Where many models approximate or generalize instructions, Tülu 3 shows remarkable precision in doing exactly what is asked.

Opening the black box of AI development

Allen AI has released not only a powerful model but also its entire development process.

Every aspect of the training process is documented and accessible. From the four-phase approach to the data preparation methods and the RLVR implementation, the entire process lies open for study and replication. This transparency sets a new standard for high-performance AI development.

Developers receive extensive resources:

  • Complete training pipelines
  • Data processing tools
  • Evaluation frameworks
  • Implementation specifications

This enables teams to:

  • Change training processes
  • Adjust methods for specific needs
  • Build on proven approaches
  • Create specialized implementations

This open approach speeds up innovation in the field. Researchers can build on verified methods, while developers can concentrate on improvements instead of starting from zero.

The rise of open-source excellence

The success of Tülu 3 is a big moment for open AI development. When open-source models match or exceed private alternatives, it fundamentally changes the industry. Research teams worldwide gain access to proven methods, which accelerates their work and sparks new innovations. Private AI labs must adapt, either by increasing transparency or by pushing technical boundaries even further.


Looking ahead, Tülu 3’s breakthroughs in verifiable rewards and multi-stage training hint at what is coming. Teams can build on these foundations, potentially pushing performance even higher. The code exists, the methods are documented, and a new wave of AI development has begun. For developers and researchers, the opportunity to experiment with and improve on these methods marks the start of an exciting chapter in AI development.

Frequently asked questions (FAQ) about Tülu 3

What is Tülu 3 and what are the most important characteristics?

Tülu 3 is a family of open-source LLMs developed by Allen AI, built on the Llama 3.1 architecture. It comes in several sizes (8B, 70B and 405B parameters). Tülu 3 is designed for improved performance across a range of tasks, including knowledge, reasoning, mathematics, coding, instruction following and safety.
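
For readers who want to try the models, a minimal usage sketch with Hugging Face Transformers is below. The repository id is an assumption based on the public release naming and should be checked against Allen AI's official model cards; the prompt is just an example.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed Hugging Face repo id for the 8B checkpoint; the 70B and 405B
    # variants are released alongside it (verify the exact names before use).
    model_id = "allenai/Llama-3.1-Tulu-3-8B"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its speed in km/h?"}]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

    outputs = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))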

What is the training process for Tülu 3 and what data is used?

Tülu 3’s training involves several key stages. First, the team curates a diverse set of prompts from both public datasets and synthetic data targeted at specific skills, decontaminating the data against benchmarks. Second, supervised finetuning (SFT) is performed on a mix of instruction, mathematics and coding data. Next, Direct Preference Optimization (DPO) is applied with preference data generated through human and LLM feedback. Finally, Reinforcement Learning with Verifiable Rewards (RLVR) is used for tasks with measurable correctness. Tülu 3 uses curated datasets at each stage, including persona-driven instructions as well as mathematics and code data.

How does Tülu 3 approach safety, and what metrics are used to evaluate it?

Safety is a core component of Tülu 3’s development, addressed throughout the training process. A safety-specific dataset is used during SFT, which was found to be largely orthogonal to the other task-oriented data.

What is RLVR?

RLVR (Reinforcement Learning with Verifiable Rewards) is a technique in which the model is trained to optimize against a verifiable reward, such as the correctness of an answer. This differs from traditional RLHF, which relies on a learned reward model.
