HOLY SMOKES! A new, 200% faster DeepSeek R1-0528 variant appears from German lab TNG Technology Consulting GmbH

It has been a little more than a month since Chinese AI startup DeepSeek, an offshoot of Hong Kong-based High-Flyer Capital Management, released the latest version of its hit open-source model, DeepSeek-R1-0528.
Like its predecessor DeepSeek-R1, which shook the AI and global business communities with how cheaply it was trained and how well it performed on reasoning tasks, all freely available to developers and enterprises, R1-0528 is already being adapted and remixed by other AI labs and developers, thanks in large part to its permissive MIT License.
This week, the 24-year-old German firm TNG Technology Consulting GmbH released one such adaptation: DeepSeek-TNG R1T2 Chimera, the newest model in its Chimera large language model (LLM) family. R1T2 delivers a notable boost in efficiency and speed, scoring at upwards of 90% of R1-0528's intelligence benchmark scores while generating answers with less than 40% of R1-0528's output token count.
That means it produces shorter responses, which translates directly into faster inference and lower compute costs. On the model card TNG released for its new R1T2 on AI code-sharing community Hugging Face, the company states that it is "about 20% faster than the regular R1" (the one released in January) "and more than twice as fast as R1-0528" (DeepSeek's official update).
The reaction from the AI developer community has been overwhelmingly positive. "DAMN! DeepSeek R1T2: 200% faster than R1-0528 & 20% faster than R1," wrote Vaibhav (VB) Srivastav, a senior leader at Hugging Face, on X. "Significantly better than R1 on GPQA & AIME 24, made via Assembly of Experts with DS V3, R1 & R1-0528, and it's MIT licensed, available on Hugging Face."
This gain is made possible by TNG's Assembly-of-Experts (AoE) method, a technique for building LLMs by selectively merging the weight tensors (internal parameters) of multiple pre-trained models, which TNG described in a paper published in May on arXiv, the non-peer-reviewed open-access online repository.
A successor to the original R1T Chimera, R1T2 introduces a new "Tri-Mind" configuration that integrates three parent models: DeepSeek-R1-0528, DeepSeek-R1, and DeepSeek-V3-0324. The result is a model designed to maintain high reasoning capability while significantly reducing inference cost.
R1T2 is built without further fine-tuning or retraining. It inherits the reasoning strength of R1-0528, the structured thought patterns of R1, and the concise, instruction-oriented behavior of V3-0324, delivering a more efficient yet capable model for enterprise and research use.
How Assembly of Experts (AoE) differs from Mixture of Experts (MoE)
Mixture of Experts (MoE) is an architectural design in which different components, or "experts," are conditionally activated per input. In MoE LLMs such as DeepSeek-V3 or Mixtral, only a subset of the model's expert layers (e.g., 8 out of 256) is active during any given token's forward pass. This allows very large models to achieve higher parameter counts and specialization while keeping inference costs manageable, because only a fraction of the network is evaluated per token.
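The conditional activation described above can be illustrated with a minimal, self-contained sketch. All names and shapes here are illustrative assumptions, not the actual DeepSeek or Mixtral implementation; the point is only that the gate selects a top-k subset of experts, and the remaining experts are never evaluated for that token.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token_vec, experts, gate_weights, top_k=2):
    """Route one token through only the top_k highest-scoring experts."""
    # Gate: one score per expert (dot product with that expert's gate vector).
    scores = [sum(t * w for t, w in zip(token_vec, gw)) for gw in gate_weights]
    probs = softmax(scores)
    # Only the top_k experts are selected; the rest are never run.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    renorm = sum(probs[i] for i in chosen)
    out = [0.0] * len(token_vec)
    for i in chosen:
        expert_out = experts[i](token_vec)  # forward pass for this expert only
        out = [o + (probs[i] / renorm) * e for o, e in zip(out, expert_out)]
    return out, chosen
```

In a real MoE LLM the experts are feed-forward sublayers and the gate is learned, but the cost structure is the same: compute per token scales with `top_k`, not with the total expert count.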
Assembly of Experts (AoE), by contrast, is a model-merging technique, not an architecture. It is used to create a new model from multiple pre-trained MoE models by selectively interpolating their weight tensors.
The "experts" in AoE refer to the model components being merged, typically the routed expert tensors within MoE layers, not to experts dynamically activated at runtime.
TNG's implementation of AoE focuses primarily on merging the routed expert tensors, the part of a model most responsible for specialized reasoning, while often retaining the more efficient shared and attention layers from faster models such as V3-0324. With this approach, the resulting Chimera models can inherit reasoning strength without replicating the verbosity or latency of the strongest parent models.
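A hedged sketch of this merging idea: build a child checkpoint by interpolating the routed-expert tensors of several parents while copying shared/attention tensors verbatim from one efficient parent. The tensor naming scheme, the flat-list "tensors," and the selection rule are illustrative assumptions for clarity, not TNG's actual implementation.

```python
def assemble_experts(parents, weights, fast_parent, is_routed_expert):
    """parents: list of state dicts mapping tensor name -> flat list of floats.

    weights: one interpolation coefficient per parent (should sum to 1).
    fast_parent: the parent supplying all non-expert (shared/attention) tensors.
    is_routed_expert: predicate deciding which tensor names get interpolated.
    """
    child = {}
    for name, fast_tensor in fast_parent.items():
        if is_routed_expert(name):
            # Weighted interpolation of this routed-expert tensor across parents.
            child[name] = [
                sum(w * p[name][i] for w, p in zip(weights, parents))
                for i in range(len(fast_tensor))
            ]
        else:
            # Shared/attention tensors come straight from the fast parent.
            child[name] = list(fast_tensor)
    return child
```

Because the merge operates purely on stored weights, no gradient steps are involved, which matches the article's point that R1T2 required no further fine-tuning or retraining.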
Performance and speed: what the benchmarks actually show
According to benchmark comparisons presented by TNG, R1T2 achieves between 90% and 92% of the reasoning performance of its most intelligent parent, DeepSeek-R1-0528, as measured by the AIME-24, AIME-25, and GPQA-Diamond test sets.

Unlike DeepSeek-R1-0528, which tends to produce long, detailed answers due to its extended chain-of-thought reasoning, R1T2 is designed to be much more concise. It delivers similarly intelligent answers while using significantly fewer words.
Rather than focusing on raw processing time or tokens per second, TNG measures "speed" in terms of output token count per answer, a practical proxy for both cost and latency. According to benchmarks shared by TNG, R1T2 generates answers using about 40% of the tokens required by R1-0528.
That translates to a 60% reduction in output length, which directly reduces inference time and compute load, speeding up responses by roughly 2x, or 200%.
Compared to the original DeepSeek-R1, R1T2 is also around 20% more concise on average, offering meaningful efficiency gains for high-throughput or cost-sensitive deployments.
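A quick back-of-envelope check of the figures above, under the common first-order assumption that decode time scales roughly linearly with output length (the 1,000-token answer length is purely illustrative):

```python
r1_0528_tokens = 1000                  # illustrative answer length for R1-0528
r1t2_tokens = 0.40 * r1_0528_tokens    # R1T2 uses ~40% of the parent's tokens

reduction = 1 - r1t2_tokens / r1_0528_tokens   # fraction of output saved
speedup = r1_0528_tokens / r1t2_tokens         # relative decode speedup
print(f"output shorter by {reduction:.0%}, decode ~{speedup:.1f}x faster")
# prints: output shorter by 60%, decode ~2.5x faster
```

Under this linear assumption, 40% of the tokens implies a 2.5x decode speedup, consistent with TNG's claim that R1T2 is "more than twice as fast" as R1-0528.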
This efficiency does not come at the cost of intelligence. As shown in the benchmark chart presented in TNG's technical paper, R1T2 sits in a desirable zone on the intelligence-versus-output-cost curve. It preserves reasoning quality while minimizing verbosity, an outcome critical for enterprise applications where inference speed, throughput, and cost all matter.
Deployment considerations and availability
R1T2 is released under a permissive MIT License and is now available on Hugging Face, meaning it is open source and free to be used and built into commercial applications.
TNG notes that while the model is well suited for general reasoning tasks, it is not currently recommended for use cases requiring function calling or tool use, due to limitations inherited from the DeepSeek-R1 lineage. These may be addressed in future updates.
The company also advises European users to assess compliance with the EU AI Act, which takes effect on August 2, 2025.
Enterprises operating in the EU should review the relevant provisions or consider halting use of the model after that date if the requirements cannot be met.
However, U.S. companies operating domestically and serving U.S.-based users, or users in other nations, are not subject to the terms of the EU AI Act, which should give them considerable flexibility when using and deploying this free, fast open-source reasoning model. If they serve users in the EU, some provisions of the Act will still apply.
TNG has already made earlier Chimera variants available through platforms such as OpenRouter and Chutes, where they reportedly processed billions of tokens daily. The release of R1T2 represents a further evolution in this public-availability effort.
About TNG Technology Consulting GmbH
Founded in January 2001, TNG Technology Consulting GmbH is based in Bavaria, Germany, and employs more than 900 people, with a high concentration of PhDs and technical specialists.
The company focuses on software development, artificial intelligence, and DevOps/cloud services, serving major enterprise clients across industries such as telecommunications, insurance, automotive, e-commerce, and logistics.
TNG operates as a values-based consulting partnership. Its unique structure, grounded in operational research and self-management principles, supports a culture of technical innovation.
It actively contributes to open-source communities and research, as demonstrated by public releases such as R1T2 and the publication of its Assembly-of-Experts methodology.
What it means for enterprise technical decision-makers
For CTOs, AI platform owners, engineering leads, and IT procurement teams, R1T2 introduces tangible benefits and strategic options:
- Lower inference costs: With fewer output tokens per task, R1T2 reduces GPU time and energy consumption, translating directly into infrastructure savings, which is especially important in high-throughput or real-time environments.
- High reasoning quality without overhead: It preserves much of the reasoning power of top-tier models like R1-0528, but without their verbosity. This is ideal for structured tasks (math, programming, logic) where concise answers are preferred.
- Open and modifiable: The MIT License allows full deployment control and customization, enabling private hosting, model alignment, or further training within regulated or air-gapped environments.
- Emerging modularity: The AoE approach suggests a future in which models are built modularly, allowing enterprises to assemble specialized variants by recombining the strengths of existing models rather than retraining from scratch.
- Caveats: Enterprises relying on function calling, tool use, or advanced agent orchestration should note the current limitations, although future Chimera updates may address these gaps.
TNG encourages researchers, developers, and enterprise users to explore the model, test its behavior, and provide feedback. The R1T2 Chimera is available at huggingface.co/tngtech/DeepSeek-TNG-R1T2-Chimera, and technical inquiries can be directed to research@tngtech.com.
For technical background and benchmark methodology, TNG's research paper is available on arXiv: 2506.14794.




