
Weibo's new open source AI model VibeThinker-1.5B outperforms DeepSeek-R1 on $7,800 post-training budget

Another day at the end of 2025, another impressive result from a Chinese company in open source artificial intelligence.

Chinese social networking company Weibo’s AI division recently released VibeThinker-1.5B, an open source large language model (LLM) with 1.5 billion parameters, fine-tuned from rival Chinese technology company Alibaba’s Qwen2.5-Math-1.5B.

It is now free for researchers and enterprise developers to download and use, even for commercial purposes, under a permissive MIT license on Hugging Face, GitHub, and ModelScope, with an accompanying technical report on the open-access preprint site arxiv.org.

And yet, despite its compact size, VibeThinker-1.5B achieves excellent reasoning performance in mathematics and coding, outperforming models hundreds of times its size on formal reasoning benchmarks, even beating Chinese rival DeepSeek’s famous R1, the 671-billion-parameter model that went viral early this year.

It further eclipses Mistral AI’s Magistral Medium and holds its own against Anthropic’s Claude Opus 4 and OpenAI’s gpt-oss-20B Medium, all while requiring a fraction of the infrastructure and investment.

It also does this after being post-trained on a budget of just $7,800 in computing resources (3,900 GPU hours on Nvidia H800s), far less than the tens or even hundreds of thousands of dollars typically required to fine-tune models of similar or larger scale.

However, keep in mind that this is not the total cost of model development: LLMs are trained in phases. First comes pre-training, where the model learns basic language structure and general knowledge by predicting the next word across massive amounts of text from the internet, books, and articles. This gives it fluency, but little sense of how to follow instructions or carry on a conversation.

Then comes post-training, which uses much smaller, higher-quality datasets (typically collections of sample questions, prompts, and expert-written answers) to teach the model how to respond helpfully, understand problems, and adapt to human expectations. Still, the cost-effectiveness of Weibo’s post-training of VibeThinker-1.5B is remarkable and should be commended.

The open source release challenges assumptions about parameter scale, computational intensity, and the minimum size required for a high-performance LLM.

Another training approach: spectrum-to-signal

VibeThinker-1.5B owes its performance not to scale, but to the training framework behind it: the Spectrum-to-Signal Principle (SSP).


Instead of optimizing a model purely for the correctness of a single answer (Pass@1), the SSP framework decouples supervised fine-tuning (SFT) and reinforcement learning (RL) into two phases with different goals:

  • SFT (“Spectrum Phase”): The model is trained to maximize the diversity of possible correct answers, improving the Pass@K score. This creates a wide range of plausible solution paths.

  • RL (“Signal Phase”): A second-stage reinforcement learning system (called MaxEnt-Guided Policy Optimization, or MGPO) is used to identify and reinforce the most correct paths from this diverse solution pool. MGPO prioritizes problems where the model is most uncertain, using entropy-based weighting to focus learning.
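For context, the Pass@K metric referenced above is conventionally computed with the standard unbiased estimator: given n sampled completions of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch of that estimator (generic evaluation code, not taken from the VibeThinker report):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one
    of k completions drawn without replacement from n generations,
    c of which are correct, solves the problem."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 generations per problem, 3 of them correct
print(pass_at_k(10, 3, 1))  # equals c/n = 0.3
print(pass_at_k(10, 3, 5))  # much higher: diversity pays off at larger k
```

The gap between Pass@1 and Pass@K for the same samples is what the "Spectrum Phase" targets: a diverse model may have modest Pass@1 while its Pass@K is high, leaving signal for the RL stage to amplify.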

The authors argue that this separation allows small models to explore the reasoning space more effectively, achieving signal amplification without relying on huge numbers of parameters.
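The report's exact MGPO objective is not reproduced here, but the entropy-based weighting idea can be sketched: a problem the model solves about half the time has maximal binary entropy, so it receives the most training weight. The function names and normalization below are illustrative assumptions, not the paper's implementation:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) in bits: 0 at p=0 or p=1, maximal (1.0) at p=0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_weights(success_rates):
    """Weight each problem by the entropy of the model's current
    empirical success rate, so RL focuses on the problems it is
    most uncertain about. (Illustrative; MGPO's actual weighting
    scheme may differ.)"""
    raw = [binary_entropy(p) for p in success_rates]
    total = sum(raw) or 1.0
    return [w / total for w in raw]

# per-problem solve rates measured across sampled rollouts
rates = [0.0, 0.5, 0.9, 1.0]
print(entropy_weights(rates))  # the p=0.5 problem gets the largest weight
```

Problems the model always solves (p=1.0) or never solves (p=0.0) contribute no gradient signal under this weighting, which matches the paper's stated aim of concentrating learning where uncertainty is highest.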

VibeThinker-1.5B makes a compelling case that the industry’s reliance on parameter scaling as the only route to better reasoning performance may be outdated.

Using a diversity-oriented training pipeline, WeiboAI has shown that smaller, more accessible models can rival and even outperform billion-dollar systems on logic-heavy tasks.

The low resource footprint is one of the most important aspects of VibeThinker-1.5B. At under $8,000, its post-training cost is 30 to 60 times lower than that of models like DeepSeek R1 and MiniMax-M1, which cost between $294,000 and $535,000 to train.

Performance across domains

Despite its small size, VibeThinker-1.5B delivers cross-domain reasoning that outperforms many larger open-source and commercial models:

| Model | AIME25 | LiveCodeBench v6 | GPQA Diamond |
| --- | --- | --- | --- |
| VibeThinker-1.5B | 74.4 | 51.1 | 46.7 |
| GPT-OSS-20B-Medium | 72.1 | 54.9 | 66.0 |
| Claude Opus 4 | 69.2 | 56.6 | 79.6 |
| MiniMax M1 (456B) | 74.6 | 62.3 | 69.2 |
| DeepSeek R1 (671B) | 70.0 | 65.9 | 71.5 |
| Kimi K2 (1.09T) | 49.5 | 53.7 | 75.1 |

VibeThinker was compared with both reasoning-oriented models (Magistral, Claude, OpenAI o3-mini) and non-reasoning LLMs (GPT-4.1, Kimi K2, DeepSeek V3). In the structured reasoning benchmarks, the model consistently outperformed non-reasoning models regardless of size:

  • On AIME24 (maths) it defeated Kimi K2 (1.09T) by more than 10 points (80.3 vs. 69.6).

  • On LiveCodeBench v6 it surpassed Claude Opus 4 (51.1 vs. 47.4).

  • On GPQA it scored lower than GPT-4.1 and Claude, but still doubled the base model (from 16.4 to 46.7).


This supports the authors’ contention that size is not the only path to reasoning ability; with good training design, smaller models can match or even exceed the performance of much larger systems on targeted tasks.

Notably, it achieves parity with models hundreds of times larger in math and code, although it lags behind in general knowledge reasoning (GPQA), where larger models maintain an edge.

This suggests a potential trade-off between specializations: while VibeThinker excels at structured logic tasks, it has less capacity for broad encyclopedic recall, a known limitation of smaller architectures.

Guidance for Enterprise Adoption

The release includes recommended inference settings (temperature = 0.6, top_p = 0.95, max tokens = 40960).
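To make those settings concrete, here is a minimal sketch of what temperature scaling and top-p (nucleus) filtering do to a logits vector at each decoding step. This is generic sampling code under the recommended parameters, not Weibo's inference stack:

```python
import math
import random

def sample_top_p(logits, temperature=0.6, top_p=0.95, rng=random):
    """Temperature-scale the logits, keep the smallest set of tokens
    whose cumulative probability reaches top_p, then sample one token
    id from that nucleus."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # rank token ids by probability, descending
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break  # nucleus assembled
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

# a low-probability token falls outside the 0.95 nucleus and is never sampled
print(sample_top_p([2.0, 1.0, -5.0]))
```

A low temperature like 0.6 sharpens the distribution toward the model's best guesses, while top_p = 0.95 trims the unreliable tail; the large max-token budget (40,960) leaves room for long chain-of-thought traces.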

The model is small enough to be deployed on edge devices, including mobile phones and in-vehicle systems, with inference costs estimated to be 20 to 70 times lower than those of large models.

This positions VibeThinker-1.5B not only as a research achievement, but as a potential basis for cost-efficient, locally deployable reasoning systems.

Weibo’s strategy and market position

Launched by Sina Corporation in 2009, Weibo remains a cornerstone of China’s social media ecosystem. Often described as the Chinese version of X (formerly Twitter), the platform combines microblogging, multimedia content and trending topic features with a regulatory environment shaped by strict government oversight.

Despite having 600 million monthly active users, investors are not optimistic about the short-term growth potential of Weibo’s advertising revenue, and the platform must contend with increasing competition from video-first platforms like Douyin, which are drawing younger users and their attention elsewhere.

In response, Weibo has focused on monetizing the creator economy, livestreaming and vertical video, adding tools for influencer engagement, e-commerce integration and richer analytics for brands.

The platform’s role as a digital public square also makes it a focus of regulatory oversight. Chinese authorities continue to apply pressure on issues ranging from content management to data security. In September 2025, Weibo was among the platforms mentioned in official warnings, underscoring its continued exposure to policy risk.


Weibo’s push for AI R&D – exemplified by the release of VibeThinker-1.5B – signals a shift in ambition. In addition to being a media platform, Weibo is also positioning itself as a player in the next phase of China’s AI development, using its capital reserves, user behavior data and internal research capabilities to explore adjacent technical domains.

What it means for technical decision makers in enterprises

For technical leaders and enterprise AI teams, the release of VibeThinker has practical implications for everything from orchestration pipelines to cost modeling.

A 1.5-billion-parameter model that outperforms models 100 times its size in math and programming not only saves compute, but also shifts the architectural balance. It enables LLM inference on limited infrastructure, reduces latency at the edge, and lowers the barrier to entry for applications that would otherwise have required API access to closed, frontier-scale models.

That’s important for ML business leaders trying to deploy reasoning agents within existing systems, or for platform owners who need to integrate LLMs into automated workflows.

It also appeals to those running reinforcement learning from human feedback (RLHF) pipelines or managing inference optimization in hybrid cloud environments.

The model’s post-training methodology – specifically its entropy-focused reinforcement learning approach – provides a roadmap for teams looking to refine smaller checkpoints rather than relying on large-scale pre-training.

VibeThinker’s transparency and data sanitization moves also address another emerging priority in enterprise AI: auditability. Although its performance on general knowledge tests still lags behind large frontier models, its task-specific reliability makes it an attractive candidate for controlled environments where correctness is more important than coverage.

In short, VibeThinker-1.5B is not just a research milestone; it is a strong candidate for practical enterprise deployment. It suggests that a new class of compact, reasoning-optimized models is feasible for use cases that were previously the domain of much larger systems. For organizations trying to balance cost, latency, interpretability, and control, it is a welcome addition to the long and growing list of Chinese open source offerings.

