Microsoft recently open-sourced rStar2-Agent, an agentic reasoning model that marks a notable breakthrough in AI. With only 14 billion parameters, it scored 80.6% accuracy on the AIME24 mathematical reasoning benchmark, edging past the 79.8% of DeepSeek-R1, a model with 671 billion parameters. The result challenges the conventional wisdom that parameter count determines performance.
Even more impressively, rStar2-Agent's advantage holds across multiple domains. On the GPQA-Diamond scientific reasoning benchmark it reached 60.9% accuracy, ahead of DeepSeek-V3's 59.1%, and on the BFCL v3 tool-use task its 60.8% completion rate beat its competitor's 57.6%. These results point to strong cross-task generalization.
The breakthrough rests on three pieces of Microsoft engineering and algorithmic work. First, an efficient, isolated code-execution service that sustains 45,000 concurrent tool calls with an average latency of just 0.3 seconds. Second, the GRPO-RoC algorithm, which significantly improves reasoning efficiency by reshaping how rollouts are rewarded and selected. Third, a training recipe of non-reasoning supervised fine-tuning followed by multi-stage reinforcement learning, which builds up the model's capabilities step by step.
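The article does not spell out the internals of GRPO-RoC. Assuming the "Resample-on-Correct" idea is to oversample rollouts, keep failed trajectories as-is, and prefer correct trajectories with the fewest tool-call errors before computing group-relative advantages, a minimal sketch (with hypothetical names `Rollout`, `roc_downsample`, and `grpo_advantages`) might look like:

```python
import random
from dataclasses import dataclass

@dataclass
class Rollout:
    reward: float      # 1.0 if the final answer is correct, else 0.0
    tool_errors: int   # count of failed tool calls / formatting issues

def roc_downsample(rollouts, group_size):
    """Resample-on-Correct (sketch): keep negative rollouts unchanged,
    but among correct rollouts prefer the cleanest trajectories."""
    positives = [r for r in rollouts if r.reward > 0]
    negatives = [r for r in rollouts if r.reward == 0]
    # Rank correct rollouts by cleanliness: fewest tool errors first.
    positives.sort(key=lambda r: r.tool_errors)
    # Fill roughly half the group with clean positives, rest with negatives.
    n_pos = min(len(positives), group_size // 2)
    n_neg = min(group_size - n_pos, len(negatives))
    return positives[:n_pos] + random.sample(negatives, n_neg)

def grpo_advantages(group):
    """GRPO-style advantages: reward minus group mean, scaled by std."""
    rewards = [r.reward for r in group]
    mean = sum(rewards) / len(rewards)
    std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5
    std = std or 1.0  # avoid division by zero when all rewards are equal
    return [(x - mean) / std for x in rewards]
```

The design intuition: negative examples stay noisy so the model still learns what fails, while positive examples are filtered to reward trajectories that succeed *cleanly*, discouraging sloppy tool use.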
This achievement not only opens new directions for AI-agent research but also suggests that future progress may depend less on stacking parameters and more on algorithmic optimization. With rStar2-Agent now released as open source, the industry may well see a new round of technological innovation.