The official app, website, mini-program, and DeepSeek API have all been updated to DeepSeek-V3.1-Terminus. Reporters noted that the name Terminus, meaning "ultimate edition," suggests this may be the final update for V3.1. The public is waiting to see whether the next major release will be V4 or R2.
In public benchmark results, V3.1-Terminus improves on V3.1 overall, though some scores have declined. The gain on the "Humanity's Last Exam" benchmark is particularly notable: the score rose from 15.9 to 21.7, an increase of 5.8 points. According to data on the official website, this score trails only Grok 4 (25.4) and GPT-5 (25.3) and narrowly surpasses Gemini 2.5 Pro (21.6).
DeepSeek's fix for mixed Chinese and English output has been particularly welcome. Reporters from The Paper saw many users post approving comments on social media, such as: "The problem of Chinese and English mixing does occur when thinking for a long time. I've encountered it several times and was still wondering what the problem was, and now it's solved."
Industry insiders said the DeepSeek-V3.1-Terminus update focuses on engineering implementation and scenario adaptation, with two key improvements. First, semantic-layer noise reduction significantly improves language consistency, suppressing problems such as mixed Chinese and English text and abnormal characters and producing cleaner generated text. Second, a deep restructuring of the agent execution framework improves the stability of agent output, specifically by improving the code agent's syntax parsing accuracy and the search agent's retrieval recall.