According to the official announcement, API pricing for DeepSeek-V3.1 has been adjusted. Input costs 0.5 yuan per million tokens on a cache hit and 4 yuan per million tokens on a cache miss, and output is now priced at a unified 12 yuan per million tokens. Compared with similar models in the industry, DeepSeek-V3.1 offers a more cost-effective option while maintaining high performance. The move is expected to significantly reduce costs for developers and businesses and to promote wider adoption of AI technology.
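To make the pricing concrete, here is a minimal sketch that estimates the bill for a given workload. The three prices are the ones quoted in the announcement. The function name `estimate_cost_yuan` and the example token counts are hypothetical and chosen only for illustration; actual billing may differ in rounding or other details.

```python
# Prices in yuan per million tokens, as quoted in the announcement.
PRICE_INPUT_CACHE_HIT = 0.5
PRICE_INPUT_CACHE_MISS = 4.0
PRICE_OUTPUT = 12.0

TOKENS_PER_MILLION = 1_000_000


def estimate_cost_yuan(cache_hit_tokens: int,
                       cache_miss_tokens: int,
                       output_tokens: int) -> float:
    """Estimate the cost in yuan for a workload (hypothetical helper)."""
    total = (
        cache_hit_tokens * PRICE_INPUT_CACHE_HIT
        + cache_miss_tokens * PRICE_INPUT_CACHE_MISS
        + output_tokens * PRICE_OUTPUT
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Illustrative workload: 2M cached input, 1M uncached input, 0.5M output.
    cost = estimate_cost_yuan(2_000_000, 1_000_000, 500_000)
    print(f"Estimated cost: {cost:.2f} yuan")  # 1.00 + 4.00 + 6.00 = 11.00
```

In this example the output tokens account for more than half of the bill even though they are the smallest share of the traffic, which is why output volume matters most at these prices.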
DeepSeek-V3.1 introduces a new hybrid inference architecture that lets a single model support both thinking and non-thinking modes. On the official app and website, users can switch between the two modes with the "Deep Thinking" button and choose the response style that suits the task at hand.
Compared with the previously widely used DeepSeek-R1-0528, DeepSeek-V3.1 is significantly more efficient in thinking mode and delivers answers in less time: while maintaining roughly the same task performance, it can reduce token consumption by 20% to 50%. In addition, post-training optimization has significantly improved the new model's performance in tool use and agent tasks.
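As a rough illustration of what a 20% to 50% reduction means in practice, the sketch below converts a baseline token count into the corresponding range. The helper name `reduced_token_range` and the 10,000-token baseline are hypothetical; only the two percentages come from the text above.

```python
# Reduction bounds reported for V3.1 relative to R1-0528, in percent.
MIN_REDUCTION_PCT = 20
MAX_REDUCTION_PCT = 50


def reduced_token_range(baseline_tokens: int) -> tuple[int, int]:
    """Return (fewest, most) tokens expected after the reported reduction.

    Hypothetical helper: integer arithmetic keeps the result exact.
    """
    fewest = baseline_tokens * (100 - MAX_REDUCTION_PCT) // 100
    most = baseline_tokens * (100 - MIN_REDUCTION_PCT) // 100
    return fewest, most


if __name__ == "__main__":
    # Illustrative baseline: a task that took 10,000 tokens on R1-0528.
    fewest, most = reduced_token_range(10_000)
    print(f"Expected range: {fewest} to {most} tokens")  # 5000 to 8000
```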
CNMO understands that, as of June 2025, DeepSeek had reached 163 million monthly active users, making it the world's largest AIGC application. With the release of DeepSeek-V3.1, competition in the large AI model field is expected to intensify.