Ant Group's Bailing large model team recently announced the open-source release of two new high-efficiency reasoning models, Ring-flash-linear-2.0 and Ring-mini-linear-2.0, designed to improve the efficiency of deep reasoning. Alongside the models, the team released two independently developed high-performance fused operators: an FP8 fused operator and a linear attention inference fused operator. Together, these support efficient inference with a large total parameter count but a small number of activated parameters, and can handle very long contexts.
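The announcement does not publish the operator internals, but the efficiency claim rests on a well-known property of linear attention: by applying a feature map to queries and keys, the attention product can be reassociated so that cost grows linearly rather than quadratically with sequence length. The toy sketch below illustrates that reassociation only; the feature map and all names here are illustrative assumptions, not Ant's actual fused operator.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Toy non-causal linear attention.
    Standard attention computes softmax(Q K^T) V, which costs O(n^2 d).
    With a positive feature map phi, we instead compute
    phi(Q) @ (phi(K)^T V), which costs O(n d^2) -- linear in n.
    Illustrative sketch only (assumed feature map, no masking/scaling).
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # simple positive feature map (assumption)
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                     # (d, d_v) summary: size independent of sequence length
    Z = Qf @ Kf.sum(axis=0)           # per-position normalizer, shape (n,)
    return (Qf @ KV) / Z[:, None]     # (n, d_v)

# Tiny demo on random data
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

Because the `(d, d_v)` summary matrix replaces the `(n, n)` attention map, inference memory and compute stay flat as the context grows, which is why this family of architectures suits very long contexts.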
Thanks to the combined effect of architectural optimization and the high-performance operators, the new models run at roughly one-tenth the cost of dense models of comparable scale in deep reasoning scenarios, a reduction of more than 50% compared with the previous generation of the Ring series. Users can therefore run complex reasoning workloads with far less compute. Furthermore, because the operators of the training engine and the inference engine are closely aligned, the models can be optimized stably over long reinforcement learning runs, achieving leading results on several difficult reasoning benchmarks.
Both models are currently open-sourced on platforms such as Hugging Face and ModelScope, where developers can download and experiment with them. The release not only demonstrates Ant Group's technical capabilities in AI, but is also expected to drive further breakthroughs in AI research and applications by putting efficient tools in developers' hands.