The AI community was recently in an uproar. A former employee of Mistral, a prominent European startup, alleged that the company's latest model was plagiarized from the Chinese company DeepSeek yet marketed as a reinforcement learning (RL) success story. The star company, often called the "European OpenAI," saw its valuation soar to tens of billions of dollars in just one year, and it now finds itself embroiled in an "academic misconduct" controversy that has stirred the industry.
"Language fingerprint" reveals flaws; model similarities are beyond imagination.
As early as June, a tech blogger using "language fingerprint" analysis found that Mistral's small model, Mistral-small-3.2, and DeepSeek-v3 had highly similar output styles, such as a shared preference for certain uncommon words and unusual collocations. Similarity of this kind rarely emerges by chance; it more likely points to a "learning relationship" between the models.
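To make the idea of a "language fingerprint" concrete, here is a minimal Python sketch. It is not the blogger's actual method, which the article does not describe; it assumes a fingerprint is simply a count of the words and two-word collocations in a model's output, and that two fingerprints are compared with cosine similarity. The function names `fingerprint` and `cosine_similarity` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def fingerprint(text, n=2):
    """Count single words and word n-grams (collocations) in one text sample.

    Assumption: a 'fingerprint' here is just these raw counts.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Add n-word collocations, e.g. "rich tapestry" for n=2.
    counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two count vectors, from 0.0 to 1.0."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sample has no fingerprint to compare
    return dot / (norm_a * norm_b)
```

In practice, an analysis like this would need many output samples per model and a baseline of unrelated models, since any two English texts share common words. A high score on its own suggests stylistic overlap; it does not prove that one model was trained on another.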
The whistleblower further suggested that Mistral may have directly "borrowed" DeepSeek's training results through a technique called "distillation," in which one model is trained on another model's outputs. Distillation itself isn't illegal. The problem, according to the allegation, is that Mistral not only failed to disclose the process but also deliberately exaggerated its model's reinforcement learning performance, even distorting benchmark data to mislead the public. One netizen quipped, "DeepSeek was once called the 'Chinese version of Mistral,' but now Mistral is the 'European version of DeepSeek'?"
A crisis of trust? The open-source community's fig leaf has been pulled away.
Mistral has long prided itself on being an "open-source pioneer," and its models' strong performance in reasoning and multilingual processing has led some to consider it a formidable competitor to OpenAI. This incident, however, has led many to ask: if even high-profile companies are cutting corners, how much transparency remains in the open-source community?
More embarrassing still, the leaker is a former Mistral employee, which makes the allegations all the more damaging. Mistral has not issued an official response, but the controversy has already hurt the company's reputation. Competition in the AI industry, after all, relies not only on technology but also on trust. One developer bluntly stated, "Distillation is fine, but the source must be cited. Otherwise, what's the difference between this and plagiarism?"
Mistral has just released a new model, Mistral Medium V3.1, in an apparent attempt to deflect attention. But the controversy may not be easily quelled. Once the halo of the "European version of OpenAI" fades, how much patience will the market have left?