Google DeepMind has reportedly released a new Gemini 2.5 Flash image editing model. In the Gemini app, the model improves the accuracy of text-prompted image editing while keeping the appearance of people and animals consistent.
It reportedly achieves higher accuracy in text-based image editing than previous native image generation tools and even outperforms GPT-4o, the model used by ChatGPT, on several tasks, allowing it to follow more complex editing instructions.
A key capability of the new model is "character consistency": across multiple generated images, it maintains the appearance of the same person, animal, or object even when the pose, background, or lighting changes. This is particularly valuable for photo series and product shots from multiple angles, making it suitable for producing brand materials and catalogs at scale.
Gemini 2.5 Flash supports precise, localized edits driven by text prompts, letting users blur the background, remove blemishes, add color, or remove objects without manual selection. The model can also fuse up to three images at once, for example combining a product photo with an interior shot to create a realistic scene. It also supports "style transfer", applying the texture, color, or pattern of one image to another while preserving shape and detail. A reasoning capability can simulate simple cause and effect, such as generating an image of a balloon drifting toward a cactus and the consequences that follow.
Gemini 2.5 Flash is now available in the Gemini app. Users must switch the model to "Flash" to access image editing features. Generated images are marked with both a visible watermark and an invisible SynthID digital watermark. Developers can try it out through the Gemini API, Google AI Studio, and Vertex AI. The cost is $30 per million output tokens, or approximately $0.039 per image.
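For developers, the sketch below shows what a text-driven edit request might look like. It assumes the google-genai Python SDK and a preview model identifier such as "gemini-2.5-flash-image-preview"; neither the SDK choice nor the exact model id is stated in the article, and the file name and prompt are placeholders.

```python
# Minimal sketch of a text-driven image edit via the Gemini API.
# Assumptions: the google-genai Python SDK and the model id
# "gemini-2.5-flash-image-preview" (not confirmed by the article).
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

source = Image.open("product.png")  # hypothetical input image

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    # Up to three images can reportedly be passed for fusion;
    # here a single image plus an edit instruction is used.
    contents=[source, "Remove the background and place the product on a wooden table."],
)

# The response can interleave text and image parts; save any returned image.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("edited.png")
    elif part.text is not None:
        print(part.text)
```

At the quoted rate of $30 per million output tokens, the figure of roughly $0.039 per image implies about 1,300 output tokens per generated image.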