Immediately, we’re releasing the secure model of Gemini 2.5 Flash-Lite, our quickest and lowest value ($0.10 enter per 1M, $0.40 output per 1M) mannequin within the Gemini 2.5 mannequin household. We constructed 2.5 Flash-Lite to push the frontier of intelligence per greenback, with native reasoning capabilities that may be optionally toggled on for extra demanding use circumstances. Constructing on the momentum of two.5 Professional and a pair of.5 Flash, this mannequin rounds out our set of two.5 fashions which can be prepared for scaled manufacturing use.
Our most cost-efficient and quickest 2.5 mannequin but
Gemini 2.5 Flash-Lite strikes a steadiness between efficiency and price, with out compromising on high quality, significantly for latency-sensitive duties like translation and classification.
Right here’s what makes it stand out:
- Greatest in-class pace: Gemini 2.5 Flash-Lite has decrease latency than each 2.0 Flash-Lite and a pair of.0 Flash on a broad pattern of prompts.
- Value-efficiency: It’s our lowest-cost 2.5 mannequin but, priced at $0.10 / 1M enter tokens and $0.40 output tokens, permitting you to deal with giant volumes of requests affordably. We’ve additionally decreased audio enter pricing by 40% from the preview launch.
- Sensible and small: It demonstrates all-around larger high quality than 2.0 Flash-Lite throughout a variety of benchmarks, together with coding, math, science, reasoning, and multimodal understanding.
- Absolutely featured: If you construct with 2.5 Flash-Lite, you get entry to a 1 million-token context window, controllable considering budgets, and help for native instruments like Grounding with Google Search, Code Execution, and URL Context.
Gemini 2.5 Flash-Lite in motion
Because the launch of two.5 Flash-Lite, we’ve got already seen some extremely profitable deployments, listed here are a few of our favorites:
- Satlyt is constructing a decentralized area computing platform that may remodel how satellite tv for pc knowledge is processed and utilized for real-time summarization of in-orbit telemetry, autonomous activity administration, and satellite-to-satellite communication parsing. 2.5 Flash-Lite’s pace has enabled a forty five% discount in latency for important onboard diagnostics and a 30% lower in energy consumption in comparison with their baseline fashions.
- HeyGen makes use of AI to create avatars for video content material and leverages Gemini 2.5 Flash-Lite to automate video planning, analyze and optimize content material, and translate movies into over 180 languages. This enables them to offer world, customized experiences for his or her customers.
- DocsHound turns product demos into documentation by utilizing Gemini 2.5 Flash-Lite to course of lengthy movies and extract hundreds of screenshots with low latency. This transforms footage into complete documentation and coaching knowledge for AI brokers a lot quicker than conventional strategies.
- Evertune helps manufacturers perceive how they’re represented throughout AI fashions. Gemini 2.5 Flash-Lite is a game-changer for them, dramatically dashing up evaluation and report era. Its quick efficiency permits them to shortly scan and synthesize giant volumes of mannequin output to offer purchasers with dynamic, well timed insights.
You can begin utilizing 2.5 Flash-Lite by specifying “gemini-2.5-flash-lite” in your code. In case you are utilizing the preview model, you’ll be able to swap to “gemini-2.5-flash-lite” which is identical underlying mannequin. We plan to take away the preview alias of Flash-Lite on August twenty fifth.
Prepared to begin constructing? Strive the secure model of Gemini 2.5 Flash-Lite now in Google AI Studio and Vertex AI.