At this time, we’re releasing the steady model of Gemini 2.5 Flash-Lite, our quickest and lowest value ($0.10 enter per 1M, $0.40 output per 1M) mannequin within the Gemini 2.5 mannequin household. We constructed 2.5 Flash-Lite to push the frontier of intelligence per greenback, with native reasoning capabilities that may be optionally toggled on for extra demanding use instances. Constructing on the momentum of two.5 Professional and a couple of.5 Flash, this mannequin rounds out our set of two.5 fashions which might be prepared for scaled manufacturing use.
Our most cost-efficient and quickest 2.5 mannequin but
Gemini 2.5 Flash-Lite strikes a steadiness between efficiency and value, with out compromising on high quality, notably for latency-sensitive duties like translation and classification.
Right here’s what makes it stand out:
- Greatest in-class velocity: Gemini 2.5 Flash-Lite has decrease latency than each 2.0 Flash-Lite and a couple of.0 Flash on a broad pattern of prompts.
- Price-efficiency: It’s our lowest-cost 2.5 mannequin but, priced at $0.10 / 1M enter tokens and $0.40 output tokens, permitting you to deal with massive volumes of requests affordably. Now we have additionally diminished audio enter pricing by 40% from the preview launch.
- Good and small: It demonstrates all-around larger high quality than 2.0 Flash-Lite throughout a variety of benchmarks, together with coding, math, science, reasoning, and multimodal understanding.
- Totally featured: Once you construct with 2.5 Flash-Lite, you get entry to a 1 million-token context window, controllable pondering budgets, and help for native instruments like Grounding with Google Search, Code Execution, and URL Context.
Gemini 2.5 Flash-Lite in motion
For the reason that launch of two.5 Flash-Lite, we now have already seen some extremely profitable deployments, listed here are a few of our favorites:
- Satlyt is constructing a decentralized area computing platform that can rework how satellite tv for pc information is processed and utilized for real-time summarization of in-orbit telemetry, autonomous job administration, and satellite-to-satellite communication parsing. 2.5 Flash-Lite’s velocity has enabled a forty five% discount in latency for essential onboard diagnostics and a 30% lower in energy consumption in comparison with their baseline fashions.
- HeyGen makes use of AI to create avatars for video content material and leverages Gemini 2.5 Flash-Lite to automate video planning, analyze and optimize content material, and translate movies into over 180 languages. This permits them to supply world, personalised experiences for his or her customers.
- DocsHound turns product demos into documentation through the use of Gemini 2.5 Flash-Lite to course of lengthy movies and extract hundreds of screenshots with low latency. This transforms footage into complete documentation and coaching information for AI brokers a lot sooner than conventional strategies.
- Evertune helps manufacturers perceive how they’re represented throughout AI fashions. Gemini 2.5 Flash-Lite is a game-changer for them, dramatically rushing up evaluation and report technology. Its quick efficiency permits them to rapidly scan and synthesize massive volumes of mannequin output to supply purchasers with dynamic, well timed insights.
You can begin utilizing 2.5 Flash-Lite by specifying “gemini-2.5-flash-lite” in your code. In case you are utilizing the preview model, you may change to “gemini-2.5-flash-lite” which is identical underlying mannequin. We plan to take away the preview alias of Flash-Lite on August twenty fifth.
Prepared to start out constructing? Attempt the steady model of Gemini 2.5 Flash-Lite now in Google AI Studio and Vertex AI.









