
At present, we’re releasing the secure model of Gemini 2.5 Flash-Lite, our quickest and lowest value ($0.10 enter per 1M, $0.40 output per 1M) mannequin within the Gemini 2.5 mannequin household. We constructed 2.5 Flash-Lite to push the frontier of intelligence per greenback, with native reasoning capabilities that may be optionally toggled on for extra demanding use instances. Constructing on the momentum of two.5 Professional and a couple of.5 Flash, this mannequin rounds out our set of two.5 fashions which might be prepared for scaled manufacturing use.
Our most cost-efficient and quickest 2.5 mannequin but
Gemini 2.5 Flash-Lite strikes a stability between efficiency and value, with out compromising on high quality, significantly for latency-sensitive duties like translation and classification.
Right here’s what makes it stand out:
- Greatest in-class velocity: Gemini 2.5 Flash-Lite has decrease latency than each 2.0 Flash-Lite and a couple of.0 Flash on a broad pattern of prompts.
- Price-efficiency: It’s our lowest-cost 2.5 mannequin but, priced at $0.10 / 1M enter tokens and $0.40 output tokens, permitting you to deal with massive volumes of requests affordably. Now we have additionally diminished audio enter pricing by 40% from the preview launch.
- Good and small: It demonstrates all-around larger high quality than 2.0 Flash-Lite throughout a variety of benchmarks, together with coding, math, science, reasoning, and multimodal understanding.
- Totally featured: If you construct with 2.5 Flash-Lite, you get entry to a 1 million-token context window, controllable considering budgets, and assist for native instruments like Grounding with Google Search, Code Execution, and URL Context.
Gemini 2.5 Flash-Lite in motion
For the reason that launch of two.5 Flash-Lite, we have now already seen some extremely profitable deployments, listed below are a few of our favorites:
- Satlyt is constructing a decentralized area computing platform that can remodel how satellite tv for pc information is processed and utilized for real-time summarization of in-orbit telemetry, autonomous job administration, and satellite-to-satellite communication parsing. 2.5 Flash-Lite’s velocity has enabled a forty five% discount in latency for vital onboard diagnostics and a 30% lower in energy consumption in comparison with their baseline fashions.
- HeyGen makes use of AI to create avatars for video content material and leverages Gemini 2.5 Flash-Lite to automate video planning, analyze and optimize content material, and translate movies into over 180 languages. This permits them to supply world, customized experiences for his or her customers.
- DocsHound turns product demos into documentation through the use of Gemini 2.5 Flash-Lite to course of lengthy movies and extract 1000’s of screenshots with low latency. This transforms footage into complete documentation and coaching information for AI brokers a lot sooner than conventional strategies.
- Evertune helps manufacturers perceive how they’re represented throughout AI fashions. Gemini 2.5 Flash-Lite is a game-changer for them, dramatically dashing up evaluation and report technology. Its quick efficiency permits them to rapidly scan and synthesize massive volumes of mannequin output to supply purchasers with dynamic, well timed insights.
You can begin utilizing 2.5 Flash-Lite by specifying “gemini-2.5-flash-lite” in your code. If you’re utilizing the preview model, you’ll be able to change to “gemini-2.5-flash-lite” which is similar underlying mannequin. We plan to take away the preview alias of Flash-Lite on August twenty fifth.
Prepared to start out constructing? Strive the secure model of Gemini 2.5 Flash-Lite now in Google AI Studio and Vertex AI.