Gemini API I/O updates – Google Developers Blog

The Gemini API gives developers a streamlined way to build innovative applications with cutting-edge generative AI models. Google AI Studio simplifies the process of testing all the API capabilities, allowing for rapid prototyping and experimentation with text, image, and even video prompts. When developers want to test and build at scale, they can leverage all the capabilities available through the Gemini API.

New models available via the API

Gemini 2.5 Flash Preview – We’ve added a new 2.5 Flash preview (gemini-2.5-flash-preview-05-20) which is better than the previous preview at reasoning, code, and long context. This version of 2.5 Flash is currently #2 on the LMarena leaderboard, behind only 2.5 Pro. We’ve also improved Flash cost-efficiency with this latest update, reducing the number of tokens needed for the same performance, resulting in 22% efficiency gains on our evals. Our goal is to keep improving based on your feedback, and make both generally available soon.

Gemini 2.5 Pro and Flash text-to-speech (TTS) – We also announced 2.5 Pro and Flash previews for text-to-speech (TTS) that support native audio output for both single and multiple speakers, across 24 languages. With these models, you can control TTS expression and style, creating rich audio output. With multispeaker, you can generate conversations with multiple distinct voices for dynamic interactions.

Gemini 2.5 Flash native audio dialog – In preview, this model is available via the Live API to generate natural-sounding voices for conversation, in over 30 distinct voices and 24+ languages. We’ve also added proactive audio so the model can distinguish between the speaker and background conversations, so it knows when to respond. In addition, the model responds appropriately to a user’s emotional expression and tone. A separate thinking model enables more complex queries. This now makes it possible for you to build conversational AI agents and experiences that feel more intuitive and natural, like enhancing call center interactions, developing dynamic personas, crafting unique voice characters, and more.

Lyria RealTime – Live music generation is now available in the Gemini API and Google AI Studio to create a continuous stream of instrumental music using text prompts. With Lyria RealTime, we use WebSockets to establish a persistent, real-time communication channel. The model continuously produces music in small, flowing chunks and adapts based on inputs. Imagine adding a responsive soundtrack to your app or designing a new type of musical instrument! Try out Lyria RealTime with the PromptDJ-MIDI app in Google AI Studio.

Gemini 2.5 Pro Deep Think – We’re also testing an experimental reasoning mode for 2.5 Pro. We’ve seen incredible performance with these Deep Thinking capabilities for highly complex math and coding prompts. We look forward to making it broadly available for you to experiment with soon.

Gemma 3n – Gemma 3n is a generative AI open model optimized for use in everyday devices, such as phones, laptops, and tablets. It can handle text, audio and vision inputs. This model includes innovations in parameter-efficient processing, including Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture that provides the flexibility to reduce compute and memory requirements.

New functionality in the API

Thought summaries

To help developers understand and debug model responses, we’ve added thought summaries for 2.5 Pro and Flash in the Gemini API. We take the model’s raw thoughts and synthesize them into a helpful summary with headers, relevant details and tool calls. The raw chain-of-thoughts in Google AI Studio has also been updated with the new thought summaries.

Thinking budgets

We launched 2.5 Flash with thinking budgets to give developers control over how much models think, so they can balance performance, latency, and cost for the apps they’re building. We will be extending this capability to 2.5 Pro soon.

from google import genai
from google.genai import types

client = genai.Client(api_key="GOOGLE_API_KEY")
prompt = "What is the sum of the first 50 prime numbers?"
response = client.models.generate_content(
  model="gemini-2.5-flash-preview-05-20",
  contents=prompt,
  config=types.GenerateContentConfig(
    # Cap the thinking tokens and ask for thought summaries in the response
    thinking_config=types.ThinkingConfig(thinking_budget=1024,
      include_thoughts=True
    )
  )
)

# Parts flagged as thoughts hold the summary; the rest hold the answer
for part in response.candidates[0].content.parts:
  if not part.text:
    continue
  if part.thought:
    print("Thought summary:")
    print(part.text)
    print()
  else:
    print("Answer:")
    print(part.text)
    print()

Sample code to enable and retrieve thought summaries without streaming, returning a final thought summary with the response.
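
When a response mixes thought summaries and answer text, it can help to separate the two before logging or displaying them. The following is a minimal sketch, assuming each part exposes the `text` and `thought` attributes used in the sample above. `split_thoughts` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for real response parts so the snippet runs without an API key.

```python
from types import SimpleNamespace


def split_thoughts(parts):
    """Separate thought-summary text from answer text.

    Assumes each part has `.text` and `.thought` attributes.
    """
    thoughts, answers = [], []
    for part in parts:
        if not getattr(part, "text", None):
            continue  # skip parts that carry no text
        if getattr(part, "thought", False):
            thoughts.append(part.text)
        else:
            answers.append(part.text)
    return thoughts, answers


# Stand-in parts for illustration only; a real call would pass
# response.candidates[0].content.parts instead.
fake_parts = [
    SimpleNamespace(text="Listing primes, then summing.", thought=True),
    SimpleNamespace(text=None, thought=False),
    SimpleNamespace(text="The sum is 5117.", thought=False),
]
thoughts, answers = split_thoughts(fake_parts)
```

Keeping the two lists apart makes it straightforward to show only the answer to end users while writing the thought summaries to a debug log.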

New URL Context tool

We added a new experimental tool, URL context, to retrieve more context from links that you provide. This can be used by itself or in conjunction with other tools such as Grounding with Google Search. This tool is a key building block for developers looking to build their own version of research agents with the Gemini API.

from google import genai
from google.genai.types import GenerateContentConfig, GoogleSearch, Tool, UrlContext

client = genai.Client()
model_id = "gemini-2.5-flash-preview-05-20"

# Enable both tools: URL context and Grounding with Google Search
tools = []
tools.append(Tool(url_context=UrlContext()))
tools.append(Tool(google_search=GoogleSearch()))

response = client.models.generate_content(
    model=model_id,
    contents="Give me a three-day events schedule based on YOUR_URL. Also let me know what needs to be taken care of considering weather and commute.",
    config=GenerateContentConfig(
        tools=tools,
        response_modalities=["TEXT"],
    )
)

for each in response.candidates[0].content.parts:
    print(each.text)
# get URLs retrieved for context
print(response.candidates[0].url_context_metadata)

Sample code for Grounding with Google Search and URL Context

Computer use tool

We’re bringing Project Mariner’s browser control capabilities to the Gemini API via a new computer use tool. To make it easier for developers to use this tool, we’re enabling the creation of Cloud Run instances optimally configured for running browser control agents via one click from Google AI Studio. We’ve begun early testing with companies like Automation Anywhere, UiPath and Browserbase. Their valuable feedback will be instrumental in refining its capabilities for a broader experimental developer release this summer.

Improvements to structured outputs

The Gemini API now has broader support for JSON Schema, including much-requested keywords such as “$ref” (for references) and those enabling the definition of tuple-like structures (e.g., prefixItems).
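
To make the two keywords concrete, here is a small sketch of a schema that uses both. The schema follows standard JSON Schema conventions; the route shape, the `resolve_ref` and `matches_tuple` helpers, and the simplified type check are illustrative assumptions, not part of the Gemini API. Whether the API accepts every keyword shown (for example `$defs` or `"items": False`) is also an assumption to verify in the developer docs.

```python
# A schema with a reusable definition ($ref) and a tuple-like array (prefixItems).
schema = {
    "$defs": {
        "coordinate": {
            "type": "array",
            "prefixItems": [{"type": "number"}, {"type": "number"}],
            "items": False,  # no extra elements beyond (lat, lng)
        }
    },
    "type": "object",
    "properties": {
        "start": {"$ref": "#/$defs/coordinate"},
        "end": {"$ref": "#/$defs/coordinate"},
    },
    "required": ["start", "end"],
}


def resolve_ref(root, ref):
    """Follow a local reference such as '#/$defs/coordinate'."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are handled in this sketch")
    node = root
    for key in ref[2:].split("/"):
        node = node[key]
    return node


def matches_tuple(value, tuple_schema):
    """Simplified check: numeric prefixItems only, with extra items disallowed."""
    prefix = tuple_schema["prefixItems"]
    if not isinstance(value, list) or len(value) > len(prefix):
        return False
    return all(
        isinstance(item, (int, float)) and not isinstance(item, bool)
        for item in value
    )


coordinate = resolve_ref(schema, schema["properties"]["start"]["$ref"])
ok = matches_tuple([52.52, 13.40], coordinate)
too_long = matches_tuple([52.52, 13.40, 34.0], coordinate)
```

Defining `coordinate` once and referencing it twice keeps the schema short, and `prefixItems` pins down the position and type of each element the way a tuple would.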

Video understanding improvements

The Gemini API now allows YouTube video URLs or video uploads to be added to a prompt, enabling users to summarize, translate, or analyze the video content. With this recent update, the API supports video clipping, enabling flexibility in analyzing specific parts of a video. This is particularly beneficial for videos longer than 8 hours. We have also added support for dynamic frames per second (FPS), allowing 60 FPS for videos like games or sports where speed is critical, and 0.1 FPS for videos where speed is less of a priority. To help users save tokens, we have also introduced support for 3 different video resolutions: high (720p), standard (480p), and low (360p).
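
The frame rate directly determines how many frames are sampled from a clip, which is one reason clipping and a lower FPS can save tokens. The sketch below works through that arithmetic and builds a request part as a plain dictionary. The field names (`fileData`, `videoMetadata`, `startOffset`, `endOffset`, `fps`) are assumed here rather than taken from this post, so verify them against the developer docs; the helper names and the URL are placeholders.

```python
def frames_sampled(start_s, end_s, fps):
    """Number of frames sampled from a clip at a given frame rate."""
    if end_s <= start_s or fps <= 0:
        raise ValueError("need end_s > start_s and fps > 0")
    return round((end_s - start_s) * fps)


def clipped_video_part(uri, start_s, end_s, fps):
    """Build a video part as a plain dict (field names are assumptions)."""
    return {
        "fileData": {"fileUri": uri},
        "videoMetadata": {
            "startOffset": f"{start_s}s",
            "endOffset": f"{end_s}s",
            "fps": fps,
        },
    }


# The same 10-minute clip at the two frame rates mentioned above.
sports = frames_sampled(0, 600, 60)    # fast-moving content
lecture = frames_sampled(0, 600, 0.1)  # slow-moving content
part = clipped_video_part("https://www.youtube.com/watch?v=VIDEO_ID", 1250, 1570, 5)
```

Here `sports` comes to 36000 frames and `lecture` to 60, a 600-fold difference for the same ten minutes of video.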

Async function calling

The cascaded architecture in the Live API now supports asynchronous function calling, ensuring user conversations remain smooth and uninterrupted. This means your Live agent can continue generating responses even while it’s busy executing functions in the background, by simply adding the behavior field to the function definition and setting it to NON-BLOCKING. Read more about this in the Gemini API developer documentation.
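
The idea behind non-blocking function calls can be illustrated without the Live API at all. The following sketch uses plain `asyncio` to simulate an agent that keeps replying while a slow tool runs in the background. Everything here is a stand-in: `turn_on_the_lights`, `agent_turn` and the log format are hypothetical, and the `behavior` value in the declaration is spelled `NON_BLOCKING` as an assumption (this post writes it as NON-BLOCKING), so check the developer documentation for the exact value.

```python
import asyncio

# Function declaration as a plain dict. The "behavior" field comes from the
# text above; its exact enum spelling is an assumption.
turn_on_the_lights_decl = {"name": "turn_on_the_lights", "behavior": "NON_BLOCKING"}


async def turn_on_the_lights(log):
    """Stand-in for a slow tool call."""
    log.append("tool: started")
    await asyncio.sleep(0.2)
    log.append("tool: finished")
    return "lights on"


async def agent_turn():
    """Keep replying while the tool runs, then report its result."""
    log = []
    task = asyncio.create_task(turn_on_the_lights(log))  # runs in the background
    for i in range(2):
        await asyncio.sleep(0.01)
        log.append(f"agent: reply {i + 1}")
    result = await task
    log.append(f"agent: tool result is '{result}'")
    return log


log = asyncio.run(agent_turn())
```

In the resulting log, both agent replies appear between the tool starting and finishing, which is the behavior the non-blocking setting is meant to enable in a live conversation.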

Batch API

We’re also testing a new API, which lets you easily batch up your requests and get them back within a maximum 24-hour turnaround time. The API will come at half the price of the interactive API and with much higher rate limits. We hope to roll that out more broadly later this summer.
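
As a rough illustration of the pricing claim, the sketch below compares the cost of a workload at an interactive price with the same workload at half that price. The per-million-token price is a made-up placeholder, not a published rate, and `workload_cost` is a hypothetical helper.

```python
BATCH_DISCOUNT = 0.5  # "half the price of the interactive API"


def workload_cost(total_tokens, price_per_million, batch=False):
    """Cost of a workload, optionally at the batch discount."""
    cost = total_tokens / 1_000_000 * price_per_million
    return cost * BATCH_DISCOUNT if batch else cost


PLACEHOLDER_PRICE = 1.00  # hypothetical dollars per million tokens
interactive = workload_cost(40_000_000, PLACEHOLDER_PRICE)
batched = workload_cost(40_000_000, PLACEHOLDER_PRICE, batch=True)
savings = interactive - batched
```

The trade-off is latency: the batched workload costs half as much but may take up to 24 hours to come back, so it suits jobs that do not need an immediate answer.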

Start building

That’s a wrap on I/O for this year! With the Gemini API and Google AI Studio, you can turn your ideas into reality, whether you’re building conversational AI agents with natural-sounding audio or developing tools to analyze and generate code. As always, check out the Gemini API developer docs for all the latest code samples and more.

Explore this announcement and all Google I/O 2025 updates on io.google.