Google Launches Gemini 2.5 with Native Audio Dialog Access

OV News Desk, June 5, 2025 (Last Updated: June 27, 2025)



Google unveiled new audio generation features for its Gemini 2.5 models at Google I/O 2025. Developers and users can now try these capabilities in Google AI Studio. The two main features are native audio dialog, which generates human-like audio responses, and controllable text-to-speech (TTS), which turns scripts into conversational speech. However, these features are not currently accessible to developers through application programming interfaces (APIs).

Exploring Gemini 2.5 Flash’s Audio Features

In a recent blog post, Google described the audio generation capabilities of the Gemini 2.5 Flash models, which are intended to help developers build more interactive applications. Users can try native audio dialog in the Stream tab of Google AI Studio, while TTS is available in the Generate Media tab.

Native audio dialog allows real-time interaction between users and the AI. Users can type or speak their prompts, and the AI responds with generated audio. Because the audio is generated directly, with no intermediate text phase, the conversation flows more smoothly. The system can also recognize the emotional tone of the user’s voice and respond appropriately to feelings such as fear, anger, or surprise.

Capabilities of Controllable Text-to-Speech

The controllable TTS feature gives users several ways to shape the audio output. It can generate multi-speaker dialogues and add emotions and accents to the narration of scripts. Users can also control the delivery speed and specify how particular words are pronounced. The feature supports 24 languages, the same set as native audio dialog, and can mix languages within a single output.
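The article does not show what a script for the TTS feature looks like. As a purely illustrative sketch, the Python below assembles a multi-speaker script with plain-language notes on pace, emotion, and accent, producing text that could be pasted into the Generate Media tab. The `Line` class, the `render_script` function, and the output layout are all hypothetical; this is not a Google format, and the code calls no Google API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    speaker: str                 # hypothetical speaker label
    text: str                    # words to be spoken
    style: Optional[str] = None  # free-form delivery hint, e.g. "whispering"


def render_script(lines: List[Line], pace: Optional[str] = None) -> str:
    """Render a multi-speaker script as plain text with style notes."""
    header = "Read the following dialogue"
    if pace:
        header += f" at a {pace} pace"
    out = [header + ":"]
    for line in lines:
        # Attach the style hint in parentheses only when one is given.
        label = f"{line.speaker} ({line.style})" if line.style else line.speaker
        out.append(f"{label}: {line.text}")
    return "\n".join(out)


script = render_script(
    [
        Line("Host", "Welcome back to the show.", style="cheerful"),
        Line("Guest", "Thanks, glad to be here.", style="calm, British accent"),
    ],
    pace="slow",
)
print(script)
```

Keeping the style hints as free text rather than fixed options reflects the article's description of emotions, accents, and speed as things the user describes; whether the model honors a given hint is something to check by listening to the result.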

Google says these audio generation capabilities underwent risk assessments throughout their development. The company used both internal mechanisms and red teaming to identify and address potential vulnerabilities. In addition, all audio generated by these models is embedded with SynthID, Google’s watermarking technology, so that it can be identified as AI-generated.

Implications for Developers and Users

These audio generation features give developers new tools for building more immersive experiences with Gemini 2.5. The ability to generate human-like audio responses and control speech delivery opens up possibilities for interactive storytelling, virtual assistants, and customer service applications.

As these features are still in the testing phase, developers are encouraged to explore them in Google AI Studio. Feedback gathered during this period will help refine the technology and ensure it meets user needs. With these audio generation capabilities, Google aims to make conversations with AI more natural and intuitive.
