OpenAI Unveils Advanced Audio Models for Developers

OpenAI has launched new audio models in its application programming interface (API), enhancing performance in both speech-to-text transcription and text-to-speech (TTS) functions. The San Francisco-based AI company introduced three innovative models designed to empower developers in creating applications with sophisticated workflows. These advancements are expected to streamline customer support operations and improve overall user experience.

New Audio Models Enhance Performance

In a recent blog post, OpenAI outlined the features of its new API-specific audio models. The company emphasized its history of developing AI agents, including Operator, Deep Research, and the Responses API, which incorporate built-in tools. However, OpenAI noted that the full potential of these agents can only be realized when they operate intuitively and interact across various mediums beyond text.

The newly introduced models include GPT-4o-transcribe and GPT-4o-mini-transcribe for speech-to-text tasks, alongside GPT-4o-mini-tts for text-to-speech applications. OpenAI asserts that these models outperform its Whisper models, released in 2022. Unlike their predecessors, however, the new models are not open source, which may limit accessibility for developers who rely on downloadable weights.
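As an illustration, a transcription call to the new models through the official `openai` Python SDK might look like the sketch below. The file name and environment setup (an `OPENAI_API_KEY` variable) are assumptions for the example, not details from the announcement.

```python
def transcribe(path: str) -> str:
    """Transcribe an audio file with the new speech-to-text model.

    A minimal sketch using the official `openai` Python SDK; assumes
    OPENAI_API_KEY is set in the environment and `path` points to a
    real audio file (e.g. an MP3 or WAV).
    """
    from openai import OpenAI  # imported lazily so the sketch stays self-contained

    client = OpenAI()
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for lower cost
            file=audio,
        )
    return result.text


if __name__ == "__main__":
    print(transcribe("speech.mp3"))  # "speech.mp3" is a hypothetical file
```

Swapping the `model` string between the two transcription models is the only change needed to trade accuracy for cost.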

Specifically, the GPT-4o-transcribe model achieves a lower word error rate (WER) on the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) benchmark, which assesses multilingual speech across more than 100 languages. OpenAI attributes these gains to targeted training techniques, including reinforcement learning and extensive mid-training on high-quality audio datasets.

Robust Features for Diverse Applications

The new speech-to-text models are designed to excel in challenging environments, effectively capturing audio even with heavy accents, background noise, and varying speech speeds. This capability is crucial for applications that require high accuracy in transcription, such as customer service and content creation.

Similarly, the GPT-4o-mini-tts model brings significant advancements, allowing developers to steer inflection, intonation, and emotional expressiveness. This makes it suitable for a wide range of tasks, from customer support to creative storytelling. However, it is important to note that the model currently offers only preset, artificial voices.
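In the SDK, this steerability is exposed as an `instructions` prompt passed alongside the text. The sketch below assumes an `OPENAI_API_KEY` in the environment; the voice name and output path are illustrative choices, not details from the announcement.

```python
def synthesize(text: str, out_path: str = "speech.mp3") -> None:
    """Generate spoken audio with GPT-4o-mini-tts.

    A sketch against the official `openai` Python SDK; the
    `instructions` prompt is how the voice's tone and delivery
    can be steered.
    """
    from openai import OpenAI  # lazy import keeps the sketch importable without the SDK

    client = OpenAI()
    response = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="coral",  # one of the preset voices; choice assumed here
        input=text,
        instructions="Speak in a calm, reassuring customer-support tone.",
    )
    response.write_to_file(out_path)


if __name__ == "__main__":
    synthesize("Thanks for calling. How can I help you today?")
```

Changing only the `instructions` string is enough to move the same text from, say, a support-desk register to a storytelling one.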

Pricing and Availability

OpenAI has detailed the pricing structure for its new audio models on its API pricing page. The GPT-4o-based audio model is priced at $40 (approximately Rs. 3,440) per million input tokens and $80 (around Rs. 6,880) per million output tokens. In contrast, the GPT-4o mini-based audio models are available at a lower rate of $10 (about Rs. 860) per million input tokens and $20 (approximately Rs. 1,720) per million output tokens.
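At those rates, cost scales linearly with token counts. The helper below turns the quoted USD figures into a per-request estimate; the dictionary keys and token counts are illustrative labels, not official model identifiers.

```python
# USD per one million tokens, as quoted on the API pricing page.
RATES = {
    "gpt-4o-audio": {"input": 40.0, "output": 80.0},
    "gpt-4o-mini-audio": {"input": 10.0, "output": 20.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a given token usage."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


# Example: 2M input tokens and 0.5M output tokens on the mini model
# costs 2 x $10 + 0.5 x $20 = $30.
print(estimate_cost("gpt-4o-mini-audio", 2_000_000, 500_000))  # → 30.0
```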

All audio models are now accessible to developers via the API. Additionally, OpenAI is launching an integration with its Agents software development kit (SDK) to assist users in building voice agents, further expanding the capabilities of its AI offerings.


