10 Minutes News for Hoteliers 10 Minutes News for Hoteliers
  • Top News
  • Posts
    • CSR and Sustainability
    • Events
    • Hotel Openings
    • Hotel Operations
    • Human Resources
    • Innovation
    • Market Trends
    • Marketing
    • Mergers & Acquisitions
    • Regulatory and Legal Affairs
    • Revenue Management
  • 🎙️ Podcast
  • 👉 Sign-up
  • 🌎 Languages
    • 🇫🇷 French
    • 🇩🇪 German
    • 🇮🇹 Italian
    • 🇪🇸 Spain
  • 📰 Columns
  • About us
10 Minutes News for Hoteliers 10 Minutes News for Hoteliers
  • Top News
  • Posts
    • CSR and Sustainability
    • Events
    • Hotel Openings
    • Hotel Operations
    • Human Resources
    • Innovation
    • Market Trends
    • Marketing
    • Mergers & Acquisitions
    • Regulatory and Legal Affairs
    • Revenue Management
  • 🎙️ Podcast
  • 👉 Sign-up
  • 🌎 Languages
    • 🇫🇷 French
    • 🇩🇪 German
    • 🇮🇹 Italian
    • 🇪🇸 Spain
  • 📰 Columns
  • About us

Introducing Amazon Nova Sonic: Human-like voice conversations for generative AI applications

  • Amazon
  • 7 May 2025
  • 5 minute read
Total
0
Shares
0
0
0

This article was written by a Hotel Marketing Flipboard. Click here to read the original article

Voiced by Polly

April 14, 2025: Post updated to clarify the context size.

Voice interfaces are essential to enhance customer experience in different areas such as customer support call automation, gaming, interactive education, and language learning. However, there are challenges when building voice-enabled applications.

Traditional approaches in building voice-enabled applications require complex orchestration of multiple models, such as speech recognition to convert speech to text, language models to understand and generate responses, and text-to-speech to convert text back to audio.

This fragmented approach not only increases development complexity but also fails to preserve crucial linguistic context such as tone, prosody, and speaking style that are essential for natural conversations. This can affect conversational AI applications that need low latency and nuanced understanding of verbal and non-verbal cues for fluid dialog handling and natural turn-taking.

To streamline the implementation of speech-enabled applications, today we are introducing Amazon Nova Sonic, the newest addition to the Amazon Nova family of foundation models (FMs) available in Amazon Bedrock.

Amazon Nova Sonic unifies speech understanding and generation into a single model that developers can use to create natural, human-like conversational AI experiences with low latency and industry-leading price performance. This integrated approach streamlines development and reduces complexity when building conversational applications.

Knowland Platform Launches User-Focused Navigation Infrastructure
Trending
Knowland Platform Launches User-Focused Navigation Infrastructure

Its unified model architecture delivers expressive speech generation and real-time text transcription without requiring a separate model. The result is an adaptive speech response that dynamically adjusts its delivery based on prosody, such as pace and timbre, of input speech.

When using Amazon Nova Sonic, developers have access to function calling (also known as tool use) and agentic workflows to interact with external services and APIs and perform tasks in the customer’s environment, including knowledge grounding with enterprise data using Retrieval-Augmented Generation (RAG).

At launch, Amazon Nova Sonic provides robust speech understanding for American and British English across various speaking styles and acoustic conditions, with additional languages coming soon.

Amazon Nova Sonic is developed with responsible AI at the forefront of innovation, featuring built-in protections for content moderation and watermarking.

Amazon Nova Sonic in action
The scenario for this demo is a contact center in the telecommunication industry. A customer reaches out to improve their subscription plan, and Amazon Nova Sonic handles the conversation.

With tool use, the model can interact with other systems and use agentic RAG with Amazon Bedrock Knowledge Bases to gather updated, customer-specific information such as account details, subscription plans, and pricing info.

The demo shows streaming transcription of speech input and displays streaming speech responses as text. The sentiment of the conversation is displayed in two ways: a time chart illustrating how it evolves, and a pie chart representing the overall distribution. There’s also an AI insights section providing contextual tips for a call center agent. Other interesting metrics shown in the web interface are the overall talk time distribution between the customer and the agent, and the average response time.

During the conversation with the support agent, you can observe through the metrics and hear in the voices how customer sentiment improves.

The video includes an example of how Amazon Nova Sonic handles interruptions smoothly, stopping to listen and then continuing the conversation in a natural way.

Now, let’s explore how you can integrate voice capabilities in your applications.

Using Amazon Nova Sonic
To get started with Amazon Nova Sonic, you first need to toggle model access in the Amazon Bedrock console, similar to how you would enable other FMs. Navigate to the Model access section of the navigation pane, find Amazon Nova Sonic under the Amazon models, and enable it for your account.

Amazon Bedrock provides a new bidirectional streaming API (InvokeModelWithBidirectionalStream) to help you implement real-time, low-latency conversational experiences on top of the HTTP/2 protocol. With this API, you can stream audio input to the model and receive audio output in real time, so that the conversation flows naturally.

You can use Amazon Nova Sonic with the new API with this model ID: amazon.nova-sonic-v1:0

After the session initialization, where you can configure inference parameters, the model operate through an event-driven architecture on both the input and output streams.

There are three key event types in the input stream:

System prompt – To set the overall system prompt for the conversation

Audio input streaming – To process continuous audio input in real-time

Tool result handling – To send the result of tool use calls back to the model (after tool use is requested in the output events)

Similarly, there are three groups of events in the output streams:

Automatic speech recognition (ASR) streaming – Speech-to-text transcript is generated, containing the result of realtime speech recognition.

Tool use handling – If there are a tool use events, they need to be handled using the information provided here, and the results sent back as input events.

Audio output streaming – To play output audio in real-time, a buffer is needed, because Amazon Nova Sonic model generates audio faster than real-time playback.

You can find examples of using Amazon Nova Sonic in the Amazon Nova model cookbook repository.

Prompt engineering for speech
When crafting prompts for Amazon Nova Sonic, your prompts should optimize content for auditory comprehension rather than visual reading, focusing on conversational flow and clarity when heard rather than seen.

When defining roles for your assistant, focus on conversational attributes (such as warm, patient, concise) rather than text-oriented attributes (detailed, comprehensive, systematic). A good baseline system prompt might be:

You are a friend. The user and you will engage in a spoken dialog exchanging the transcripts of a natural real-time conversation. Keep your responses short, generally two or three sentences for chatty scenarios.

More generally, when creating prompts for speech models, avoid requesting visual formatting (such as bullet points, tables, or code blocks), voice characteristic modifications (accent, age, or singing), or sound effects.

Things to know
Amazon Nova Sonic is available today in the US East (N. Virginia) AWS Region. Visit Amazon Bedrock pricing to see the pricing models.

Amazon Nova Sonic can understand speech in different speaking styles and generates speech in expressive voices, including both masculine-sounding and feminine-sounding voices, in different English accents, including American and British. Support for additional languages will be coming soon.

Amazon Nova Sonic handles user interruptions gracefully without dropping the conversational context and is robust to background noise. The model supports a 300K context window, with a default connection time limit of 8 minutes. However, you can extend your session by establishing a new connection and passing the previous chat history as context.

The following AWS SDKs support the new bidirectional streaming API:

Python developers can use this new experimental SDK that makes it easier to use the bidirectional streaming capabilities of Amazon Nova Sonic. We’re working to add support to the other AWS SDKs.

I’d like to thank Reilly Manton and Chad Hendren, who set up the demo with the contact center in the telecommunication industry, and Anuj Jauhari, who helped me understand the rich landscape in which speech-to-speech models are being deployed.

You can find more examples in Java, Node.js, and Python in the Amazon Nova model cookbook repo, including common integration patterns, such as RAG using Amazon Bedrock Knowledge Bases or LangChain.

To learn more, these articles that enter into the details of how to use the new bidirectional streaming API with compelling demos:

Whether you’re creating customer service solutions, language learning applications, or other conversational experiences, Amazon Nova Sonic provides the foundation for natural, engaging voice interactions. To get started, visit the Amazon Bedrock console today. To learn more, visit the Amazon Nova section of the user guide.

– Danilo


How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Please click here to access the full original article.

Total
0
Shares
Share 0
Tweet 0
Pin it 0
You should like too
View Post
  • Innovation

Travel Tech Essentialist #174: Surprise

  • Mauricio Prieto
  • 10 May 2025
View Post
  • Innovation

Chicago Voice AI Startup Debuts Omnichannel Auto-Attendant and Tap-to-Listen Audio Menus at the 2025 NRA Show

  • Automatic
  • 10 May 2025
View Post
  • Innovation

Carnival Cruise Line Partners with Cantaloupe, Inc. to Power Self-Service Experiences at Carnival’s Celebration Key

  • Automatic
  • 10 May 2025
View Post
  • Innovation

Expedia Launches Feature That Turns Reels on Instagram into Bookable Travel Itineraries

  • Automatic
  • 10 May 2025
View Post
  • Innovation

IHG Reports Q1 2025 Trading Update

  • LODGING Staff
  • 9 May 2025
View Post
  • Innovation

Expedia Launches Industry-First Feature That Turns Instagram Reels into Bookable Travel Itineraries

  • Automatic
  • 9 May 2025
View Post
  • Innovation

Tuu Eco Stay Awards Announces First Bamboo Verified Hotels: Regent Hong Kong, Pullman Phuket Karon and Zannier Bãi San Hô

  • Automatic
  • 9 May 2025
View Post
  • Innovation

Customer Spotlight: Holiday Inn Cleveland-Mayfield

  • Automatic
  • 9 May 2025
Sponsored Posts
  • The RFP Process for Hotel PMS

    View Post
  • Top hospitality tech trends from Mews Unfold 2024

    View Post
  • Getting Started with AI: A Step-by-Step Guide for Hoteliers

    View Post
Last Posts
  • Travel Tech Essentialist #174: Surprise
    • 10 May 2025
  • Why Most Hotels Fail at Social Media (And How to Get It Right) – Ben Wolff
    • 10 May 2025
  • Chicago Voice AI Startup Debuts Omnichannel Auto-Attendant and Tap-to-Listen Audio Menus at the 2025 NRA Show
    • 10 May 2025
  • Carnival Cruise Line Partners with Cantaloupe, Inc. to Power Self-Service Experiences at Carnival’s Celebration Key
    • 10 May 2025
  • Expedia Launches Feature That Turns Reels on Instagram into Bookable Travel Itineraries
    • 10 May 2025
Sponsors
  • The RFP Process for Hotel PMS
  • Top hospitality tech trends from Mews Unfold 2024
  • Getting Started with AI: A Step-by-Step Guide for Hoteliers
Contact informations

contact@10minutes.news

Advertise with us
Contact Marjolaine to learn more: marjolaine@wearepragmatik.com
Press release
pr@10minutes.news
10 Minutes News for Hoteliers 10 Minutes News for Hoteliers
  • Top News
  • Posts
  • 🎙️ Podcast
  • 👉 Sign-up
  • 🌎 Languages
  • 📰 Columns
  • About us
Discover the best of international hotel news. Categorized, and sign-up to the newsletter

Input your search keywords and press Enter.