
Introducing bidirectional streaming for real-time inference on Amazon SageMaker AI

Machine Learning Blog



This article introduces bidirectional streaming for Amazon SageMaker AI Inference, enabling real-time two-way communication between clients and AI models for applications like speech-to-text and voice agents.

  • Bidirectional streaming allows simultaneous data flow in both directions over persistent HTTP/2 and WebSocket connections
  • Reduces latency and infrastructure overhead compared to traditional request-response inference patterns
  • Supports bring-your-own-container deployments, where the container implements the WebSocket protocol and accepts connections at localhost:8080
  • SageMaker infrastructure monitors connection health by exchanging ping/pong frames every 60 seconds
  • Deepgram Nova-3 speech-to-text model is available on AWS Marketplace with a 14-day free trial
  • The article includes code examples for building custom containers and invoking endpoints with bidirectional streaming
  • Enables real-time transcription, multi-turn conversations, and continuous AI agent interactions
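
To make the contrast with request-response inference concrete, here is a minimal sketch of the bidirectional pattern described in the list above. It is an in-process simulation only: it uses asyncio queues instead of a real HTTP/2 or WebSocket connection, it does not call any SageMaker API, and the names `model`, `send_chunks`, `receive`, `stream`, and the `END` sentinel are hypothetical, chosen for illustration. The "model" simply echoes a growing transcript, standing in for a speech-to-text model that returns partial results while input is still arriving.

```python
import asyncio

END = None  # hypothetical sentinel marking the end of a stream


async def model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Stand-in for a streaming model: emit one partial result per input chunk."""
    words = []
    while (chunk := await inbound.get()) is not END:
        words.append(chunk)
        await outbound.put(" ".join(words))  # partial transcript so far
    await outbound.put(END)


async def send_chunks(inbound: asyncio.Queue, chunks) -> None:
    """Client side, upstream direction: push chunks without waiting for replies."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield control so the other tasks can run
    await inbound.put(END)


async def receive(outbound: asyncio.Queue) -> list:
    """Client side, downstream direction: collect results as they arrive."""
    results = []
    while (item := await outbound.get()) is not END:
        results.append(item)
    return results


async def stream(chunks) -> list:
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    model_task = asyncio.create_task(model(inbound, outbound))
    # Sending and receiving run concurrently over the same "connection".
    _, results = await asyncio.gather(
        send_chunks(inbound, chunks), receive(outbound)
    )
    await model_task
    return results


def run_demo() -> list:
    return asyncio.run(stream(["hello", "from", "sagemaker"]))


if __name__ == "__main__":
    for partial in run_demo():
        print(partial)
```

Running the script prints three partial transcripts, "hello", "hello from", and "hello from sagemaker", one per input chunk. In a real deployment the two queues would be replaced by the two directions of a single persistent connection, and the sender and receiver would live in the client while the model loop lives in the container.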

SageMaker AI bidirectional streaming simplifies deployment of real-time voice AI applications with reduced operational complexity and improved user experience.





Related articles

Nov 25, 2025: Amazon SageMaker AI Inference now supports bidirectional streaming
Jan 9, 2024: Inference Llama 2 models with real-time response streaming using Amazon SageMaker
Dec 6, 2024: Amazon SageMaker introduces new capabilities to accelerate scaling of Generative AI Inference
Apr 21, 2022: Amazon SageMaker Serverless Inference is now generally available
