Introducing bidirectional streaming for real-time inference on Amazon SageMaker AI
Machine Learning Blog
This article introduces bidirectional streaming for Amazon SageMaker AI Inference, enabling real-time two-way communication between clients and AI models for applications like speech-to-text and voice agents.
- Bidirectional streaming allows simultaneous data flow in both directions over persistent HTTP/2 and WebSocket connections
- Reduces latency and infrastructure overhead compared to traditional request-response inference patterns
- Supports bring-your-own-container deployments, where the container implements the WebSocket protocol and accepts connections at localhost:8080
- SageMaker infrastructure includes health monitoring with ping/pong frames every 60 seconds
- Deepgram Nova-3 speech-to-text model is available on AWS Marketplace with a 14-day free trial
- The article includes code examples for building custom containers and invoking endpoints with bidirectional streaming
- Enables real-time transcription, multi-turn conversations, and continuous AI agent interactions
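To make the contrast with request-response concrete, here is a minimal in-process sketch of the bidirectional pattern described in the list above. It uses only `asyncio` queues as a stand-in for the persistent connection; it does not call SageMaker AI or any real model, and every name in it (`client_send`, `model`, `client_receive`, `transcribe`, the `END` sentinel) is hypothetical. The "model" simply echoes the words received so far, which is enough to show partial results flowing back while input is still being sent.

```python
import asyncio

END = None  # sentinel marking end of stream (an assumption of this sketch)


async def client_send(uplink, chunks):
    """Push audio chunks upstream one at a time, without waiting for replies."""
    for chunk in chunks:
        await uplink.put(chunk)
        await asyncio.sleep(0)  # yield so the model can respond mid-stream
    await uplink.put(END)


async def model(uplink, downlink):
    """Stand-in for a speech-to-text model: emit one partial result per chunk."""
    words = []
    while (chunk := await uplink.get()) is not END:
        words.append(chunk.decode())
        await downlink.put(" ".join(words))
    await downlink.put(END)


async def client_receive(downlink):
    """Collect partial transcripts as they arrive on the downstream side."""
    partials = []
    while (message := await downlink.get()) is not END:
        partials.append(message)
    return partials


async def run_session(chunks):
    # Two queues model the two directions of a single persistent connection.
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    _, _, partials = await asyncio.gather(
        client_send(uplink, chunks),
        model(uplink, downlink),
        client_receive(downlink),
    )
    return partials


def transcribe(chunks):
    """Run one simulated streaming session and return the partial transcripts."""
    return asyncio.run(run_session(chunks))
```

Calling `transcribe([b"hello", b"world"])` yields one partial transcript per chunk. In a request-response design the client would instead upload the whole input and wait for a single final answer.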
SageMaker AI bidirectional streaming simplifies deployment of real-time voice AI applications with reduced operational complexity and improved user experience.
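The ping/pong health monitoring mentioned above relies on WebSocket control frames as defined in RFC 6455. The sketch below shows what those frames look like on the wire and how a pong echoes a ping's payload. It is illustrative only: most WebSocket libraries answer pings automatically, the helper names (`build_control_frame`, `parse_control_frame`, `pong_for`) are hypothetical, and it assumes the platform sends the ping and the container replies, which the summary does not state explicitly.

```python
import os

OP_PING, OP_PONG = 0x9, 0xA  # control-frame opcodes from RFC 6455


def build_control_frame(opcode, payload=b"", mask=False):
    """Encode a single-frame control message (FIN set, payload <= 125 bytes)."""
    if len(payload) > 125:
        raise ValueError("control frame payload must be at most 125 bytes")
    head = bytes([0x80 | opcode])  # FIN=1, RSV bits clear, 4-bit opcode
    if not mask:
        # Server-to-client frames are sent unmasked.
        return head + bytes([len(payload)]) + payload
    # Client-to-server frames carry a 4-byte masking key.
    key = os.urandom(4)
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return head + bytes([0x80 | len(payload)]) + key + masked


def parse_control_frame(frame):
    """Decode a control frame and return (opcode, unmasked payload)."""
    opcode = frame[0] & 0x0F
    is_masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    body = frame[2:]
    if is_masked:
        key, body = body[:4], body[4:]
        body = bytes(b ^ key[i % 4] for i, b in enumerate(body))
    return opcode, body[:length]


def pong_for(ping_frame):
    """Build the unmasked pong that answers a ping, echoing its payload."""
    opcode, payload = parse_control_frame(ping_frame)
    if opcode != OP_PING:
        raise ValueError("not a ping frame")
    return build_control_frame(OP_PONG, payload)
```

For example, `pong_for(build_control_frame(OP_PING, b"hc", mask=True))` returns the four bytes `b"\x8a\x02hc"`: the pong opcode with FIN set, a length of 2, and the echoed payload.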