Serverless strategies for streaming LLM responses

Compute Blog



This article compares three serverless approaches for streaming Amazon Bedrock LLM responses to clients in real time on AWS.

  • Lambda function URLs with response streaming: Direct HTTP endpoint and lowest complexity; response streaming is natively supported only on Node.js managed runtimes
  • API Gateway WebSocket APIs: Persistent connections, ideal for multi-turn conversations; authentication needs custom work, such as a Lambda authorizer on the $connect route
  • AWS AppSync GraphQL subscriptions: Fully managed, with built-in Amazon Cognito support; best suited to complex applications
  • Lambda function URLs offer the lowest latency and cost for single-user applications, but they support only the AWS_IAM or NONE auth types, so Cognito authentication must be handled separately (for example, by validating tokens in the function)
  • WebSocket APIs enable bidirectional communication and connection reuse but require more development effort
  • AppSync provides native Cognito integration and automatic subscription management through GraphQL
  • All three approaches are serverless, production-ready, and can be used with Amazon Cognito authentication, although only AppSync integrates with it natively
  • Choose based on application complexity, real-time requirements, and existing architecture
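As a rough illustration of the Lambda function URL approach, the core of a streaming handler is a loop that forwards each model chunk to the HTTP response as soon as it arrives. This is a minimal sketch under some assumptions. It targets the Node.js runtime, where the function body would be wrapped in `awslambda.streamifyResponse` and the function URL configured with the `RESPONSE_STREAM` invoke mode. The names `pipeChunks` and `fakeModelStream` are hypothetical. The fake generator stands in for the Bedrock `InvokeModelWithResponseStreamCommand` response body so that the example runs locally.

```typescript
import { once } from "node:events";
import { Writable } from "node:stream";
import { finished } from "node:stream/promises";

// Hypothetical stand-in for the Bedrock response stream. In a real function, each string
// would be decoded (model-specifically) from the chunk.bytes field of each stream event.
async function* fakeModelStream(tokens: string[]): AsyncGenerator<string> {
  for (const token of tokens) {
    yield token;
  }
}

// Forward each chunk to the HTTP response as soon as the model produces it.
// Inside Lambda, `sink` would be the responseStream passed by awslambda.streamifyResponse.
async function pipeChunks(source: AsyncIterable<string>, sink: Writable): Promise<number> {
  let count = 0;
  for await (const chunk of source) {
    // write() returns false when the internal buffer is full; wait for "drain" (backpressure).
    if (!sink.write(chunk)) {
      await once(sink, "drain");
    }
    count++;
  }
  // Ending the stream completes the HTTP response for the client.
  sink.end();
  await finished(sink);
  return count; // number of chunks forwarded
}
```

Inside Lambda, the call would look roughly like `await pipeChunks(textChunksFromBedrock, responseStream)` within the `streamifyResponse` callback, where `textChunksFromBedrock` is a hypothetical async iterable of decoded text. Waiting for `drain` means a slow client cannot force the function to buffer the whole reply in memory.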

The article provides implementation details, code examples, and a comparison table to help select the best streaming strategy for your LLM application.
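With the WebSocket API option, the model output is not written to an HTTP response. Instead, a Lambda function posts each chunk back over the client's open connection. In practice, each post would be an `ApiGatewayManagementApiClient.send(new PostToConnectionCommand({ ConnectionId, Data }))` call from `@aws-sdk/client-apigatewaymanagementapi`. The following sketch injects that call as a function parameter so that it runs without AWS. The `chunk`/`done` message envelope and the `streamToConnection` and `reassemble` helpers are hypothetical and are not taken from the article.

```typescript
// Hypothetical wire format for one streamed reply; not taken from the article.
type StreamMessage =
  | { type: "chunk"; seq: number; text: string }
  | { type: "done"; seq: number };

// Injected sender. In Lambda this would wrap PostToConnectionCommand
// from @aws-sdk/client-apigatewaymanagementapi.
type PostToConnection = (connectionId: string, data: string) => Promise<void>;

// Stand-in for the Bedrock response stream so the sketch runs locally.
async function* demoTokens(tokens: string[]): AsyncGenerator<string> {
  for (const token of tokens) {
    yield token;
  }
}

// Server side: push every chunk to the open connection, then send a final "done" frame.
async function streamToConnection(
  connectionId: string,
  source: AsyncIterable<string>,
  post: PostToConnection,
): Promise<number> {
  let seq = 0;
  for await (const text of source) {
    const message: StreamMessage = { type: "chunk", seq, text };
    // Awaiting each post before sending the next keeps the frames in order.
    await post(connectionId, JSON.stringify(message));
    seq++;
  }
  const done: StreamMessage = { type: "done", seq };
  await post(connectionId, JSON.stringify(done));
  return seq; // number of chunk frames sent
}

// Client side: rebuild the reply; returns undefined until the "done" frame arrives.
function reassemble(frames: string[]): string | undefined {
  const parts: string[] = [];
  let finished = false;
  for (const raw of frames) {
    const message = JSON.parse(raw) as StreamMessage;
    if (message.type === "done") {
      finished = true;
    } else {
      parts[message.seq] = message.text;
    }
  }
  return finished ? parts.join("") : undefined;
}
```

The explicit `done` frame matters because a WebSocket connection, unlike a Lambda function URL response, stays open after the reply finishes. The client therefore needs a signal that the current turn is complete before it sends the next message on the same connection.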




