Introducing Disaggregated Inference on AWS powered by llm-d

Machine Learning Blog

This article announces AWS's collaboration with the llm-d team to bring disaggregated inference capabilities to AWS, enabling optimized large language model serving at scale.

llm-d separates LLM inference prefill and decode phases across distributed GPU resources for better optimization
Intelligent scheduling routes requests based on KV cache locality without requiring full cache state visibility
Prefill-decode disaggregation allows independent scaling of compute-intensive and memory-intensive phases
Wide expert parallelism optimizes Mixture-of-Experts models like DeepSeek-R1 and Qwen3.5
Tiered prefix caching offloads KV cache entries to CPU memory or disk beyond GPU limits
Integration with AWS Elastic Fabric Adapter (EFA) and NIXL enables high-performance point-to-point transfers
Benchmarks show up to 70% throughput improvement with prefill-decode disaggregation versus standard vLLM
Deployable on Amazon SageMaker HyperPod and Amazon EKS using Kubernetes-native architecture
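
The cache-locality routing point above can be illustrated with a minimal sketch. It is not llm-d's scheduler: the class name, block size, and replica names are hypothetical, and the scoring rule (longest matching prefix, then lowest load) is an assumption chosen for illustration. The router never queries a replica's real KV cache; it only remembers which prefix blocks it has already sent to each replica and uses that approximate record to pick a target.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block (hypothetical value for the sketch)


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Chain-hash each full block so a hash identifies the entire prefix up to it."""
    hashes = []
    prev = b""
    full_len = len(tokens) - len(tokens) % block_size  # ignore the trailing partial block
    for i in range(0, full_len, block_size):
        block = ",".join(map(str, tokens[i:i + block_size])).encode()
        prev = hashlib.sha256(prev + block).digest()
        hashes.append(prev)
    return hashes


class PrefixAwareRouter:
    """Routes by the router's own record of past requests, not by real cache state."""

    def __init__(self, replicas):
        self.seen = {r: set() for r in replicas}  # prefix-block hashes sent to each replica
        self.load = {r: 0 for r in replicas}      # requests routed so far (never decremented here)

    def matched_blocks(self, replica, hashes):
        """Count leading blocks of this request that the replica has likely cached."""
        n = 0
        for h in hashes:
            if h not in self.seen[replica]:
                break
            n += 1
        return n

    def route(self, tokens):
        hashes = block_hashes(tokens)
        # Prefer the longest estimated prefix hit; break ties by lower load.
        best = max(
            self.seen,
            key=lambda r: (self.matched_blocks(r, hashes), -self.load[r]),
        )
        self.seen[best].update(hashes)
        self.load[best] += 1
        return best


if __name__ == "__main__":
    router = PrefixAwareRouter(["pod-a", "pod-b"])
    shared_prompt = list(range(8))  # stands in for a shared system prompt
    print(router.route(shared_prompt + [100, 101, 102, 103]))
    print(router.route(shared_prompt + [200, 201, 202, 203]))
```

Two requests that share a prompt prefix land on the same replica, while an unrelated request goes to the less loaded one. A production scheduler would also need eviction tracking and live load signals, which this sketch omits.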

llm-d provides production-grade orchestration and scheduling for distributed LLM inference, significantly improving performance and resource utilization for large-scale AI workloads on AWS.


Mar 19
2026

AWS adds support for NIXL with EFA to accelerate LLM inference at scale

Feb 24
2026

Announcing AWS Elemental Inference

Apr 15
2026

Accelerating decode-heavy LLM inference with speculative decoding on AWS Trainium and vLLM

Jun 15
2026

How Public AI delivers sovereign LLM inference on AWS and Intel


Related articles