AWS adds support for NIXL with EFA to accelerate LLM inference at scale
AWS now supports the NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) to accelerate disaggregated LLM inference on Amazon EC2.
- NIXL with EFA increases KV-cache throughput and reduces inter-token latency
- Enables efficient KV-cache movement between storage layers
- Compatible with all EFA-enabled EC2 instances across AWS regions
- Integrates natively with NVIDIA Dynamo, SGLang, and vLLM frameworks
- Available at no additional cost with NIXL 1.0.0+ and EFA installer 1.47.0+
NIXL with EFA provides flexible, performant disaggregated LLM inference at scale on EC2.
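The version requirement above (NIXL 1.0.0+ and EFA installer 1.47.0+) can be checked with a small helper. The following is a minimal sketch, not an official tool: the function names `parse_version` and `meets_minimums` are hypothetical, the version strings are assumed to be plain dotted integers, and how you obtain the installed versions on an instance is left out.

```python
# Minimum versions as stated in the announcement.
MIN_NIXL = (1, 0, 0)
MIN_EFA_INSTALLER = (1, 47, 0)


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '1.47.0' into a 3-tuple of ints.

    Hypothetical helper: assumes purely numeric components and pads
    short versions like '1.47' with zeros.
    """
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimums(nixl_version: str, efa_installer_version: str) -> bool:
    """Return True when both versions are at or above the stated minimums."""
    return (
        parse_version(nixl_version) >= MIN_NIXL
        and parse_version(efa_installer_version) >= MIN_EFA_INSTALLER
    )


if __name__ == "__main__":
    # Example inputs only; substitute the versions found on your instance.
    print(meets_minimums("1.0.0", "1.47.0"))
    print(meets_minimums("1.0.0", "1.46.1"))
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where "1.9.0" sorts after "1.47.0" lexicographically.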
Related articles
- Mar 16, 2026: Introducing Disaggregated Inference on AWS powered by llm-d
- Oct 24, 2024: AWS announces EFA update for scalability with AI/ML applications
- Dec 2, 2024: Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS
- Apr 22, 2025: Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15