AWS now supports NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) to accelerate disaggregated LLM inference on EC2, improving KV-cache throughput, latency, and memory utilization across compatible instances and frameworks.


<div><p>AWS now supports the NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) to accelerate disaggregated large language model (LLM) inference on Amazon EC2. Disaggregated inference runs the prefill and decode phases on separate resources, so the KV-cache has to move between them quickly.</p><ul><li>Increases KV-cache throughput, reduces inter-token latency, and improves KV-cache memory utilization</li><li>Enables efficient KV-cache movement between storage layers</li><li>Works with all EFA-enabled EC2 instance types in all AWS Regions</li><li>Integrates natively with the NVIDIA Dynamo, SGLang, and vLLM frameworks</li><li>Available at no additional cost; requires NIXL 1.0.0 or later and EFA installer 1.47.0 or later</li></ul><p>Together, NIXL and EFA let you run disaggregated LLM inference at scale on EC2 with your choice of EFA-enabled instance type and supported framework.</p></div>
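
The version minimums listed above (NIXL 1.0.0 and EFA installer 1.47.0) can be checked with a small helper before a deployment. The following is a minimal sketch, not an AWS or NVIDIA tool: the function names `parse_version` and `meets_minimums` are hypothetical, and it assumes you already have both version strings as plain dotted numbers (how you obtain them from your instance is outside the scope of this sketch). Version strings with non-numeric suffixes would raise a `ValueError`.

```python
from typing import Tuple

# Minimum versions stated in the announcement.
MIN_NIXL = (1, 0, 0)
MIN_EFA_INSTALLER = (1, 47, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.47.0' into a tuple of ints.

    Short versions such as '1.47' are padded with zeros to three parts
    so that tuple comparison behaves as expected.
    """
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimums(nixl_version: str, efa_installer_version: str) -> bool:
    """Return True when both versions satisfy the announced minimums."""
    return (
        parse_version(nixl_version) >= MIN_NIXL
        and parse_version(efa_installer_version) >= MIN_EFA_INSTALLER
    )
```

For example, `meets_minimums("1.0.0", "1.46.0")` returns `False` because the EFA installer is older than 1.47.0.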


AWS adds support for NIXL with EFA to accelerate LLM inference at scale

Related articles

Mar 16, 2026: Introducing Disaggregated Inference on AWS powered by llm-d

Oct 24, 2024: AWS announces EFA update for scalability with AI/ML applications

Dec 2, 2024: Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS

Apr 22, 2025: Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15