Accelerate generative AI inference with NVIDIA Dynamo and Amazon EKS

Machine Learning Blog

This article discusses NVIDIA Dynamo, an open-source inference framework designed to optimize performance and scalability for large language models (LLMs) and generative AI applications on Amazon EKS.

Supports distributed, multi-node inference with low latency
Disaggregates prefill and decode phases across different GPUs
Dynamically manages GPU resources using the Dynamo Planner
Uses the Smart Router to minimize KV cache recomputation
Uses tiered offloading for cost-effective KV cache management
Provides accelerated data transfer with the NVIDIA NIXL library
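
To make the routing idea in the list above concrete, here is a minimal sketch of KV-cache-aware routing. It is not Dynamo's API or its actual algorithm. The names `Worker`, `block_hashes`, `pick_worker`, the tiny `BLOCK_SIZE`, and the cost formula (blocks to recompute plus weighted load) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (hypothetical; tiny for illustration)


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full prefix block; a block's hash covers all tokens up to its end."""
    hashes = []
    for end in range(block_size, len(tokens) + 1, block_size):
        hashes.append(hash(tuple(tokens[:end])))
    return hashes


@dataclass
class Worker:
    """Hypothetical view of one inference worker as seen by the router."""
    name: str
    active_requests: int = 0
    cached_blocks: set = field(default_factory=set)


def cached_prefix_blocks(worker, hashes):
    """Count leading request blocks already present in the worker's KV cache."""
    count = 0
    for h in hashes:
        if h not in worker.cached_blocks:
            break
        count += 1
    return count


def pick_worker(workers, tokens, load_weight=1.0):
    """Pick the worker with the lowest cost: blocks to recompute plus weighted load."""
    hashes = block_hashes(tokens)

    def cost(worker):
        missing = len(hashes) - cached_prefix_blocks(worker, hashes)
        return missing + load_weight * worker.active_requests

    best = min(workers, key=cost)
    best.cached_blocks.update(hashes)  # assume the chosen worker now holds these blocks
    best.active_requests += 1
    return best


if __name__ == "__main__":
    a, b = Worker("a"), Worker("b")
    shared_prefix = list(range(12))
    print(pick_worker([a, b], shared_prefix).name)
    print(pick_worker([a, b], shared_prefix + [99, 100, 101, 102]).name)
    print(pick_worker([a, b], list(range(200, 208))).name)
```

In this sketch, a request that shares a long prefix with earlier traffic tends to go to the worker that already holds those blocks, while an unrelated request goes to the less loaded worker. The sketch ignores cache eviction and request completion.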

The solution demonstrates how to deploy NVIDIA Dynamo on Amazon EKS, leveraging features like Karpenter auto-scaling, EFA networking, and AWS service integrations for production-ready generative AI inference.



Sep 4, 2025

How to run AI model inference with GPUs on Amazon EKS Auto Mode

Jul 16, 2024

Accelerate your generative AI distributed training workloads with the NVIDIA NeMo Framework on Amazon EKS

Aug 29, 2024

Accelerate Generative AI Inference with NVIDIA NIM Microservices on Amazon SageMaker

Jul 24, 2024

Deploying generative AI applications with NVIDIA NIMs on Amazon EKS



Related articles