Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

Machine Learning Blog

This article provides a comprehensive guide to deploying large language models (LLMs) on Amazon EKS using vLLM Deep Learning Containers (DLCs), focusing on deploying the DeepSeek-R1-Distill-Qwen-32B model with high-performance infrastructure.

Solution leverages AWS services including EKS, P4d instances with NVIDIA A100 GPUs, Elastic Fabric Adapter (EFA), and FSx for Lustre
Uses AWS Deep Learning Containers for vLLM to simplify deployment and optimize performance
Demonstrates step-by-step process of: - Creating an EKS cluster - Setting up GPU-enabled node groups - Configuring FSx for Lustre storage - Installing necessary Kubernetes controllers - Deploying vLLM server using LeaderWorkerSet pattern
Provides API endpoints for text completions, chat completions, and embeddings
Highlights performance benefits of EFA, FSx for Lustre, and Application Load Balancer

The solution aims to help organizations deploy LLMs efficiently, optimize GPU resources, and create scalable, high-performance inference systems with minimal operational overhead.

Go to article

The AWS News Feed is currently looking for gold sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Aug 14
2025

Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

Nov 26
2024

Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips

May 9
2024

Deploy LLMs in AWS GovCloud (US) Regions using Hugging Face Inference Containers

Dec 2
2024

Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS

The AWS News Feed is currently looking for silver sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

Related articles