Home icon

Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

Machine Learning Blog



This article provides a comprehensive guide to deploying large language models (LLMs) on Amazon EKS using vLLM Deep Learning Containers (DLCs), focusing on deploying the DeepSeek-R1-Distill-Qwen-32B model with high-performance infrastructure.

  • Solution leverages AWS services including EKS, P4d instances with NVIDIA A100 GPUs, Elastic Fabric Adapter (EFA), and FSx for Lustre
  • Uses AWS Deep Learning Containers for vLLM to simplify deployment and optimize performance
  • Demonstrates step-by-step process of: - Creating an EKS cluster - Setting up GPU-enabled node groups - Configuring FSx for Lustre storage - Installing necessary Kubernetes controllers - Deploying vLLM server using LeaderWorkerSet pattern
  • Provides API endpoints for text completions, chat completions, and embeddings
  • Highlights performance benefits of EFA, FSx for Lustre, and Application Load Balancer

The solution aims to help organizations deploy LLMs efficiently, optimize GPU resources, and create scalable, high-performance inference systems with minimal operational overhead.



Go to article

The AWS News Feed is currently looking for gold sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Related articles

Aug 14
2025
Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers
Nov 26
2024
Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips
May 9
2024
Deploy LLMs in AWS GovCloud (US) Regions using Hugging Face Inference Containers
Dec 2
2024
Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS

The AWS News Feed is currently looking for silver sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.