
Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS

HPC Blog



This AWS HPC Blog article is a guide to scaling large language model (LLM) inference workloads through multi-node deployment with TensorRT-LLM and Triton on Amazon EKS. It demonstrates the approach by deploying the Llama 3.1 405B model.

  • Key technologies used:
    • Amazon EKS for Kubernetes cluster management
    • NVIDIA Triton Inference Server
    • NVIDIA TensorRT-LLM for model optimization
    • Elastic Fabric Adapter (EFA) for low-latency networking
    • Amazon EFS for shared storage
  • Deployment architecture highlights:
    • Uses two p5.48xlarge instances, each with 8 NVIDIA H100 GPUs (16 GPUs in total)
    • Combines 8-way tensor parallelism with 2-way pipeline parallelism
    • Utilizes LeaderWorkerSet for multi-node model deployment
    • Includes autoscaling with Horizontal Pod Autoscaler and Cluster Autoscaler
  • Key benefits:
    • Enables serving of massive LLMs across multiple nodes
    • Provides scalable and efficient inference infrastructure
    • Supports dynamic resource allocation and scaling
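
The arithmetic behind the two-node layout listed above can be checked with a short Python sketch. It is illustrative only and not taken from the article: the `ParallelLayout` class and `weight_memory_gb` helper are hypothetical names, and the figures of 2 bytes per parameter (16-bit weights) and 80 GB of memory per H100 are assumptions that this summary does not state. The estimate also ignores KV cache and activation memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper describing a multi-node parallelism layout."""

    nodes: int
    gpus_per_node: int
    tensor_parallel: int
    pipeline_parallel: int

    @property
    def world_size(self) -> int:
        # One rank per GPU: tensor-parallel degree times pipeline-parallel degree.
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    def is_consistent(self) -> bool:
        # The layout only works if every GPU maps to exactly one rank.
        return self.world_size == self.total_gpus


def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB.
    return params_billions * bytes_per_param


# Values from the summary: 2 nodes, 8 GPUs each, 8-way TP, 2-way PP.
layout = ParallelLayout(nodes=2, gpus_per_node=8, tensor_parallel=8, pipeline_parallel=2)

GPU_MEMORY_GB = 80     # assumption: 80 GB per H100
BYTES_PER_PARAM = 2    # assumption: 16-bit weights, no quantization

weights_gb = weight_memory_gb(405, BYTES_PER_PARAM)      # Llama 3.1 405B
node_capacity_gb = layout.gpus_per_node * GPU_MEMORY_GB
cluster_capacity_gb = layout.total_gpus * GPU_MEMORY_GB

fits_on_one_node = weights_gb <= node_capacity_gb
fits_on_cluster = weights_gb <= cluster_capacity_gb

print(f"ranks={layout.world_size}, gpus={layout.total_gpus}, consistent={layout.is_consistent()}")
print(f"weights={weights_gb} GB, one node={node_capacity_gb} GB, two nodes={cluster_capacity_gb} GB")
print(f"fits on one node: {fits_on_one_node}, fits on two nodes: {fits_on_cluster}")
```

Under these assumptions the weights alone exceed the memory of a single node, which is consistent with the article's choice to span two nodes.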

The article provides a detailed, step-by-step guide for setting up the infrastructure, configuring the deployment, and running inference on large language models.





