
Deploying Generative AI Applications with NVIDIA NIM Microservices on Amazon Elastic Kubernetes Service (Amazon EKS) – Part 2

HPC Blog



This article is a step-by-step guide to deploying generative AI applications with NVIDIA NIM microservices on Amazon Elastic Kubernetes Service (Amazon EKS).

Specifically, the article covers:

  • A recap of NVIDIA NIM and the architecture diagram
  • Deploying a customized NIM with a custom values.yaml file
  • Monitoring and observability with Prometheus to scrape custom NIM metrics
  • Scaling options: Horizontal Pod Autoscaler (HPA) with Cluster Autoscaler (CAS), or Kubernetes Event-driven Autoscaling (KEDA) with Karpenter
  • Load balancing across NIM pods using an Application Load Balancer
  • Cleanup steps to delete the EKS cluster
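
To make the HPA scaling option above more concrete, here is a minimal sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The function name, the metric (in-flight requests per NIM pod), and the replica bounds are hypothetical and chosen for illustration only; they are not taken from the article.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric.

    All parameter names and defaults other than the 0.1 tolerance are
    illustrative assumptions, not values from the article.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical: 2 pods averaging 30 in-flight requests, target of 10 per pod.
    print(desired_replicas(2, 30, 10))
```

With the hypothetical numbers above, the load is three times the target, so the sketch asks for three times as many pods, subject to the maximum. The real controller also handles multiple metrics, unready pods, and stabilization windows, which this sketch leaves out.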



Related articles

  • Jul 24, 2024: Deploying generative AI applications with NVIDIA NIMs on Amazon EKS
  • Aug 29, 2024: Accelerate Generative AI Inference with NVIDIA NIM Microservices on Amazon SageMaker
  • Jun 6, 2024: Operationalize generative AI applications on AWS: Part II – Architecture Deep Dive
  • Mar 18, 2026: Deploy production generative AI at the edge using Amazon EKS Hybrid Nodes with NVIDIA DGX