Deploying Generative AI Applications with NVIDIA NIM Microservices on Amazon Elastic Kubernetes Service (Amazon EKS) – Part 2

HPC Blog

This article is a step-by-step guide to deploying generative AI applications with NVIDIA NIM microservices on Amazon Elastic Kubernetes Service (Amazon EKS).

Specifically, the article covers:

Recap of NVIDIA NIM and the architecture diagram
Deploying a customized NIM with a custom values.yaml file
Monitoring and observability with Prometheus to scrape custom NIM metrics
Scaling options: Horizontal Pod Autoscaler (HPA) + Cluster Autoscaler (CAS), or Kubernetes Event-driven Autoscaling (KEDA) + Karpenter
Load balancing across NIM pods using an Application Load Balancer
Cleanup steps to delete the EKS cluster

Jul 24, 2024

Deploying generative AI applications with NVIDIA NIMs on Amazon EKS

Aug 29, 2024

Accelerate Generative AI Inference with NVIDIA NIM Microservices on Amazon SageMaker

Jun 6, 2024

Operationalize generative AI applications on AWS: Part II – Architecture Deep Dive

Mar 18, 2026

Deploy production generative AI at the edge using Amazon EKS Hybrid Nodes with NVIDIA DGX

Related articles