Deploying Generative AI Applications with NVIDIA NIM Microservices on Amazon Elastic Kubernetes Service (Amazon EKS) – Part 2
HPC Blog
This article is a step-by-step guide to deploying generative AI applications with NVIDIA NIM microservices on Amazon Elastic Kubernetes Service (Amazon EKS).
Specifically, the article covers:
- A recap of NVIDIA NIM and the architecture diagram
- Deploying a customized NIM with a custom values.yaml file
- Monitoring and observability with Prometheus to scrape custom NIM metrics
- Scaling options: Horizontal Pod Autoscaler (HPA) + Cluster Autoscaler (CAS), or Kubernetes Event-driven Autoscaling (KEDA) + Karpenter
- Load balancing across NIM pods using an Application Load Balancer
- Cleanup steps to delete the EKS cluster
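To give a feel for the HPA scaling option listed above, here is a minimal Python sketch of the replica calculation that the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). This is not the article's configuration. The function name, the replica bounds, and the example metric values are assumptions made for illustration, and the real controller also handles details such as pod readiness and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=8, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric.

    All names and default bounds are hypothetical; only the core formula
    follows the Kubernetes HPA documentation.
    """
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    # Hypothetical custom metric: 30 queued requests per pod against a target of 10.
    print(desired_replicas(current_replicas=2, current_metric=30, target_metric=10))
```

When the proposed pod count exceeds what the existing nodes can schedule, a node-level autoscaler such as Cluster Autoscaler or Karpenter is what adds capacity, which is why the options above are listed in pairs.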
Related articles
- Jul 24, 2024: Deploying generative AI applications with NVIDIA NIMs on Amazon EKS
- Aug 29, 2024: Accelerate Generative AI Inference with NVIDIA NIM Microservices on Amazon SageMaker
- Jun 6, 2024: Operationalize generative AI applications on AWS: Part II – Architecture Deep Dive
- Mar 18, 2026: Deploy production generative AI at the edge using Amazon EKS Hybrid Nodes with NVIDIA DGX